Adaptation-relevant predictions of climate change are often derived by combining climate model simulations in a multi-model ensemble. Model evaluation methods used in performance-based ensemble weighting schemes have limitations in the context of high-impact extreme events. We introduce a locally time-invariant method for evaluating climate model simulations with a focus on assessing the simulation of extremes. We explore the behavior of the proposed method in predicting extreme heat days in Nairobi and provide comparative results for eight additional cities.