Results 1–10 of 60
Strictly Proper Scoring Rules, Prediction, and Estimation
, 2007
"... Scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper if the forecaster maximizes the expected score for an observation drawn from the distribution F if he ..."
Abstract

Cited by 373 (28 self)
Scoring rules assess the quality of probabilistic forecasts by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper if the forecaster maximizes the expected score for an observation drawn from the distribution F if he or she issues the probabilistic forecast F, rather than G ≠ F. It is strictly proper if the maximum is unique. In prediction problems, proper scoring rules encourage the forecaster to make careful assessments and to be honest. In estimation problems, strictly proper scoring rules provide attractive loss and utility functions that can be tailored to the problem at hand. This article reviews and develops the theory of proper scoring rules on general probability spaces, and proposes and discusses examples thereof. Proper scoring rules derive from convex functions and relate to information measures, entropy functions, and Bregman divergences. In the case of categorical variables, we prove a rigorous version of the Savage representation. Examples of scoring rules for probabilistic forecasts in the form of predictive densities include the logarithmic, spherical, pseudospherical, and quadratic scores. The continuous ranked probability score applies to probabilistic forecasts that take the form of predictive cumulative distribution functions. It generalizes the absolute error and forms a special case of a new and very general type of score, the energy score. Like many other scoring rules, the energy score admits a kernel representation in terms of negative definite functions, with links to inequalities of Hoeffding type, in both univariate and multivariate settings. Proper scoring rules for quantile and interval forecasts are also discussed. We relate proper scoring rules to Bayes factors and to cross-validation, and propose a novel form of cross-validation known as random-fold cross-validation.
A case study on probabilistic weather forecasts in the North American Pacific Northwest illustrates the importance of propriety. We note optimum score approaches to point and quantile
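The propriety property described in this abstract can be checked numerically for the logarithmic score on a categorical variable. A minimal sketch (the distributions below are invented for illustration): by Gibbs' inequality, the expected score under the true distribution is largest when the forecaster reports that distribution honestly.

```python
import math

def log_score(forecast, outcome):
    """Logarithmic score: log of the probability assigned to the event that
    materializes. Positively oriented: larger is better."""
    return math.log(forecast[outcome])

def expected_score(truth, forecast):
    """Expected score when outcomes are drawn from the distribution `truth`."""
    return sum(q * log_score(forecast, k) for k, q in enumerate(truth) if q > 0)

# Illustrative three-category example: honest reporting beats any hedge.
truth = [0.6, 0.3, 0.1]
honest = expected_score(truth, truth)
hedged = expected_score(truth, [0.5, 0.3, 0.2])
uniform = expected_score(truth, [1/3, 1/3, 1/3])
```

This is exactly the propriety condition from the abstract, specialized to the logarithmic score: issuing F when observations come from F maximizes the expected score.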
Using Bayesian model averaging to calibrate forecast ensembles
 MONTHLY WEATHER REVIEW 133
, 2005
"... Ensembles used for probabilistic weather forecasting often exhibit a spreaderror correlation, but they tend to be underdispersive. This paper proposes a statistical method for postprocessing ensembles based on Bayesian model averaging (BMA), which is a standard method for combining predictive distr ..."
Abstract

Cited by 144 (34 self)
Ensembles used for probabilistic weather forecasting often exhibit a spread-error correlation, but they tend to be underdispersive. This paper proposes a statistical method for postprocessing ensembles based on Bayesian model averaging (BMA), which is a standard method for combining predictive distributions from different sources. The BMA predictive probability density function (PDF) of any quantity of interest is a weighted average of PDFs centered on the individual bias-corrected forecasts, where the weights are equal to posterior probabilities of the models generating the forecasts and reflect the models' relative contributions to predictive skill over the training period. The BMA weights can be used to assess the usefulness of ensemble members, and this can be used as a basis for selecting ensemble members; this can be useful given the cost of running large ensembles. The BMA PDF can be represented as an unweighted ensemble of any desired size, by simulating from the BMA predictive distribution. The BMA predictive variance can be decomposed into two components, one corresponding to the between-forecast variability, and the second to the within-forecast variability. Predictive PDFs or intervals based solely on the ensemble spread incorporate the first component but not the second. Thus BMA provides a theoretical explanation of the tendency of ensembles to exhibit a spread-error correlation but yet
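The BMA construction described above can be sketched directly: the predictive PDF is a weighted mixture of Gaussians centered on the member forecasts, and the predictive variance splits into a between-forecast and a within-forecast term. A toy illustration with a common member standard deviation (the paper fits member-specific parameters; this simplification is ours):

```python
import math

def bma_pdf(x, weights, means, sigma):
    """BMA predictive density: weighted average of Gaussian PDFs centered on
    the (bias-corrected) member forecasts; weights sum to one."""
    return sum(w * math.exp(-0.5 * ((x - m) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
               for w, m in zip(weights, means))

def bma_variance(weights, means, sigma):
    """Variance decomposition: between-forecast spread of the member means
    plus the within-forecast variance sigma^2."""
    mean = sum(w * m for w, m in zip(weights, means))
    between = sum(w * (m - mean) ** 2 for w, m in zip(weights, means))
    return between + sigma ** 2
```

The second function makes the abstract's point concrete: intervals based on ensemble spread alone capture only the `between` term and therefore understate the total predictive variance.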
Probabilistic forecasts, calibration and sharpness
 Journal of the Royal Statistical Society Series B
, 2007
"... Summary. Probabilistic forecasts of continuous variables take the form of predictive densities or predictive cumulative distribution functions. We propose a diagnostic approach to the evaluation of predictive performance that is based on the paradigm of maximizing the sharpness of the predictive dis ..."
Abstract

Cited by 116 (23 self)
Summary. Probabilistic forecasts of continuous variables take the form of predictive densities or predictive cumulative distribution functions. We propose a diagnostic approach to the evaluation of predictive performance that is based on the paradigm of maximizing the sharpness of the predictive distributions subject to calibration. Calibration refers to the statistical consistency between the distributional forecasts and the observations and is a joint property of the predictions and the events that materialize. Sharpness refers to the concentration of the predictive distributions and is a property of the forecasts only. A simple theoretical framework allows us to distinguish between probabilistic calibration, exceedance calibration and marginal calibration. We propose and study tools for checking calibration and sharpness, among them the probability integral transform histogram, marginal calibration plots, the sharpness diagram and proper scoring rules. The diagnostic approach is illustrated by an assessment and ranking of probabilistic forecasts of wind speed at the Stateline wind energy centre in the US Pacific Northwest. In combination with cross-validation or in the time series context, our proposal provides very general, nonparametric alternatives to the use of information criteria for model diagnostics and model selection.
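Probabilistic calibration, as assessed via the probability integral transform (PIT) histogram mentioned above, can be sketched in a few lines: evaluate each predictive CDF at its verifying observation and bin the results; an approximately flat histogram suggests calibration. The identity-CDF example in the test is purely illustrative.

```python
def pit_values(cdfs, observations):
    """Probability integral transform: each predictive CDF evaluated at its
    verifying observation. Uniform PIT values indicate probabilistic calibration."""
    return [F(y) for F, y in zip(cdfs, observations)]

def pit_histogram(pits, bins=10):
    """Bin counts of the PIT values; a flat histogram suggests calibration,
    a U-shape underdispersion, a hump overdispersion."""
    counts = [0] * bins
    for p in pits:
        counts[min(int(p * bins), bins - 1)] += 1
    return counts
```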
Calibrated Probabilistic Forecasting Using Ensemble Model Output Statistics and Minimum CRPS Estimation
 MONTHLY WEATHER REVIEW
, 2005
"... Ensemble prediction systems typically show positive spreaderror correlation, but they are subject to forecast bias and dispersion errors, and are therefore uncalibrated. This work proposes the use of ensemble model output statistics (EMOS), an easytoimplement postprocessing technique that address ..."
Abstract

Cited by 81 (14 self)
Ensemble prediction systems typically show positive spread-error correlation, but they are subject to forecast bias and dispersion errors, and are therefore uncalibrated. This work proposes the use of ensemble model output statistics (EMOS), an easy-to-implement postprocessing technique that addresses both forecast bias and underdispersion and takes into account the spread-skill relationship. The technique is based on multiple linear regression and is akin to the superensemble approach that has traditionally been used for deterministic-style forecasts. The EMOS technique yields probabilistic forecasts that take the form of Gaussian predictive probability density functions (PDFs) for continuous weather variables and can be applied to gridded model output. The EMOS predictive mean is a bias-corrected weighted average of the ensemble member forecasts, with coefficients that can be interpreted in terms of the relative contributions of the member models to the ensemble, and provides a highly competitive deterministic-style forecast. The EMOS predictive variance is a linear function of the ensemble variance. For fitting the EMOS coefficients, the method of minimum continuous ranked probability score (CRPS) estimation is introduced. This technique finds the coefficient values that optimize the CRPS for the training data. The EMOS technique was applied to 48-h forecasts of sea level pressure and surface temperature over the North American Pacific Northwest in spring 2000, using the University of Washington mesoscale ensemble. When compared to the bias-corrected ensemble, deterministic-style EMOS forecasts of sea level pressure had root-mean-square error 9% less and mean absolute error 7% less. The EMOS predictive PDFs were sharp, and much better calibrated than the raw ensemble or the bias-corrected ensemble.
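Minimum CRPS estimation, as introduced in this abstract, can be sketched as follows: the CRPS of a Gaussian has a closed form, and the EMOS coefficients are chosen to minimize its training-set average. The predictive distribution is N(a + b·mean, c + d·var); the random-search optimizer below is our own simplification, not the paper's fitting method.

```python
import math, random

def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of N(mu, sigma^2) at observation y (lower is better)."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return sigma * (z * (2 * cdf - 1) + 2 * pdf - 1 / math.sqrt(math.pi))

def avg_crps(params, ens_means, ens_vars, obs):
    """Mean CRPS of the EMOS Gaussian N(a + b*m, c + d*v) over training pairs."""
    a, b, c, d = params
    return sum(crps_gaussian(a + b * m, math.sqrt(max(c + d * v, 1e-9)), y)
               for m, v, y in zip(ens_means, ens_vars, obs)) / len(obs)

def fit_emos(ens_means, ens_vars, obs, iters=1000, seed=0):
    """Minimum CRPS estimation by crude random search (a sketch only):
    starting from (0, 1, 1, 1), accept any perturbation that lowers the score."""
    rng = random.Random(seed)
    best = (0.0, 1.0, 1.0, 1.0)
    best_s = avg_crps(best, ens_means, ens_vars, obs)
    for _ in range(iters):
        cand = tuple(p + rng.gauss(0, 0.1) for p in best)
        s = avg_crps(cand, ens_means, ens_vars, obs)
        if s < best_s:
            best, best_s = cand, s
    return best
```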
du Preez, “Application-independent evaluation of speaker detection
 Computer Speech and Language
, 2006
"... We present a Bayesian analysis of the evaluation of speaker detection performance. We use expectation of utility to confirm that likelihoodratio is both an optimum and applicationindependent form of output for speaker detection systems. We point out that the problem of likelihoodratio calculation ..."
Abstract

Cited by 79 (3 self)
We present a Bayesian analysis of the evaluation of speaker detection performance. We use expectation of utility to confirm that the likelihood ratio is both an optimum and application-independent form of output for speaker detection systems. We point out that the problem of likelihood-ratio calculation is equivalent to the problem of optimization of decision thresholds. It is shown that the decision cost that is used in the existing NIST evaluations effectively forms a utility (a proper scoring rule) for the evaluation of the quality of likelihood-ratio presentation. As an alternative, a logarithmic utility (a strictly proper scoring rule) is proposed. Finally, an information-theoretic interpretation of the expected logarithmic utility is given. It is hoped that this analysis and the proposed evaluation method will promote the use of likelihood-ratio detector output rather than decision output.
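A logarithmic cost of likelihood-ratio output of the kind proposed here is commonly computed as an average over target and non-target trials; the sketch below uses the base-2 normalization, one common convention, under which an uninformative detector (log-likelihood-ratio identically zero) scores exactly 1 bit.

```python
import math

def log_lr_cost(target_llrs, nontarget_llrs):
    """Average logarithmic cost of (natural-log) likelihood ratios: a strictly
    proper scoring rule applied to the posterior at even odds, normalized in
    bits. Lower is better; 0 is perfect, 1 is uninformative."""
    ct = sum(math.log2(1 + math.exp(-l)) for l in target_llrs) / len(target_llrs)
    cn = sum(math.log2(1 + math.exp(l)) for l in nontarget_llrs) / len(nontarget_llrs)
    return 0.5 * (ct + cn)
```

Well-calibrated, well-separated likelihood ratios (large positive for targets, large negative for non-targets) drive this cost toward zero.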
Calibrated probabilistic forecasting at the Stateline wind energy center: The regime-switching space-time (RST) method
 Journal of the American Statistical Association
, 2004
"... With the global proliferation of wind power, accurate shortterm forecasts of wind resources at wind energy sites are becoming paramount. Regimeswitching spacetime (RST) models merge meteorological and statistical expertise to obtain accurate and calibrated, fully probabilistic forecasts of wind s ..."
Abstract

Cited by 35 (14 self)
With the global proliferation of wind power, accurate short-term forecasts of wind resources at wind energy sites are becoming paramount. Regime-switching space-time (RST) models merge meteorological and statistical expertise to obtain accurate and calibrated, fully probabilistic forecasts of wind speed and wind power. The model formulation is parsimonious, yet takes account of all the salient features of wind speed: alternating atmospheric regimes, temporal and spatial correlation, diurnal and seasonal nonstationarity, conditional heteroscedasticity, and non-Gaussianity. The RST method identifies forecast regimes at the wind energy site and fits a conditional predictive model for each regime. Geographically dispersed meteorological observations in the vicinity of the wind farm are used as off-site predictors. The RST technique was applied to 2-hour-ahead forecasts of hourly average wind speed at the Stateline wind farm in the US Pacific Northwest. In July 2003, for instance, the RST forecasts had root-mean-square error (RMSE) 28.6% less than the persistence forecasts. For each month in the test period, the RST forecasts had lower RMSE than forecasts using state-of-the-art vector time series techniques. The RST method provides probabilistic forecasts in the form of
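The regime-switching idea, fitting a separate conditional predictive model for each regime, can be caricatured with a per-regime least-squares fit on the current wind speed. The two-regime data in the test are synthetic, and the real RST model is far richer (truncated-normal predictive distributions, off-site predictors, CRPS-based fitting); this is only a structural sketch.

```python
def fit_rst(regimes, current, obs):
    """Per-regime conditional model (toy sketch): within each regime label,
    fit next = alpha + beta * current by simple least squares."""
    models = {}
    for r in set(regimes):
        xs = [x for rr, x in zip(regimes, current) if rr == r]
        ys = [y for rr, y in zip(regimes, obs) if rr == r]
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        beta = sxy / sxx if sxx else 0.0
        models[r] = (my - beta * mx, beta)  # (alpha, beta) per regime
    return models

def rst_predict(models, regime, x):
    """Point forecast from the conditional model of the identified regime."""
    alpha, beta = models[regime]
    return alpha + beta * x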
Geostatistical Space-Time Models, Stationarity, Separability and Full Symmetry
"... Geostatistical approaches to modeling spatiotemporal data rely on parametric covariance models and rather stringent assumptions, such as stationarity, separability and full symmetry. This paper reviews recent advances in the literature on spacetime covariance functions in light of the aforemention ..."
Abstract

Cited by 30 (4 self)
Geostatistical approaches to modeling spatiotemporal data rely on parametric covariance models and rather stringent assumptions, such as stationarity, separability and full symmetry. This paper reviews recent advances in the literature on space-time covariance functions in light of the aforementioned notions, which are illustrated using wind data from Ireland. Experiments with time-forward kriging predictors suggest that the use of more complex and more realistic covariance models results in improved predictive performance.
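Separability, one of the stringent assumptions reviewed here, means the space-time covariance factors into a purely spatial and a purely temporal component; separability in turn implies full symmetry, C((s, u), (t, v)) = C((s, v), (t, u)). The sketch below builds a separable covariance matrix with exponential components (chosen only for illustration) so the implication can be checked numerically.

```python
import math

def separable_cov(sites, times, range_s, range_t):
    """Separable space-time covariance matrix on a site x time grid:
    C((s,u),(t,v)) = Cs(|s-t|) * Ct(|u-v|) with exponential components.
    Row/column index of (site i, time k) is i*len(times) + k."""
    def cs(h): return math.exp(-h / range_s)   # spatial component
    def ct(u): return math.exp(-u / range_t)   # temporal component
    return [[cs(abs(si - sj)) * ct(abs(ti - tj))
             for sj in sites for tj in times]
            for si in sites for ti in times]
```

Full symmetry is what time-forward kriging experiments can test empirically: for asymmetric phenomena such as wind advected across Ireland, it fails, which is the paper's motivation for more general covariance models.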
D (2006) Model error in weather and climate forecasting. In: Palmer T, Hagedorn R (eds) Predictability of weather and climate. Cambridge University Press, Cambridge
Anderson JL (2001) An ensemble adjustment Kalman filter for data assimilation. Mon Weather
 Cliffs, NJ Bengtsson T, Snyder C, Nychka D
, 1999
"... “As if someone were to buy several copies of the morning newspaper to assure himself that what it said was true ” Ludwig Wittgenstein 1 ..."
Abstract

Cited by 25 (5 self)
“As if someone were to buy several copies of the morning newspaper to assure himself that what it said was true” (Ludwig Wittgenstein)
Quantifying uncertainty in climate change science through empirical information theory
 Proc
, 2010
"... Information theory provides a concise systematic framework for measuring climate consistency and sensitivity for imperfect models. A suite of increasingly complex physically relevant linear Gaussian models with time periodic features mimicking the seasonal cycle is utilized to elucidate central iss ..."
Abstract

Cited by 20 (6 self)
Information theory provides a concise systematic framework for measuring climate consistency and sensitivity for imperfect models. A suite of increasingly complex physically relevant linear Gaussian models with time-periodic features mimicking the seasonal cycle is utilized to elucidate central issues that arise in contemporary climate science. These include the role of model error, the memory of initial conditions, and effects of coarse graining in producing short-, medium-, and long-range forecasts. In particular, this study demonstrates how relative entropy can be used to improve climate consistency of an overdamped imperfect model by inflating stochastic forcing. Moreover, the authors show that, in the considered models, improving climate consistency simultaneously increases the predictive skill of an imperfect model in response to external perturbation, a property of crucial importance in the context of climate change. The three models range in complexity from a scalar time-periodic model mimicking seasonal fluctuations in a mean jet to a spatially extended system of turbulent Rossby waves to, finally, the behavior of a turbulent tracer with a mean gradient with the background turbulent field velocity generated by the first two models. This last model mimics the global and regional behavior of turbulent passive tracers under various climate change scenarios. This detailed study provides important guidelines for extending these strategies to more complicated and non-Gaussian physical systems.
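The relative entropy between scalar Gaussians, the basic building block behind the climate-consistency measure discussed above, has a simple closed form. The sketch below also illustrates the inflation effect described in the abstract: raising the variance of an overdamped model toward the true climate variance drives the relative entropy down (the numbers in the test are illustrative, not from the paper).

```python
import math

def kl_gauss(mu_p, var_p, mu_q, var_q):
    """Relative entropy D(p||q) between scalar Gaussians p = N(mu_p, var_p)
    (the true climate) and q = N(mu_q, var_q) (the imperfect model)."""
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)
```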
Multi-model ensembling of streamflow forecasts: Role of predictor state in developing optimal combinations
 Water Resour Res
, 2008
"... [1] A new approach for developing multimodel streamflow forecasts is presented. The methodology combines streamflow forecasts from individual models by evaluating their skill, represented by rank probability score (RPS), contingent on the predictor state. Using average RPS estimated over the chosen ..."
Abstract

Cited by 18 (8 self)
A new approach for developing multi-model streamflow forecasts is presented. The methodology combines streamflow forecasts from individual models by evaluating their skill, represented by rank probability score (RPS), contingent on the predictor state. Using average RPS estimated over the chosen neighbors in the predictor state space, the methodology assigns higher weights for a model that has better predictability under similar predictor conditions. We assess the performance of the proposed algorithm by developing multi-model streamflow forecasts for Falls Lake Reservoir in the Neuse River Basin, North Carolina (NC), by combining streamflow forecasts developed from two low-dimensional statistical models that use sea-surface temperature conditions as underlying predictors. To evaluate the proposed scheme thoroughly, we consider a total of seven multi-models that include existing multi-model combination techniques such as combining based on long-term predictability of individual models and by simple pooling of ensembles. Detailed nonparametric hypothesis tests comparing the performance of seven multi-models with two individual models show that the reduced RPS from multi-model forecasts developed using the proposed algorithm is statistically significant from the RPSs
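The combination rule can be sketched in outline: compute the RPS of each model's categorical forecasts, average it over the nearest predictor-state neighbors, and weight models in inverse proportion to that average so better-performing models get more weight. The inverse-RPS weighting below is a plausible reading of "higher weights for a model that has better predictability", not necessarily the paper's exact formula.

```python
def rps(forecast_probs, outcome_category):
    """Ranked probability score for an ordinal forecast: sum of squared
    differences between cumulative forecast probabilities and the cumulative
    (step-function) outcome. Lower is better; 0 is a perfect forecast."""
    cum_f, cum_o, score = 0.0, 0.0, 0.0
    for k, p in enumerate(forecast_probs):
        cum_f += p
        cum_o += 1.0 if k == outcome_category else 0.0
        score += (cum_f - cum_o) ** 2
    return score

def state_conditional_weights(avg_rps_per_model):
    """Combination weights inversely proportional to each model's average RPS
    over the k nearest predictor-state neighbors (illustrative rule)."""
    inv = [1.0 / r for r in avg_rps_per_model]
    total = sum(inv)
    return [v / total for v in inv]
```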