Results 1 - 10 of 372
Implementing approximate Bayesian inference for latent Gaussian models using integrated nested Laplace approximations: A manual for the inla-program
2008
"... Structured additive regression models are perhaps the most commonly used class of models in statistical applications. It includes, among others, (generalised) linear models, (generalised) additive models, smoothing-spline models, state-space models, semiparametric regression, spatial and spatio-temp ..."
Cited by 294 (20 self)
Abstract:
Structured additive regression models are perhaps the most commonly used class of models in statistical applications. It includes, among others, (generalised) linear models, (generalised) additive models, smoothing-spline models, state-space models, semiparametric regression, spatial and spatio-temporal models, log-Gaussian Cox-processes, geostatistical and geoadditive models. In this paper we consider approximate Bayesian inference in a popular subset of structured additive regression models, latent Gaussian models, where the latent field is Gaussian, controlled by a few hyperparameters and with non-Gaussian response variables. The posterior marginals are not available in closed form due to the non-Gaussian response variables. For such models, Markov chain Monte Carlo methods can be implemented, but they are not without problems, both in terms of convergence and computational time. In some practical applications, the extent of these problems is such that Markov chain Monte Carlo is simply not an appropriate tool for routine analysis. We show that, by using an integrated nested Laplace approximation and its simplified version, we can directly compute very accurate approximations to the posterior marginals. The main benefit of these approximations
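The computational core of the approach is the Laplace approximation: replace an intractable posterior by a Gaussian centred at its mode, with variance given by the curvature there. The sketch below illustrates that building block on a toy model of my own choosing (a single latent log-rate with a Gaussian prior and one Poisson observation); the full INLA scheme nests such approximations over a grid of hyperparameter values and is far more elaborate.

import numpy as np

def laplace_approximation(y, prior_mean=0.0, prior_var=1.0, iters=20):
    # Toy latent Gaussian model (assumed for illustration, not from the paper):
    #   x ~ N(prior_mean, prior_var),  y | x ~ Poisson(exp(x)).
    # Newton iterations find the posterior mode of x; minus the inverse Hessian
    # of the log-posterior at the mode gives the variance of the Gaussian
    # approximation to p(x | y).
    x = np.log(y + 0.5)                       # crude starting value
    for _ in range(iters):
        grad = y - np.exp(x) - (x - prior_mean) / prior_var
        hess = -np.exp(x) - 1.0 / prior_var
        x -= grad / hess                      # Newton step
    return x, -1.0 / hess                     # approximate posterior mean and variance

print(laplace_approximation(y=7))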
Probabilistic forecasts, calibration and sharpness
Journal of the Royal Statistical Society Series B, 2007
"... Summary. Probabilistic forecasts of continuous variables take the form of predictive densities or predictive cumulative distribution functions. We propose a diagnostic approach to the evaluation of predictive performance that is based on the paradigm of maximizing the sharpness of the predictive dis ..."
Cited by 116 (23 self)
Abstract:
Summary. Probabilistic forecasts of continuous variables take the form of predictive densities or predictive cumulative distribution functions. We propose a diagnostic approach to the evaluation of predictive performance that is based on the paradigm of maximizing the sharpness of the predictive distributions subject to calibration. Calibration refers to the statistical consistency between the distributional forecasts and the observations and is a joint property of the predictions and the events that materialize. Sharpness refers to the concentration of the predictive distributions and is a property of the forecasts only. A simple theoretical framework allows us to distinguish between probabilistic calibration, exceedance calibration and marginal calibration. We propose and study tools for checking calibration and sharpness, among them the probability integral transform histogram, marginal calibration plots, the sharpness diagram and proper scoring rules. The diagnostic approach is illustrated by an assessment and ranking of probabilistic forecasts of wind speed at the Stateline wind energy centre in the US Pacific Northwest. In combination with cross-validation or in the time series context, our proposal provides very general, nonparametric alternatives to the use of information criteria for model diagnostics and model selection.
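One of the calibration checks listed in the abstract, the probability integral transform (PIT) histogram, is straightforward to compute: evaluate each predictive cumulative distribution function at the realized observation and check that the resulting values look uniform on (0, 1). A minimal sketch under the assumption of Gaussian predictive distributions; the simulated data and function name are illustrative, not taken from the paper.

import numpy as np
from scipy import stats

def pit_values(pred_means, pred_sds, observations):
    # PIT value = predictive CDF evaluated at the observation that materialized.
    # For a probabilistically calibrated forecaster these are ~ Uniform(0, 1).
    return stats.norm.cdf(observations, loc=pred_means, scale=pred_sds)

rng = np.random.default_rng(0)
mu = rng.normal(size=1000)                            # predictive means
obs = rng.normal(loc=mu, scale=1.0)                   # observations match the forecasts
pit = pit_values(mu, np.ones_like(mu), obs)
hist, _ = np.histogram(pit, bins=10, range=(0, 1))    # roughly flat => calibrated
print(hist)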
Calibrated Probabilistic Forecasting Using Ensemble Model Output Statistics and Minimum CRPS Estimation
Monthly Weather Review, 2005
"... Ensemble prediction systems typically show positive spread-error correlation, but they are subject to forecast bias and dispersion errors, and are therefore uncalibrated. This work proposes the use of ensemble model output statistics (EMOS), an easy-to-implement postprocessing technique that address ..."
Cited by 81 (14 self)
Abstract:
Ensemble prediction systems typically show positive spread-error correlation, but they are subject to forecast bias and dispersion errors, and are therefore uncalibrated. This work proposes the use of ensemble model output statistics (EMOS), an easy-to-implement postprocessing technique that addresses both forecast bias and underdispersion and takes into account the spread-skill relationship. The technique is based on multiple linear regression and is akin to the superensemble approach that has traditionally been used for deterministic-style forecasts. The EMOS technique yields probabilistic forecasts that take the form of Gaussian predictive probability density functions (PDFs) for continuous weather variables and can be applied to gridded model output. The EMOS predictive mean is a bias-corrected weighted average of the ensemble member forecasts, with coefficients that can be interpreted in terms of the relative contributions of the member models to the ensemble, and provides a highly competitive deterministic-style forecast. The EMOS predictive variance is a linear function of the ensemble variance. For fitting the EMOS coefficients, the method of minimum continuous ranked probability score (CRPS) estimation is introduced. This technique finds the coefficient values that optimize the CRPS for the training data. The EMOS technique was applied to 48-h forecasts of sea level pressure and surface temperature over the North American Pacific Northwest in spring 2000, using the University of Washington mesoscale ensemble. When compared to the bias-corrected ensemble, deterministic-style EMOS forecasts of sea level pressure had root-mean-square error 9 % less and mean absolute error 7 % less. The EMOS predictive PDFs were sharp, and much better calibrated than the raw ensemble or the bias-corrected ensemble.
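As a worked illustration of the form described above, the sketch below assembles a Gaussian EMOS predictive distribution (mean affine in the ensemble members, variance affine in the ensemble variance) and scores it with the closed-form CRPS of a normal distribution. The coefficients and ensemble values are placeholders; in the method they would be chosen by minimum CRPS estimation over a training set.

import numpy as np
from scipy import stats

def emos_predictive(ensemble, a, b, c, d):
    # Predictive mean: bias-corrected weighted average of the ensemble members.
    # Predictive variance: affine function of the ensemble variance.
    mu = a + np.dot(b, ensemble)
    sigma = np.sqrt(c + d * np.var(ensemble))
    return mu, sigma

def crps_gaussian(mu, sigma, y):
    # Closed-form continuous ranked probability score of a N(mu, sigma^2) forecast.
    z = (y - mu) / sigma
    return sigma * (z * (2 * stats.norm.cdf(z) - 1)
                    + 2 * stats.norm.pdf(z) - 1 / np.sqrt(np.pi))

members = np.array([1012.3, 1013.1, 1011.8, 1012.9, 1013.4])   # e.g. sea level pressure, hPa
mu, sigma = emos_predictive(members, a=0.5, b=np.full(5, 0.2), c=0.4, d=1.1)
print(crps_gaussian(mu, sigma, y=1012.0))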
A new understanding of prediction markets via no-regret learning
In ACM EC, 2010
"... We explore the striking mathematical connections that exist between market scoring rules, cost function based prediction markets, and no-regret learning. We first show that any cost function based prediction market can be interpreted as an algorithm for the commonly studied problem of learning from ..."
Cited by 51 (11 self)
Abstract:
We explore the striking mathematical connections that exist between market scoring rules, cost function based prediction markets, and no-regret learning. We first show that any cost function based prediction market can be interpreted as an algorithm for the commonly studied problem of learning from expert advice by equating the set of outcomes on which bets are placed in the market with the set of experts in the learning setting, and equating trades made in the market with losses observed by the learning algorithm. If the loss of the market organizer is bounded, this bound can be used to derive an O(√T) regret bound for the corresponding learning algorithm. We then show that the class of markets with convex cost functions exactly corresponds to the class of Follow the Regularized Leader learning algorithms, with the choice of a cost function in the market corresponding to the choice of a regularizer in the learning problem. Finally, we show an equivalence between market scoring rules and prediction markets with convex cost functions. This implies both that any market scoring rule can be implemented as a cost function based market maker, and that market scoring rules can be interpreted naturally as Follow the Regularized Leader algorithms. These connections provide new insight into how it is that commonly studied markets, such as the Logarithmic Market Scoring Rule, can aggregate opinions into accurate estimates of the likelihood of future events.
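A cost-function-based market maker of the kind analyzed here keeps a vector q of outstanding shares, charges C(q + delta) - C(q) for a trade delta, and quotes prices equal to the gradient of C. The sketch below uses the Logarithmic Market Scoring Rule cost function C(q) = b log(sum_i exp(q_i / b)) mentioned in the abstract; the liquidity parameter and trades are illustrative choices of mine.

import numpy as np

class LMSRMarket:
    # Cost-function-based market maker with the LMSR cost function;
    # the price vector is the gradient of the cost, i.e. a softmax of q / b.
    def __init__(self, num_outcomes, b=100.0):
        self.b = b
        self.q = np.zeros(num_outcomes)       # outstanding shares per outcome

    def cost(self, q):
        return self.b * np.log(np.sum(np.exp(q / self.b)))

    def prices(self):
        e = np.exp(self.q / self.b)
        return e / e.sum()                    # current market probabilities

    def trade(self, delta):
        # A trader buying the share bundle `delta` pays C(q + delta) - C(q).
        payment = self.cost(self.q + delta) - self.cost(self.q)
        self.q = self.q + delta
        return payment

market = LMSRMarket(num_outcomes=2)
print(market.trade(np.array([10.0, 0.0])), market.prices())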
Loss Functions for Binary Class Probability Estimation and Classification: Structure and Applications, manuscript, available at www-stat.wharton.upenn.edu/~buja
2005
"... What are the natural loss functions or fitting criteria for binary class probability estimation? This question has a simple answer: so-called “proper scoring rules”, that is, functions that score probability estimates in view of data in a Fisher-consistent manner. Proper scoring rules comprise most ..."
Cited by 49 (1 self)
Abstract:
What are the natural loss functions or fitting criteria for binary class probability estimation? This question has a simple answer: so-called “proper scoring rules”, that is, functions that score probability estimates in view of data in a Fisher-consistent manner. Proper scoring rules comprise most loss functions currently in use: log-loss, squared error loss, boosting loss, and as limiting cases cost-weighted misclassification losses. Proper scoring rules have a rich structure:
• Every proper scoring rule is a mixture (limit of sums) of cost-weighted misclassification losses. The mixture is specified by a weight function (or measure) that describes which misclassification cost weights are most emphasized by the proper scoring rule.
• Proper scoring rules permit Fisher scoring and Iteratively Reweighted LS algorithms for model fitting. The weights are derived from a link function and the above weight function.
• Proper scoring rules are in a 1-1 correspondence with information measures for tree-based classification.
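For concreteness, the two most familiar proper scoring rules named above can be written out directly, together with a numerical check of propriety (the expected score under a true probability q is minimized by reporting p = q). Function names and the grid search are illustrative, not code from the manuscript.

import numpy as np

def log_loss(p, y):
    # Logarithmic scoring rule for a predicted probability p of the event y = 1.
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def squared_error_loss(p, y):
    # Squared error (Brier) scoring rule.
    return (p - y) ** 2

q = 0.3                                        # true class-1 probability
p_grid = np.linspace(0.01, 0.99, 99)
expected = q * log_loss(p_grid, 1) + (1 - q) * log_loss(p_grid, 0)
print(p_grid[np.argmin(expected)])             # minimized at p = 0.3, as propriety requires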
Making and evaluating point forecasts
Journal of the American Statistical Association
"... iv ..."
Comparing and Evaluating Bayesian Predictive Distributions of Asset Returns.
Int. J. Forec., 2010
"... ..."
Information, Divergence and Risk for Binary Experiments
Journal of Machine Learning Research, 2009
"... We unify f-divergences, Bregman divergences, surrogate regret bounds, proper scoring rules, cost curves, ROC-curves and statistical information. We do this by systematically studying integral and variational representations of these various objects and in so doing identify their primitives which all ..."
Cited by 41 (8 self)
Abstract:
We unify f-divergences, Bregman divergences, surrogate regret bounds, proper scoring rules, cost curves, ROC-curves and statistical information. We do this by systematically studying integral and variational representations of these various objects and in so doing identify their primitives which all are related to cost-sensitive binary classification. As well as developing relationships between generative and discriminative views of learning, the new machinery leads to tight and more general surrogate regret bounds and generalised Pinsker inequalities relating f-divergences to variational divergence. The new viewpoint also illuminates existing algorithms: it provides a new derivation of Support Vector Machines in terms of divergences and relates Maximum Mean Discrepancy to Fisher Linear Discriminants.
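To make two of the unified objects concrete, the sketch below computes an f-divergence from its defining formula D_f(P, Q) = sum_i q_i f(p_i / q_i) for discrete distributions, instantiates it for the KL and variational divergences, and checks the classical Pinsker inequality that the paper's generalised Pinsker inequalities extend. The example distributions are arbitrary.

import numpy as np

def f_divergence(p, q, f):
    # D_f(P, Q) = sum_i q_i * f(p_i / q_i) for discrete distributions p, q.
    return np.sum(q * f(p / q))

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.4, 0.4, 0.2])
kl = f_divergence(p, q, lambda t: t * np.log(t))    # KL divergence
v = f_divergence(p, q, lambda t: np.abs(t - 1))     # variational (L1) divergence
print(kl, v, v <= np.sqrt(2 * kl))                  # Pinsker: V <= sqrt(2 * KL)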
Penalized loss functions for Bayesian model comparison
"... The deviance information criterion (DIC) is widely used for Bayesian model comparison, despite the lack of a clear theoretical foundation. DIC is shown to be an approximation to a penalized loss function based on the deviance, with a penalty derived from a cross-validation argument. This approximati ..."
Cited by 37 (2 self)
Abstract:
The deviance information criterion (DIC) is widely used for Bayesian model comparison, despite the lack of a clear theoretical foundation. DIC is shown to be an approximation to a penalized loss function based on the deviance, with a penalty derived from a cross-validation argument. This approximation is valid only when the effective number of parameters in the model is much smaller than the number of independent observations. In disease mapping, a typical application of DIC, this assumption does not hold and DIC under-penalizes more complex models. Another deviance-based loss function, derived from the same decision-theoretic framework, is applied to mixture models, which have previously been considered an unsuitable application for DIC.
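For reference, DIC is computed from posterior draws as DIC = Dbar + pD, where D(theta) = -2 log p(y | theta) is the deviance, Dbar is its posterior mean, and pD = Dbar - D(theta_bar) is the effective number of parameters, with theta_bar the posterior mean. A minimal sketch with a generic helper and a toy Gaussian example (both mine, not from the paper):

import numpy as np
from scipy import stats

def dic(log_lik, posterior_draws, y):
    # log_lik(theta, y) returns log p(y | theta) for a single parameter draw.
    deviances = np.array([-2.0 * log_lik(theta, y) for theta in posterior_draws])
    d_bar = deviances.mean()                               # posterior mean deviance
    d_at_mean = -2.0 * log_lik(posterior_draws.mean(axis=0), y)
    p_d = d_bar - d_at_mean                                # effective number of parameters
    return d_bar + p_d                                     # DIC = Dbar + pD

y = np.array([0.1, -0.4, 0.3, 0.7])
draws = np.random.default_rng(1).normal(loc=y.mean(), scale=0.5, size=(2000, 1))
print(dic(lambda th, data: stats.norm.logpdf(data, loc=th[0], scale=1.0).sum(), draws, y))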
Eliciting Properties of Probability Distributions
In Proceedings of the Ninth ACM Conference on Electronic Commerce, 2008
"... We investigate the problem of incentivizing an expert to truthfully reveal probabilistic information about a random event. Probabilistic information consists of one or more properties, which are any real-valued functions of the distribution, such as the mean and variance. Not all properties can be e ..."
Cited by 37 (5 self)
Abstract:
We investigate the problem of incentivizing an expert to truthfully reveal probabilistic information about a random event. Probabilistic information consists of one or more properties, which are any real-valued functions of the distribution, such as the mean and variance. Not all properties can be elicited truthfully. We provide a simple characterization of elicitable properties, and describe the general form of the associated payment functions that induce truthful revelation. We then consider sets of properties, and observe that all properties can be inferred from sets of elicitable properties. This suggests the concept of elicitation complexity for a property, the size of the smallest set implying the property.
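The simplest example of an elicitable property is the mean, which is elicited by the squared-error payment: the expert's expected penalty is minimized exactly when the report equals the mean of the distribution. A small numerical check of this fact (purely illustrative; the characterization of payment functions in the paper is far more general):

import numpy as np

rng = np.random.default_rng(0)
outcomes = rng.exponential(scale=2.0, size=100_000)      # samples of the random event

def expected_penalty(report, outcomes):
    # Squared-error payment rule: the expert is penalized (report - outcome)^2.
    return np.mean((report - outcomes) ** 2)

reports = np.linspace(0.0, 5.0, 501)
best = reports[np.argmin([expected_penalty(r, outcomes) for r in reports])]
print(best, outcomes.mean())                             # optimal report is approximately the mean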