Information, Divergence and Risk for Binary Experiments
 Journal of Machine Learning Research, 2009
Cited by 41 (8 self)
Abstract:
We unify f-divergences, Bregman divergences, surrogate regret bounds, proper scoring rules, cost curves, ROC curves and statistical information. We do this by systematically studying integral and variational representations of these various objects and in so doing identify their primitives, all of which are related to cost-sensitive binary classification. As well as developing relationships between generative and discriminative views of learning, the new machinery leads to tight and more general surrogate regret bounds and generalised Pinsker inequalities relating f-divergences to variational divergence. The new viewpoint also illuminates existing algorithms: it provides a new derivation of Support Vector Machines in terms of divergences and relates Maximum Mean Discrepancy to Fisher Linear Discriminants.
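The generalised Pinsker inequalities mentioned above sharpen the classical Pinsker inequality, which bounds an f-divergence (here, KL) from below by the variational divergence. As a minimal numerical sketch (not code from the paper; function names are illustrative), the classical bound KL(P‖Q) ≥ V(P,Q)²/2 can be checked on Bernoulli distributions:

```python
import math

def kl_bernoulli(p, q):
    """KL divergence KL(P || Q) between Bernoulli(p) and Bernoulli(q), in nats."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def variational_bernoulli(p, q):
    """Variational divergence V(P, Q) = sum_x |P(x) - Q(x)|."""
    return abs(p - q) + abs((1 - p) - (1 - q))

# Classical Pinsker inequality: KL(P || Q) >= V(P, Q)^2 / 2.
for p, q in [(0.5, 0.4), (0.9, 0.2), (0.01, 0.3)]:
    kl = kl_bernoulli(p, q)
    v = variational_bernoulli(p, q)
    assert kl >= v * v / 2
```

For p = 0.5, q = 0.4 the two sides are nearly equal (≈0.0204 vs 0.02), which is the regime where the classical bound is tight and where the paper's generalised inequalities give improvements for other f-divergences.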
Composite multiclass losses.
 In Neural Information Processing Systems, 2011
Cited by 21 (8 self)
Abstract:
We consider loss functions for multiclass prediction problems. We show when a multiclass loss can be expressed as a "proper composite loss", which is the composition of a proper loss and a link function. We extend existing results for binary losses to multiclass losses. We subsume results on "classification calibration" by relating it to properness. We determine the stationarity condition, Bregman representation, order-sensitivity, and quasi-convexity of multiclass proper losses. We then characterise the existence and uniqueness of the composite representation for multiclass losses. We show how the composite representation is related to other core properties of a loss: mixability, admissibility and (strong) convexity of multiclass losses, which we characterise in terms of the Hessian of the Bayes risk. We show that the simple integral representation for binary proper losses cannot be extended to multiclass losses, but offer concrete guidance regarding how to design different loss functions. The conclusion drawn from these results is that the proper composite representation is a natural and convenient tool for the design of multiclass loss functions.
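The best-known instance of a proper composite loss is multiclass log loss composed with the softmax link. A minimal sketch (illustrative code, not from the paper): the link maps real-valued scores to the simplex, the proper loss scores the resulting distribution, and properness means the expected loss is minimised by predicting the true class distribution.

```python
import math

def softmax(scores):
    """Link function: map real-valued scores to a probability vector."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def log_loss(p, y):
    """Proper loss: negative log-likelihood of the observed class y under p."""
    return -math.log(p[y])

def composite_log_loss(scores, y):
    """Proper composite loss = proper loss composed with the link function."""
    return log_loss(softmax(scores), y)

# Properness of the underlying loss: under a true class distribution p,
# the expected loss is minimised by predicting q = p.
p = [0.5, 0.3, 0.2]

def expected_loss(q):
    return sum(p[y] * log_loss(q, y) for y in range(3))

assert expected_loss(p) < expected_loss([0.4, 0.4, 0.2])
assert expected_loss(p) < expected_loss([1/3, 1/3, 1/3])
```

The separation into proper loss and link is exactly the design freedom the abstract advocates: the link handles the geometry of the score space, while properness of the loss guarantees calibrated probability estimates.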
Supermartingales in Prediction with Expert Advice
 In ALT 2008 Proceedings, LNCS (LNAI), 2008
Elicitation and evaluation of statistical forecasts
 2010
Cited by 9 (0 self)
Abstract:
This paper studies mechanisms for eliciting and evaluating statistical forecasts. Nature draws a state at random from a given state space, according to some distribution p. Prior to Nature’s move, a forecaster, who knows p, provides a prediction for a given statistic of p. The mechanism defines the forecaster’s payoff as a function of the prediction and the subsequently realized state. When the statistic is continuous with a continuum of values, the payoffs that provide strict incentives to the forecaster exist if and only if the statistic partitions the set of distributions into convex subsets. When the underlying state space is finite, and the statistic takes values in a finite set, these payoffs exist if and only if the partition forms a linear cross-section of a Voronoi diagram—that is, if the partition forms a power diagram—a stronger condition than convexity. In both cases, the payoffs can be fully characterized essentially as weighted averages of base functions. Preliminary versions appear in the proceedings of the 9th and 10th ACM Conference on Electronic
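A concrete instance of the elicitation problem above is the mean, whose level sets partition distributions into convex subsets, so strict incentives exist. As an illustrative sketch (not code from the paper), negative squared error is a payoff under which the forecaster's expected payoff is uniquely maximised by reporting the true mean:

```python
# Illustrative sketch: eliciting the mean of Nature's distribution p
# with a negative-squared-error payoff gives strict truthful incentives.
states = [0.0, 1.0, 2.0]
p = [0.2, 0.5, 0.3]          # Nature's distribution over states
true_mean = sum(pi * x for pi, x in zip(p, states))  # = 1.1

def expected_payoff(report):
    """Expected payoff of a reported mean: minus expected squared error."""
    return -sum(pi * (report - x) ** 2 for pi, x in zip(p, states))

# Truthful reporting strictly dominates any other report.
for r in [0.0, 0.5, 1.0, 1.5, 2.0]:
    if r != true_mean:
        assert expected_payoff(true_mean) > expected_payoff(r)
```

The maximised expected payoff equals minus the variance of p, so truth-telling is the unique optimum whenever p is non-degenerate.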
Proper local scoring rules on discrete sample spaces
 The Annals of Statistics
Cited by 7 (2 self)
Abstract:
A scoring rule is a loss function measuring the quality of a quoted probability distribution Q for a random variable X, in the light of the realized outcome x of X; it is proper if the expected score, under any distribution P for X, is minimized by quoting Q = P. Using the fact that any differentiable proper scoring rule on a finite sample space X is the gradient of a concave homogeneous function, we consider when such a rule can be local in the sense of depending only on the probabilities quoted for points in a nominated neighborhood of x. Under mild conditions, we characterize such a proper local scoring rule in terms of a collection of homogeneous functions on the cliques of an undirected graph on the space X. A useful property of such rules is that the quoted distribution Q need only be known up to a scale factor. Examples of the use of such scoring rules include Besag’s pseudolikelihood and Hyvärinen’s method of ratio matching. 1. Introduction. Let X be a finite set, let A be the set of real vectors α = (α_x : x ∈ X) with each α_x > 0, and let P = {p ∈ A : ∑_x p_x = 1} be the set of such
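The canonical example of a proper local scoring rule is the log score, which depends only on the probability Q assigns to the realized outcome. A minimal numerical sketch (illustrative, not from the paper) of both properness and locality on a three-point space:

```python
import math

# The log score S(x, Q) = -log q_x is proper, and it is local of order 0:
# it depends only on the probability quoted at the realized outcome x.
def log_score(x, q):
    return -math.log(q[x])

p = [0.1, 0.6, 0.3]  # true distribution P on a three-point space

def expected_score(q):
    """Expected score under P of quoting the distribution q."""
    return sum(p[x] * log_score(x, q) for x in range(3))

# Properness: quoting Q = P strictly minimises the expected score.
assert expected_score(p) < expected_score([0.2, 0.5, 0.3])
assert expected_score(p) < expected_score([1/3, 1/3, 1/3])
```

The log score still requires a normalised Q; the broader local rules characterised in the paper (e.g. those behind pseudolikelihood and ratio matching) relax exactly this, needing Q only up to a scale factor.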
Strictly Proper Scoring Rules, Prediction, and Estimation
 J. Am. Stat. Assoc.
Local Scoring Rules: A Versatile Tool for Inference
Cited by 4 (4 self)
Abstract:
In many applications of highly structured statistical models the likelihood function is intractable; in particular, finding the normalisation constant of the distribution can be demanding. One way to sidestep this problem is to adopt composite likelihood methods, such as the pseudolikelihood approach. In this paper we display composite likelihood as a special case of a general estimation technique based on proper scoring rules, which supply an unbiased estimating equation for any statistical model. The important class of key local scoring rules avoids the need to compute normalising constants. Another application arises in Bayesian model selection. The log Bayes factor measures by how much the predictive log score for one model improves on that for another. However, Bayes factors are not well-defined when improper prior distributions are used. If we replace the log score by a suitable local proper scoring rule, these problems are avoided.
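The key point that local scoring rules avoid normalising constants is easiest to see in the pseudolikelihood. A toy sketch (illustrative, not from the paper): in a two-spin model p(x₁, x₂) ∝ exp(θ x₁ x₂), the full likelihood needs the normaliser Z(θ), but the conditionals used by the pseudolikelihood do not, because Z cancels in the ratio:

```python
import math

# Toy two-spin model: p(x1, x2) proportional to exp(theta * x1 * x2),
# with x_i in {-1, +1}. The conditional p(x_i | x_j) is free of the
# global normaliser Z(theta), so the negative log pseudolikelihood
# can be evaluated without ever computing Z.
def conditional(xi, xj, theta):
    """p(x_i = xi | x_j = xj): Z cancels between numerator and denominator."""
    return math.exp(theta * xi * xj) / (math.exp(theta * xj) + math.exp(-theta * xj))

def neg_log_pseudolikelihood(data, theta):
    return -sum(math.log(conditional(x1, x2, theta)) +
                math.log(conditional(x2, x1, theta))
                for x1, x2 in data)

# Mostly aligned spins make a positive coupling more plausible than a negative one.
data = [(1, 1), (1, 1), (-1, -1), (1, -1)]
assert neg_log_pseudolikelihood(data, 0.5) < neg_log_pseudolikelihood(data, -0.5)
```

Minimising this objective in θ is the pseudolikelihood estimator; the paper's contribution is to exhibit it, and composite likelihood generally, as estimation with a proper (local) scoring rule.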