Results 1-10 of 1,918
Logloss games with bounded adversaries
, 2014
Abstract: "... The log-loss game, of the form R_NML = min_P ..."
Mixability is Bayes Risk Curvature Relative to Log Loss
Cited by 9 (6 self)
Abstract: "... Given K codes, a standard result from source coding tells us how to design a single universal code with codelengths within log(K) bits of the best code, on any data sequence. Translated to the online learning setting of prediction with expert advice, this result implies that for logarithmic loss one ..."
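The log(K) bound quoted above can be checked directly: predicting with a uniform Bayes mixture of K experts telescopes to the code length of the mixture, which is within log K of the best single expert on every sequence. A minimal sketch (the experts and bit sequence here are invented for illustration):

```python
import math
import random

def mixture_log_loss(expert_probs, outcomes):
    """Cumulative log loss of the uniform Bayes mixture over K experts.

    expert_probs[k][t] is the probability expert k assigns to outcome 1
    at step t. Predicting with the posterior-weighted mixture telescopes
    to -log((1/K) * sum_k P_k(sequence)), hence at most log K worse than
    the best single expert on any sequence.
    """
    K = len(expert_probs)
    weights = [1.0 / K] * K          # uniform prior over experts
    total = 0.0
    for t, y in enumerate(outcomes):
        p_mix = sum(w * (p[t] if y == 1 else 1.0 - p[t])
                    for w, p in zip(weights, expert_probs))
        total += -math.log(p_mix)
        # Bayes update: multiply each weight by that expert's likelihood
        weights = [w * (p[t] if y == 1 else 1.0 - p[t])
                   for w, p in zip(weights, expert_probs)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return total

def expert_log_loss(probs, outcomes):
    return sum(-math.log(probs[t] if y == 1 else 1.0 - probs[t])
               for t, y in enumerate(outcomes))

random.seed(0)
K, n = 5, 200
experts = [[random.uniform(0.05, 0.95) for _ in range(n)] for _ in range(K)]
outcomes = [random.randint(0, 1) for _ in range(n)]
best = min(expert_log_loss(e, outcomes) for e in experts)
regret = mixture_log_loss(experts, outcomes) - best
assert 0.0 <= regret <= math.log(K) + 1e-9   # the log(K) bound holds
```

The bound is deterministic: the mixture's sequence probability is an average of the experts' sequence probabilities, so it is at least (1/K) times the best one.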
Bounds on Individual Risk for Logloss Predictors
Abstract: "... In sequential prediction with log loss as well as density estimation with risk measured by KL divergence, one is often interested in the expected instantaneous loss, or, equivalently, the individual risk at a given fixed sample size n. For Bayesian prediction and estimation methods, it is often easy ..."
Asymptotic Logloss of Prequential Maximum Likelihood Codes
Abstract: "... P and M, we find c = 1 for Bayes, NML and 2-part codes, whereas for the prequential ML codes, we can get any ..."
The Dantzig selector: statistical estimation when p is much larger than n
, 2005
Cited by 879 (14 self)
Abstract: "... In many important statistical applications, the number of variables or parameters p is much larger than the number of observations n. Suppose then that we have observations y = Ax + z, where x ∈ R^p is a parameter vector of interest, A is a data matrix with possibly far fewer rows than columns, n ≪ ..."
"... ‖x̂ − x‖²_ℓ2 ≤ C² · 2 log p · (σ² + Σ_i min(x_i², σ²)). Our results are non-asymptotic and we give values for the constant C. In short, our estimator achieves a loss within a logarithmic factor of the ideal mean squared error one would achieve with an oracle which would supply perfect information ..."
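The estimator named in the title is simple to state: minimize ‖x‖₁ subject to ‖Aᵀ(y − Ax)‖_∞ ≤ λ, which is a linear program. A minimal sketch, assuming scipy is available; the dimensions, sparse signal, and choice of λ below are invented for illustration, not taken from the paper's experiments:

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(A, y, lam):
    """Dantzig selector:  min ||x||_1  s.t.  ||A^T (y - A x)||_inf <= lam.

    Cast as an LP over stacked variables [x, u] with |x_i| <= u_i and
    objective sum(u); solved with scipy's HiGHS backend.
    """
    n, p = A.shape
    G = A.T @ A
    b = A.T @ y
    I, Z = np.eye(p), np.zeros((p, p))
    A_ub = np.vstack([
        np.hstack([I, -I]),    #  x - u <= 0
        np.hstack([-I, -I]),   # -x - u <= 0
        np.hstack([-G, Z]),    #  A^T y - G x <= lam
        np.hstack([G, Z]),     #  G x - A^T y <= lam
    ])
    b_ub = np.concatenate([np.zeros(2 * p),
                           lam * np.ones(p) - b,
                           lam * np.ones(p) + b])
    c = np.concatenate([np.zeros(p), np.ones(p)])
    bounds = [(None, None)] * p + [(0, None)] * p   # x free, u >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    assert res.success, res.message
    return res.x[:p]

# Illustrative use with p = 2n and a 3-sparse signal (all values invented).
rng = np.random.default_rng(0)
n, p = 30, 60
A = rng.standard_normal((n, p)) / np.sqrt(n)
x_true = np.zeros(p)
x_true[:3] = [3.0, -2.0, 1.5]
sigma = 0.05
y = A @ x_true + sigma * rng.standard_normal(n)
lam = 2 * sigma * np.sqrt(2 * np.log(p))
x_hat = dantzig_selector(A, y, lam)
```

The returned x_hat is feasible by construction (its correlated residual is bounded by λ), and under the sparsity conditions of the abstract it lands close to x_true.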
Minimax Regret under Log Loss for General Classes of Experts
, 1999
Cited by 7 (0 self)
Abstract: "... We study sequential strategies for assigning probabilities to the elements that may appear next in a sequence of data. The goal is to minimize the regret under log loss over the worst possible sequence. That is, to minimize the worst-case drop in the log-likelihood of the final sequence when measure ..."
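For the simplest such expert class, the Bernoulli distributions, the classical Krichevsky-Trofimov (add-1/2) predictor attains regret roughly (1/2) ln n on every binary sequence, which illustrates the worst-case regret notion these strategies target. A small sketch (the 400-bit sequence is arbitrary, chosen only for illustration):

```python
import math

def kt_log_loss(bits):
    """Cumulative log loss (in nats) of the Krichevsky-Trofimov predictor.

    Predicts P(next bit = 1) = (ones + 1/2) / (seen + 1), the Bayes
    mixture under the Jeffreys (Beta(1/2, 1/2)) prior.
    """
    ones = zeros = 0
    total = 0.0
    for b in bits:
        p_one = (ones + 0.5) / (ones + zeros + 1.0)
        total += -math.log(p_one if b == 1 else 1.0 - p_one)
        ones += b
        zeros += 1 - b
    return total

def best_bernoulli_log_loss(bits):
    """Log loss of the best fixed Bernoulli chosen in hindsight."""
    n, k = len(bits), sum(bits)
    if k in (0, n):
        return 0.0
    q = k / n
    return -(k * math.log(q) + (n - k) * math.log(1.0 - q))

bits = [1, 0, 0, 1, 1, 1, 0, 1] * 50   # arbitrary 400-bit sequence
n = len(bits)
regret = kt_log_loss(bits) - best_bernoulli_log_loss(bits)
# Standard KT guarantee: regret <= (1/2) ln n + ln 2 on every sequence
assert 0.0 <= regret <= 0.5 * math.log(n) + math.log(2)
```

Since the KT code is a mixture over all Bernoullis, its sequence probability never exceeds the hindsight-best one, so the regret is also non-negative.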
Worst Case Prediction over Sequences under Log Loss
In The Mathematics of Information Coding, Extraction, and Distribution
, 1997
Cited by 16 (1 self)
Abstract: "... We consider the game of sequentially assigning probabilities to future data based on past observations under logarithmic loss. We are not making probabilistic assumptions about the generation of the data, but consider a situation where a player tries to minimize his loss relative to the loss of th ..."
Rooij. Asymptotic logloss of prequential maximum likelihood codes
In Conference on Learning Theory (COLT 2005)
, 2005
Cited by 8 (6 self)
Abstract: "... We analyze the Dawid-Rissanen prequential maximum likelihood codes relative to one-parameter exponential family models M. If data are i.i.d. according to an (essentially) arbitrary P, then the redundancy grows at rate (c/2) ln n. We show that c = σ₁²/σ₂², where σ₁² is the variance of P, and σ₂² is the variance of the distribution M* ∈ M that is closest to P in KL divergence. This shows that prequential codes behave quite differently from other important universal codes such as the 2-part MDL, Shtarkov and Bayes codes, for which c = 1. This behavior is undesirable in an MDL model selection setting."
Horizon-independent optimal prediction with log-loss in exponential families
In Proc. COLT 2013
, 2013
Cited by 5 (0 self)
Abstract: "... We study online learning under logarithmic loss with regular parametric models. Hedayati and Bartlett (2012b) showed that a Bayesian prediction strategy with Jeffreys prior and sequential normalized maximum likelihood (SNML) coincide and are optimal if and only if the latter is exchangeable, and if ..."
How to Use Expert Advice
In Journal of the Association for Computing Machinery
, 1997
Cited by 377 (79 self)
Abstract: "... We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called experts. Our analysis is for worst-case situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We measure the performance of the ..."
"... bounds that improve on the best results currently known in this context. We also compare our analysis to the case in which log loss is used instead of the expected number of mistakes. ..."
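The flavor of worst-case analysis described in this abstract can be reproduced in a few lines with an exponentially weighted average forecaster, a standard algorithm of this family (not necessarily the paper's exact one); the experts and bit sequence below are invented:

```python
import math

def exp_weights_loss(expert_preds, outcomes, eta):
    """Exponentially weighted average forecaster under absolute loss.

    expert_preds[k][t] in [0, 1] is expert k's prediction of bit t. The
    forecaster predicts the weight-averaged value; each expert's weight
    then decays exponentially in the loss it just incurred.
    """
    K = len(expert_preds)
    weights = [1.0] * K
    total = 0.0
    for t, y in enumerate(outcomes):
        s = sum(weights)
        pred = sum(w * e[t] for w, e in zip(weights, expert_preds)) / s
        total += abs(pred - y)
        weights = [w * math.exp(-eta * abs(e[t] - y))
                   for w, e in zip(weights, expert_preds)]
    return total

# Worst-case-style check: an arbitrary bit sequence, no assumptions on it.
outcomes = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0] * 40          # n = 400
experts = [
    [0.0] * 400,                       # always predicts 0
    [1.0] * 400,                       # always predicts 1
    [0.5] * 400,                       # hedges
    [t % 2 for t in range(400)],       # alternates
]
K, n = len(experts), len(outcomes)
eta = math.sqrt(8.0 * math.log(K) / n)   # tuning that optimizes the bound
best = min(sum(abs(e[t] - y) for t, y in enumerate(outcomes)) for e in experts)
loss = exp_weights_loss(experts, outcomes, eta)
# Guarantee for convex [0,1]-valued losses: loss <= best + ln(K)/eta + eta*n/8
assert loss <= best + math.log(K) / eta + eta * n / 8 + 1e-9
```

With the η above, the guarantee specializes to loss ≤ best + √(n ln(K)/2), holding on every sequence, which is the "no assumptions on the bits" style of bound the abstract refers to.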