Discrete MDL Predicts in Total Variation, 2009
Abstract

Cited by 6 (5 self)
The Minimum Description Length (MDL) principle selects the model that has the shortest code for data plus model. We show that for a countable class of models, MDL predictions are close to the true distribution in a strong sense. The result is completely general. No independence, ergodicity, stationarity, identifiability, or other assumption on the model class needs to be made. More formally, we show that for any countable class of models, the distributions selected by MDL (or MAP) asymptotically predict (merge with) the true measure in the class in total variation distance. Implications for non-i.i.d. domains like time-series forecasting, discriminative learning, and reinforcement learning are discussed.
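The two-part selection rule summarized above can be sketched in a few lines. The model class, the uniform model code, and the data below are purely illustrative choices, not taken from the paper: a countable (here finite) class of Bernoulli models, with the selected model minimizing the code length of the model plus the code length of the data given the model.

```python
import math

# Toy countable model class: Bernoulli(p) for p on a fixed grid (illustrative).
models = [0.1, 0.3, 0.5, 0.7, 0.9]

def data_code_length(data, p):
    """Shannon code length (in bits) of a binary sequence under Bernoulli(p)."""
    return sum(-math.log2(p if x == 1 else 1.0 - p) for x in data)

def two_part_mdl(data, models):
    """Select the model minimizing L(model) + L(data | model).
    Here L(model) is a uniform code over the class (constant, so it does not
    change the argmin, but it is shown for the two-part structure)."""
    model_bits = math.log2(len(models))
    return min(models, key=lambda p: model_bits + data_code_length(data, p))

data = [1, 1, 0, 1, 1, 1, 0, 1]  # six ones out of eight observations
best = two_part_mdl(data, models)  # grid point closest to the empirical rate
```

With a genuinely countable (infinite) class, `model_bits` would instead grow with the model index, penalizing complex models.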
Artificial Curiosity for Autonomous Space Exploration
Abstract

Cited by 2 (1 self)
Curiosity is an essential driving force for science as well as technology, and has led mankind to explore its surroundings, all the way to our current understanding of the universe. Space science and exploration is at the pinnacle of each of these developments, in that it requires the most advanced technology, explores our world and outer space, and constantly pushes the frontier of scientific knowledge. Manned space missions carry disproportionate costs and risks, so it is only natural for the field to strive for autonomous exploration. While recent innovations in engineering, robotics and AI provide solutions to many subproblems of autonomous exploration, insufficient emphasis has been placed on the higher-level question of autonomously deciding what to explore. Artificial curiosity, the subject of this paper, precisely addresses this issue. We will introduce formal notions of “interestingness” based on the concepts of (1) compression progress through discovery of novel regularities in the observations, and (2) coherence progress through selection of data that “fits” the already known data in a compression-based way. Further, we discuss how to construct a system that exhibits curiosity driven by the interestingness of certain types of novel observations, with the mission to curiously go where no probe has gone before.
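The coherence notion in point (2) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `zlib` stands in for whatever compressor an actual agent would use, and the "known data" is a toy byte string. The idea: an observation is coherent with what is known if encoding it jointly with the known data is much cheaper than encoding it alone.

```python
import zlib

def code_length(data: bytes) -> int:
    """Approximate code length via a general-purpose compressor
    (zlib here, purely as an illustrative stand-in)."""
    return len(zlib.compress(data, 9))

def coherence_gain(known: bytes, obs: bytes) -> int:
    """Bytes saved by encoding obs together with the known data rather
    than on its own: a high gain means obs 'fits' what is already known."""
    return code_length(known) + code_length(obs) - code_length(known + obs)

known = b"the quick brown fox jumps over the lazy dog " * 20
fits = b"the quick brown fox jumps over the lazy dog"   # redundant with known
novel = bytes(range(256))                                # noise-like, no fit
```

An agent maximizing this signal would prefer observations that connect to its existing model over incompressible noise; compression progress (point 1) would instead track improvements of the compressor itself over time.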
Catching Up Faster by Switching Sooner: A predictive approach to adaptive estimation with an application to the AIC-BIC Dilemma, 2011
Abstract

Cited by 1 (0 self)
Prediction and estimation based on Bayesian model selection and model averaging, and derived methods such as BIC, do not always converge at the fastest possible rate. We identify the catch-up phenomenon as a novel explanation for the slow convergence of Bayesian methods, and use it to define a modification of the Bayesian predictive distribution, called the switch distribution. When used as an adaptive estimator, the switch distribution does achieve optimal cumulative risk convergence rates in nonparametric density estimation and Gaussian regression problems. We show that the minimax cumulative risk is obtained under very weak conditions and without knowledge of the underlying degree of smoothness. Unlike other adaptive model selection procedures such as AIC and leave-one-out cross-validation, BIC and Bayes factor model selection are typically statistically consistent. We show that this property is retained by the switch distribution, which thus solves the AIC-BIC dilemma for cumulative risk. The switch distribution has an efficient implementation. We compare its performance to AIC, BIC and Bayes on a regression problem with simulated data.
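The full switch distribution mixes over arbitrary sequences of switch points between estimators; the following toy sketch keeps only the core idea with a single switch. The two predictors (a Laplace-rule learner and a fixed fair coin) and the uniform prior over the switch time are illustrative assumptions, not the paper's construction.

```python
import math

# Two illustrative sequential predictors for binary sequences (toy stand-ins,
# not the paper's estimators).
def laplace(past):
    """P(next bit = 1) under Laplace's rule of succession."""
    return (sum(past) + 1) / (len(past) + 2)

def fair(past):
    """Fixed fair-coin model."""
    return 0.5

def seq_log_loss(data, predictor):
    """Cumulative log loss (bits) of a single sequential predictor."""
    total = 0.0
    for i, x in enumerate(data):
        p1 = predictor(data[:i])
        total += -math.log2(p1 if x == 1 else 1.0 - p1)
    return total

def switch_log_loss(data, pred_a, pred_b):
    """Log loss of a toy one-switch distribution: a uniform prior over the
    time t at which prediction hands over from pred_a to pred_b."""
    n = len(data)
    path_logs = []
    for t in range(n + 1):  # use pred_a for the first t steps, pred_b after
        logp = 0.0
        for i, x in enumerate(data):
            pred = pred_a if i < t else pred_b
            p1 = pred(data[:i])
            logp += math.log2(p1 if x == 1 else 1.0 - p1)
        path_logs.append(logp)
    # log-sum-exp mixture of the n + 1 paths under the uniform prior
    m = max(path_logs)
    mix = m + math.log2(sum(2.0 ** (lp - m) for lp in path_logs))
    return -(mix - math.log2(n + 1))
```

By the standard mixture bound, this loss is within log2(n + 1) bits of the best single switch point, so it can "catch up" with whichever predictor is currently ahead, which is the intuition behind the switch distribution's faster cumulative-risk rates.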
Recent Results in Universal and Non-Universal Induction, 2006
Abstract
We present and relate recent results in prediction based on countable classes of either probability (semi)distributions or base predictors. Learning by Bayes, MDL, and stochastic model selection will be considered as
Erratum to “Asymptotics of Discrete MDL for Online Prediction”
Abstract
The previously published abstract [1] erroneously stated that the work is about learning i.i.d. processes. In fact, the main contributions are methods and proof techniques that work with arbitrary processes on a finite observation space. We regret any misunderstanding this might have caused. The corrected version of the abstract should read as follows: Abstract—Minimum description length (MDL) is an important principle for induction and prediction, with strong relations to optimal Bayesian learning. This paper deals with learning processes which are not necessarily independent and identically distributed, by means of two-part MDL, where the underlying model class is countable. We consider the online learning framework, i.e., observations come in one by one, and the predictor is allowed to update its state of mind after each time step. We identify two ways of predicting by MDL for this setup, namely, a static and a dynamic