  Model Selection and the Principle of Minimum Description Length (2001) [68 citations — 3 self]

by Mark H. Hansen, Bin Yu
Journal of the American Statistical Association
http://cm.bell-labs.com/who/cocteau/papers/pdf/mdl.pdf

Abstract:

This paper reviews the principle of Minimum Description Length (MDL) for problems of model selection. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL framework discriminates between competing models based on the complexity of each description. This approach began with Kolmogorov's theory of algorithmic complexity, matured in the literature on information theory, and has recently received renewed interest within the statistics community. In the pages that follow, we review both the practical and the theoretical aspects of MDL as a tool for model selection, emphasizing the rich connections between information theory and statistics. At the boundary between these two disciplines, we find many interesting interpretations of popular frequentist and Bayesian procedures. As we will see, MDL provides an objective umbrella under which rather disparate approaches to statistical modeling can co-exist and be compared. We illustrate the MDL principle by considering problems in regression, nonparametric curve estimation, cluster analysis, and time series analysis. Because model selection in linear regression is an extremely common problem that arises in many applications, we present detailed derivations of several MDL criteria in this context and discuss their properties through a number of examples. Our emphasis
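The core MDL idea in the abstract is to prefer the model that gives the shortest total description of the data. The following is a minimal sketch of that idea for choosing a polynomial degree in regression. It is not code from the paper. It assumes Gaussian errors and uses the familiar asymptotic two-part code length (n/2) log(RSS/n) + (k/2) log n, measured in nats. Here the first term is the cost of encoding the residuals, and the second term is the cost of encoding k fitted parameters. This form matches Schwarz's BIC up to scaling. The function names `fit_poly_rss`, `two_part_mdl`, and `select_degree` are hypothetical, as is the simulated data in the usage example.

```python
import math
import random


def fit_poly_rss(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations; returns the residual sum of squares."""
    k = degree + 1
    # Normal equations (A^T A) beta = A^T y for the monomial design matrix A.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    m = [row[:] + [b] for row, b in zip(ata, aty)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution gives the coefficients, indexed by power of x.
    beta = [0.0] * k
    for i in reversed(range(k)):
        s = m[i][k] - sum(m[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = s / m[i][i]
    return sum((y - sum(b * x ** p for p, b in enumerate(beta))) ** 2
               for x, y in zip(xs, ys))


def two_part_mdl(rss, n, k):
    """Asymptotic two-part code length in nats: residual cost plus parameter cost."""
    return 0.5 * n * math.log(rss / n) + 0.5 * k * math.log(n)


def select_degree(xs, ys, max_degree):
    """Return the degree with the shortest two-part description, plus all scores."""
    n = len(xs)
    scores = {d: two_part_mdl(fit_poly_rss(xs, ys, d), n, d + 1)
              for d in range(max_degree + 1)}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical data: a quadratic trend plus Gaussian noise.
    rng = random.Random(0)
    n = 200
    xs = [-1 + 2 * i / (n - 1) for i in range(n)]
    ys = [1 - 2 * x + 3 * x * x + rng.gauss(0, 0.1) for x in xs]
    best, scores = select_degree(xs, ys, 5)
    print("selected degree:", best)
```

The parameter term grows with both k and n. This penalty is what stops the criterion from always choosing the largest model. A degree is chosen only when it shortens the residual code by more than the cost of describing its extra coefficient.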

Citations

4364 Elements of Information Theory – Cover, Thomas - 1991
1390 Introduction to the theory of neural computation – Hertz, Krogh, et al. - 1991
1083 Introduction to Kolmogorov Complexity and Its Applications – Li, Vitanyi - 1993
841 Estimating the dimension of a model – Schwarz - 1978
727 Spline Models for Observational Data – Wahba - 1990
699 Modeling by shortest data description – Rissanen - 1978
611 A new look at the statistical model identification – Akaike - 1974
574 Bayesian Theory – Bernardo, Smith - 1994
425 Bayes factors – Kass, Raftery - 1995
304 Three approaches to the quantitative definition of information – Kolmogorov - 1965
280 A universal prior for integers and estimation by minimum description length – Rissanen - 1983
262 Time Series: Theory and Methods – Brockwell, Davis - 1998
253 Regression shrinkage and selection via the lasso – Tibshirani - 1995
227 Spline Functions: Basic Theory – Schumaker - 1981
216 Stochastic complexity in statistical inquiry – Rissanen - 1989
210 An information measure for classification – Wallace, Boulton - 1968
190 Stochastic complexity and modeling – Rissanen - 1986
188 Constructing simple stable description for image partitioning – Leclerc - 1989
178 Fisher Information and Stochastic Complexity – Rissanen - 1996
139 Variable Selection via Gibbs Sampling – George, McCulloch - 1993
106 Asymptotic Methods in Statistical Decision Theory – LeCam - 1986
92 Bounds on the sample complexity of Bayesian learning using information theory and the VC dimension – Haussler, Kearns, et al. - 1991
86 Some comments on Cp – Mallows - 1973
85 A Monte Carlo approach to nonnormal and nonlinear state-space modeling – Carlin, Polson, et al. - 1992
80 Nonparametric regression using Bayesian variable selection – Smith, Kohn - 1996
75 The determination of the order of an autoregression – Hannan, Quinn - 1979
73 Multiple shrinkage and subset selection in wavelets – Clyde, Parmigiani, et al. - 1998
72 Flexible discriminant analysis by optimal scoring – Hastie, Tibshirani, et al. - 1994
71 Bayesian model averaging for linear regression models – Raftery, Madigan, et al. - 1997
69 Some comments on Cp – Mallows - 1973
68 Penalized discriminant analysis – Hastie, Buja, et al. - 1995
66 Information-theoretic asymptotics of Bayes methods – Clarke, Barron - 1990
65 The intrinsic Bayes factor for model selection and prediction – Berger, Pericchi - 1996
58 Regression and time series model selection in small samples (Biometrika 76:297–307) – Hurvich, Tsai - 1989
58 Logical basis for information theory and probability theory – Kolmogorov - 1968
57 On a measure of the information provided by an experiment – Lindley - 1956
55 Simultaneous noise suppression and signal compression using a library of orthonormal bases and the minimum description length criterion (to appear in Wavelets in Geophysics) – Saito
53 Fractional Bayes factors for model comparison – O’Hagan - 1997
48 Calibration and empirical Bayes variable selection – George, Foster - 1997
45 Regression by leaps and bounds – Furnival, Wilson - 1974
41 Approaches to Bayesian Variable Selection – George, McCulloch - 1994
40 Prequential analysis, stochastic complexity and Bayesian inference – Dawid - 1992
38 Hybrid adaptive splines – Luo, Wahba - 1997
37 An optimal selection of regression variables – Shibata - 1981
36 Bayes factors and choice criteria for linear models – Smith, Spiegelhalter - 1980
35 A strong version of the redundancy-capacity theorem for universal coding (accepted for publication) – Merhav, Feder - 1994
34 Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion – Hurvich, Simonoff, et al. - 1998
31 A new look at the statistical model identification – Akaike - 1974
27 Stochastic complexity (with discussion) – Rissanen - 1987
27 Density estimation by stochastic complexity – Rissanen, Speed, et al. - 1992