Model Selection and the Principle of Minimum Description Length
- Journal of the American Statistical Association, 1998
Cited by 114 (4 self)
This paper reviews the principle of Minimum Description Length (MDL) for problems of model selection. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL framework discriminates between competing models based on the complexity of each description. This approach began with Kolmogorov's theory of algorithmic complexity, matured in the literature on information theory, and has recently received renewed interest within the statistics community. In the pages that follow, we review both the practical and the theoretical aspects of MDL as a tool for model selection, emphasizing the rich connections between information theory and statistics. At the boundary between these two disciplines, we find many interesting interpretations of popular frequentist and Bayesian procedures. As we will see, MDL provides an objective umbrella under which rather disparate approaches to statistical modeling can co-exist and be compared. We illustrate th...
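To make the two-part-code idea concrete, here is a minimal sketch under stated assumptions: a binary sequence is described either by a fair-coin model with no free parameters or by a Bernoulli model whose fitted probability is charged $(1/2)\log_2 n$ bits, the BIC-style approximation to parameter cost. The function names and the coin example are hypothetical illustrations, not procedures taken from the reviewed paper.

```python
import math
from typing import Optional


def bernoulli_code_length(ones: int, n: int, p: float) -> float:
    """Bits needed to encode a binary sequence with `ones` ones under Bernoulli(p)."""
    zeros = n - ones
    bits = 0.0
    if ones:
        bits -= ones * math.log2(p)
    if zeros:
        bits -= zeros * math.log2(1.0 - p)
    return bits


def two_part_length(ones: int, n: int, fixed_p: Optional[float] = None) -> float:
    """Two-part description length in bits (data bits plus parameter bits).

    With fixed_p given, the model has no free parameters, so only the data are encoded.
    Otherwise p is fitted by maximum likelihood and charged (1/2) log2 n bits,
    the BIC-style parameter cost assumed in this sketch.
    """
    if fixed_p is not None:
        return bernoulli_code_length(ones, n, fixed_p)
    p_hat = ones / n  # maximum-likelihood estimate
    return bernoulli_code_length(ones, n, p_hat) + 0.5 * math.log2(n)


def mdl_choice(ones: int, n: int) -> str:
    """Select the model giving the shorter total description of the data."""
    fair = two_part_length(ones, n, fixed_p=0.5)
    fitted = two_part_length(ones, n)
    return "fair coin" if fair <= fitted else "fitted p"
```

For example, `mdl_choice(50, 100)` selects the fair coin (100 bits versus about 103.3), while `mdl_choice(80, 100)` selects the fitted model (about 75.5 bits versus 100): the extra parameter is worth its cost only when it shortens the data description by more than it adds.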
Information-Theoretic Determination of Minimax Rates of Convergence
- Ann. Statist., 1997
Cited by 67 (18 self)
In this paper, we present some general results determining minimax bounds on statistical risk for density estimation based on certain information-theoretic considerations. These bounds depend only on metric entropy conditions and are used to identify the minimax rates of convergence.
On choosing and bounding probability metrics
- Internat. Statist. Rev., 2002
Cited by 54 (2 self)
Abstract. When studying convergence of measures, an important issue is the choice of probability metric. We provide a summary and some new results concerning bounds among some important probability metrics/distances that are used by statisticians and probabilists. Knowledge of other metrics can provide a means of deriving bounds for another one in an applied problem. Considering other metrics can also provide alternative insights. We also give examples that show that rates of convergence can strongly depend on the metric chosen. Careful consideration is necessary when choosing a metric.
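The kind of inter-metric bound surveyed here can be checked numerically on a finite space. The sketch below assumes the unnormalized Hellinger convention $H = (\sum_i (\sqrt{p_i} - \sqrt{q_i})^2)^{1/2}$, under which $H^2/2 \le d_{TV} \le H$, together with Pinsker's inequality $d_{TV} \le \sqrt{D/2}$ for relative entropy $D$ in nats; other texts scale the Hellinger distance by a factor involving 1/2, which changes the constants. The function names are hypothetical.

```python
import math
from typing import Sequence


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """d_TV = sup_A |P(A) - Q(A)|, which equals (1/2) * sum |p_i - q_i| on a finite space."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """Unnormalized Hellinger distance sqrt(sum (sqrt p_i - sqrt q_i)^2), taking values in [0, sqrt 2]."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def relative_entropy(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence D(P || Q) in nats; infinite if some q_i = 0 < p_i."""
    total = 0.0
    for a, b in zip(p, q):
        if a == 0:
            continue  # 0 * log(0 / b) is taken to be 0
        if b == 0:
            return math.inf
        total += a * math.log(a / b)
    return total


def bounds_hold(p: Sequence[float], q: Sequence[float], tol: float = 1e-12) -> bool:
    """Check H^2/2 <= TV <= H (Hellinger sandwich) and TV <= sqrt(D/2) (Pinsker)."""
    tv = total_variation(p, q)
    h = hellinger(p, q)
    kl = relative_entropy(p, q)
    return h * h / 2 <= tv + tol and tv <= h + tol and tv <= math.sqrt(kl / 2) + tol
```

With `p = [0.5, 0.3, 0.2]` and `q = [0.4, 0.4, 0.2]`, the total variation distance is 0.1 and both bounds hold; for the disjoint pair `[1, 0]`, `[0, 1]` the relative entropy is infinite, so Pinsker's bound is vacuous while the Hellinger sandwich is attained at its lower end.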
Asymptotic Equivalence of Density Estimation and Gaussian White Noise
- Ann. Statist., 1996
Cited by 45 (3 self)
Signal recovery in Gaussian white noise with variance tending to zero has served for some time as a representative model for nonparametric curve estimation, having all the essential traits in a pure form. The equivalence has mostly been stated informally, but an approximation in the sense of Le Cam's deficiency distance $\Delta$ would make it precise. The models are then asymptotically equivalent for all purposes of statistical decision with bounded loss. In nonparametrics, a first result of this kind has recently been established for Gaussian regression (Brown and Low, 1993). We consider the analogous problem for the experiment given by $n$ i.i.d. observations having density $f$ on the unit interval. Our basic result concerns the parameter space of densities which are in a Hölder ball with exponent $\alpha > 1/2$ and which are uniformly bounded away from zero. We show that an i.i.d. sample of size $n$ with density $f$ is globally asymptotically equivalent to a white noise experiment with dri...
Generalized weighted Chinese restaurant processes for species sampling mixture models
- Statistica Sinica, 2003
Cited by 36 (8 self)
Abstract: The class of species sampling mixture models is introduced as an extension of semiparametric models based on the Dirichlet process to models based on the general class of species sampling priors, or equivalently the class of all exchangeable urn distributions. Using Fubini calculus in conjunction with Pitman (1995, 1996), we derive characterizations of the posterior distribution in terms of a posterior partition distribution that extend the results of Lo (1984) for the Dirichlet process. These results provide a better understanding of models and have both theoretical and practical applications. To facilitate the use of our models we generalize the work in Brunner, Chan, James and Lo (2001) by extending their weighted Chinese restaurant (WCR) Monte Carlo procedure, an i.i.d. sequential importance sampling (SIS) procedure for approximating posterior mean functionals based on the Dirichlet process, to the case of approximation of mean functionals and additionally their posterior laws in species sampling mixture models. We also discuss collapsed Gibbs sampling, Pólya urn Gibbs sampling and a Pólya urn SIS scheme. Our framework allows for numerous applications, including multiplicative counting process models subject to weighted gamma processes, as well as nonparametric and semiparametric hierarchical models based on the Dirichlet process, its two-parameter extension, the Pitman-Yor process and finite dimensional Dirichlet priors. Key words and phrases: Dirichlet process, exchangeable partition, finite dimensional Dirichlet prior, two-parameter Poisson-Dirichlet process, prediction rule, random probability measure, species sampling sequence.
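The "prediction rule" in the key words can be illustrated by the two-parameter Chinese restaurant process, which generates the exchangeable random partitions underlying the Pitman-Yor process. The sketch below implements only that seating rule, with hypothetical parameter names `theta` and `discount`; it is not the weighted Chinese restaurant SIS procedure developed in the paper.

```python
import random
from typing import List, Tuple


def pitman_yor_crp(n: int, theta: float, discount: float,
                   rng: random.Random) -> Tuple[List[int], List[int]]:
    """Seat n customers by the two-parameter Chinese restaurant prediction rule.

    After i customers occupy k tables with sizes n_1..n_k, the next customer joins
    table j with probability (n_j - discount) / (i + theta) and opens a new table
    with probability (theta + discount * k) / (i + theta). discount = 0 recovers
    the Dirichlet-process CRP. Returns (table label per customer, table sizes).
    """
    if not (0.0 <= discount < 1.0) or theta <= -discount:
        raise ValueError("need 0 <= discount < 1 and theta > -discount")
    labels: List[int] = []
    sizes: List[int] = []
    for i in range(n):
        if i == 0:
            table = 0  # the first customer always opens a table
        else:
            k = len(sizes)
            # Unnormalized weights; they sum to i + theta, and random.choices normalizes them.
            weights = [s - discount for s in sizes] + [theta + discount * k]
            table = rng.choices(range(k + 1), weights=weights)[0]
        if table == len(sizes):
            sizes.append(1)  # open a new table
        else:
            sizes[table] += 1
        labels.append(table)
    return labels, sizes
```

For instance, `pitman_yor_crp(200, 1.0, 0.5, random.Random(0))` returns a random partition of 200 items; a positive `discount` tends to produce more tables, many of them small, than the Dirichlet-process case with the same `theta`.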
Random Regular Graphs: Asymptotic Distributions And Contiguity
- Combinatorics, Probability and Computing, 1993
Cited by 32 (3 self)
The asymptotic distribution of the number of Hamilton cycles in a random regular graph is determined. The limit distribution is of an unusual type; it is the distribution of a variable whose logarithm can be written as an infinite linear combination of independent Poisson variables, and thus the logarithm has an infinitely divisible distribution with a certain discrete Lévy measure. Similar results are found for some related problems. These limit results imply that some different models of random regular graphs are contiguous, which means that they are qualitatively asymptotically equivalent. For example, if $r \geq 3$, then the usual (uniformly distributed) random $r$-regular graph is contiguous to the one constructed by taking the union of $r$ perfect matchings on the same vertex set (assumed to be of even cardinality), conditioned on there being no multiple edges. Some consequences of contiguity for asymptotic distributions are discussed. 0. Introduction. In two remarkable papers, Robinso...
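The matchings model described above is straightforward to simulate: draw $r$ independent uniform perfect matchings on an even vertex set and reject the draw whenever two matchings share an edge, which realizes the conditioning on no multiple edges exactly. This is a minimal sketch with hypothetical function names; it samples the matchings model only, not the uniform random $r$-regular graph to which it is contiguous.

```python
import random
from collections import Counter
from typing import List, Tuple

Edge = Tuple[int, int]


def random_perfect_matching(n: int, rng: random.Random) -> List[Edge]:
    """Uniform perfect matching on vertices 0..n-1 (n even): shuffle, then pair consecutive entries."""
    verts = list(range(n))
    rng.shuffle(verts)
    return [(min(verts[i], verts[i + 1]), max(verts[i], verts[i + 1])) for i in range(0, n, 2)]


def union_of_matchings(n: int, r: int, rng: random.Random,
                       max_tries: int = 10_000) -> List[Edge]:
    """Union of r independent perfect matchings, conditioned by rejection on no multiple edges."""
    if n % 2:
        raise ValueError("the vertex set must have even cardinality")
    for _ in range(max_tries):
        edges = [e for _ in range(r) for e in random_perfect_matching(n, rng)]
        if len(set(edges)) == len(edges):  # accept only if no edge appears twice
            return edges
    raise RuntimeError("no simple graph found; increase max_tries")


def degrees(edges: List[Edge]) -> Counter:
    """Vertex degrees of the multigraph given by the edge list."""
    return Counter(v for e in edges for v in e)
```

Every accepted graph is simple and $r$-regular, because each matching contributes exactly one edge at every vertex. Rejection keeps the conditioning exact at the price of a number of retries that stays bounded in expectation for fixed $r$.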
Mutual Information, Metric Entropy, and Cumulative Relative Entropy Risk
- Ann. Statist., 1996
Cited by 30 (2 self)
Assume $\{P_\theta : \theta \in \Theta\}$ is a set of probability distributions with a common dominating measure on a complete separable metric space $\mathcal{Y}$. A state $\theta \in \Theta$ is chosen by Nature. A statistician gets $n$ independent observations $Y_1, \ldots, Y_n$ from $\mathcal{Y}$ distributed according to $P_\theta$. For each time $t$ between 1 and $n$, based on the observations $Y_1, \ldots, Y_{t-1}$, the statistician produces an estimated distribution $\hat{P}_t$ for $P_\theta$, and suffers a loss $L(P_\theta, \hat{P}_t)$. The cumulative risk for the statistician is the average total loss up to time $n$. Of special interest in information theory, data compression, mathematical finance, computational learning theory and statistical mechanics is the special case when the loss $L(P_\theta, \hat{P}_t)$ is the relative entropy between the true distribution $P_\theta$ and the estimated distribution $\hat{P}_t$. Here the cumulative Bayes risk from time 1 to $n$ is the mutual information between the random parameter $\Theta$ and the observations $Y_1, \ldots$...
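The identity in the last sentence rests on the chain rule for relative entropy: for a fixed $\theta$, the cumulative relative entropy risk of the Bayes predictive distributions equals $D(P_\theta^n \,\|\, M_n)$, where $M_n$ is the Bayes mixture, and averaging over the prior turns this into the mutual information. The sketch below checks the fixed-$\theta$ version exactly, assuming a Bernoulli model with a uniform prior, for which the Bayes predictive after $t-1$ observations containing $k$ ones is Laplace's rule $(k+1)/(t+1)$. The model and function names are illustrative assumptions.

```python
import math


def kl_bernoulli(p: float, q: float) -> float:
    """Relative entropy D(Bern(p) || Bern(q)) in nats."""
    total = 0.0
    if p > 0:
        total += p * math.log(p / q)
    if p < 1:
        total += (1 - p) * math.log((1 - p) / (1 - q))
    return total


def binom_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def cumulative_risk(theta: float, n: int) -> float:
    """Sum over t = 1..n of E_theta[D(P_theta || P_hat_t)], with P_hat_t the Laplace rule."""
    risk = 0.0
    for t in range(1, n + 1):
        m = t - 1  # number of observations available before time t
        for k in range(m + 1):
            p_hat = (k + 1) / (m + 2)  # uniform-prior Bayes predictive probability of a one
            risk += binom_pmf(k, m, theta) * kl_bernoulli(theta, p_hat)
    return risk


def kl_to_mixture(theta: float, n: int) -> float:
    """D(P_theta^n || M_n), where M_n(y^n) = k! (n - k)! / (n + 1)! for a sequence with k ones."""
    total = 0.0
    for k in range(n + 1):
        log_p = k * math.log(theta) + (n - k) * math.log(1 - theta)
        log_m = math.lgamma(k + 1) + math.lgamma(n - k + 1) - math.lgamma(n + 2)
        total += binom_pmf(k, n, theta) * (log_p - log_m)  # every sequence with k ones has this log ratio
    return total
```

`cumulative_risk(0.3, 20)` and `kl_to_mixture(0.3, 20)` agree up to floating-point rounding, which is the chain rule at work; averaging either quantity over the uniform prior on $\theta$ would give $I(\Theta; Y_1, \ldots, Y_n)$ for this model.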
Convergence rates of posterior distributions
- Ann. Statist., 2000
Cited by 26 (8 self)
We consider the asymptotic behavior of posterior distributions and Bayes estimators for infinite-dimensional statistical models. We give general results on the rate of convergence of the posterior measure. These are applied to several examples, including priors on finite sieves, log-spline models, Dirichlet processes and interval censoring. 1. Introduction. Suppose
On the Estimation of Quadratic Functionals
Cited by 24 (8 self)
We discuss the difficulties of estimating quadratic functionals based on observations $Y(t)$ from the white noise model $Y(t) = \int_0^t f(u)\,du + \sigma W(t)$, $t \in [0,1]$, where $W(t)$ is a standard Wiener process on $[0,1]$. The optimal rates of convergence (as $\sigma \to 0$) for estimating quadratic functionals under certain geometric constraints are found. Specifically, the optimal rates of estimating $\int_0^1 [f^{(k)}(x)]^2\,dx$ under hyperrectangular constraints $\Sigma = \{f : |x_j(f)| \le C j^{-p}\}$ and weighted $\ell_p$-body constraints $\Sigma_p = \{f : \sum_j j^r |x_j(f)|^p \le C\}$ are computed explicitly, where $x_j(f)$ is the $j$th Fourier-Bessel coefficient of the unknown function $f$. We develop a new method for deriving lower bounds based on testing two highly composite hypercubes, and discuss its advantages. The attainable lower bounds are found by applying the hardest one-dimensional approach as well as the hypercube method. We demonstrate that for estimating regular quadratic functionals (i.e., functionals that can be estimated at rate $O(\sigma^2)$), the difficulties of the estimation are captured by the hardest one-dimensional subproblems, and for estimating nonregular quadratic functionals (i.e., those for which no $O(\sigma)$-consistent estimator exists), the difficulties are captured by certain finite-dimensional hypercube subproblems whose dimension goes to infinity as $\sigma \to 0$.
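As a point of reference for the upper-bound side, the sketch below assumes the usual reduction of the white noise model to a sequence model $y_j = x_j + \sigma z_j$ obtained by expanding in an orthonormal basis, and estimates the $k = 0$ functional $\int f^2 = \sum_j x_j^2$ (by Parseval) with the standard truncated unbiased estimator $\sum_{j \le m} (y_j^2 - \sigma^2)$. The truncation level `m` and the function names are hypothetical; this is not the hypercube lower-bound method of the paper.

```python
import random
from typing import List, Sequence


def observe(x: Sequence[float], sigma: float, rng: random.Random) -> List[float]:
    """Sequence-space white noise: y_j = x_j + sigma * z_j with z_j i.i.d. N(0, 1)."""
    return [xj + sigma * rng.gauss(0.0, 1.0) for xj in x]


def truncated_quadratic_estimate(y: Sequence[float], sigma: float, m: int) -> float:
    """Unbiased estimate of sum_{j <= m} x_j^2, using E[y_j^2] = x_j^2 + sigma^2."""
    return sum(yj * yj - sigma * sigma for yj in y[:m])
```

The estimator is unbiased for the first `m` terms, so its total error comes from the variance it accumulates (roughly $4\sigma^2 \sum_{j \le m} x_j^2 + 2m\sigma^4$) and the truncation bias $\sum_{j > m} x_j^2$; the constraint sets $\Sigma$ and $\Sigma_p$ are what control that tail and hence how `m` should grow as $\sigma \to 0$.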

