Results 1  10
of
104
Zanten. Rates of contraction of posterior distributions based on Gaussian process priors. The Annals of Statistics
"... We derive rates of contraction of posterior distributions on nonparametric or semiparametric models based on Gaussian processes. The rate of contraction is shown to depend on the position of the true parameter relative to the reproducing kernel Hilbert space of the Gaussian process and the small bal ..."
Abstract

Cited by 63 (4 self)
 Add to MetaCart
We derive rates of contraction of posterior distributions on nonparametric or semiparametric models based on Gaussian processes. The rate of contraction is shown to depend on the position of the true parameter relative to the reproducing kernel Hilbert space of the Gaussian process and the small ball probabilities of the Gaussian process. We determine these quantities for a range of examples of Gaussian priors and in several statistical settings. For instance, we consider the rate of contraction of the posterior distribution based on sampling from a smooth density model when the prior models the log density as a (fractionally integrated) Brownian motion. We also consider regression with Gaussian errors and smooth classification under a logistic or probit link function combined with various priors. 1. Introduction. Gaussian
Entropies and rates of convergence for maximum likelihood and Bayes estimation for mixtures of normal densities
 Ann. Statist
, 2001
"... We study the rates of convergence of the maximum likelihood estimator (MLE) and posterior distribution in density estimation problems, where the densities are location or locationscale mixtures of normal distributions with the scale parameter lying between two positive numbers. The true density is ..."
Abstract

Cited by 62 (10 self)
 Add to MetaCart
We study the rates of convergence of the maximum likelihood estimator (MLE) and posterior distribution in density estimation problems, where the densities are location or locationscale mixtures of normal distributions with the scale parameter lying between two positive numbers. The true density is also assumed to lie in this class with the true mixing distribution either compactly supported or having subGaussian tails. We obtain bounds for Hellinger bracketing entropies for this class, and from these bounds, we deduce the convergence rates of (sieve) MLEs in Hellinger distance. The rate turns out to be �log n � κ / √ n, where κ ≥ 1 is a constant that depends on the type of mixtures and the choice of the sieve. Next, we consider a Dirichlet mixture of normals as a prior on the unknown density. We estimate the prior probability of a certain KullbackLeibler type neighborhood and then invoke a general theorem that computes the posterior convergence rate in terms the growth rate of the Hellinger entropy and the concentration rate of the prior. The posterior distribution is also seen to converge at the rate �log n � κ / √ n in, where κ now depends on the tail behavior of the base measure of the Dirichlet process. 1. Introduction. A
Convergence rates of posterior distributions for noniid observations
 Ann. Statist
, 2007
"... We consider the asymptotic behavior of posterior distributions and Bayes estimators based on observations which are required to be neither independent nor identically distributed. We give general results on the rate of convergence of the posterior measure relative to distances derived from a testing ..."
Abstract

Cited by 57 (4 self)
 Add to MetaCart
(Show Context)
We consider the asymptotic behavior of posterior distributions and Bayes estimators based on observations which are required to be neither independent nor identically distributed. We give general results on the rate of convergence of the posterior measure relative to distances derived from a testing criterion. We then specialize our results to independent, nonidentically distributed observations, Markov processes, stationary Gaussian time series and the white noise model. We apply our general results to several examples of infinitedimensional statistical models including nonparametric regression with normal errors, binary regression, Poisson regression, an interval censoring model, Whittle estimation of the spectral density of a time series and a nonlinear autoregressive model.: θ ∈ Θ) be a sequence of statistical experiments with observations X (n), where the parameter set Θ is arbitrary and n is an indexing parameter, usually the sample size. We put a prior distribution Πn on θ ∈ Θ and study the rate of convergence of the posterior
The interplay of bayesian and frequentist analysis
 Statist. Sci
, 2004
"... Statistics has struggled for nearly a century over the issue of whether the Bayesian or frequentist paradigm is superior. This debate is far from over and, indeed, should continue, since there are fundamental philosophical and pedagogical issues at stake. At the methodological level, however, the fi ..."
Abstract

Cited by 48 (0 self)
 Add to MetaCart
Statistics has struggled for nearly a century over the issue of whether the Bayesian or frequentist paradigm is superior. This debate is far from over and, indeed, should continue, since there are fundamental philosophical and pedagogical issues at stake. At the methodological level, however, the fight has become considerably muted, with the recognition that each approach has a great deal to contribute to statistical practice and each is actually essential for full development of the other approach. In this article, we embark upon a rather idiosyncratic walk through some of these issues. Key words and phrases: Admissibility; Bayesian model checking; conditional frequentist; confidence intervals; consistency; coverage; design; hierarchical models; nonparametric
Posterior convergence rates of Dirichlet mixtures at smooth densities
 Ann. Statist
, 2007
"... We study the rates of convergence of the posterior distribution for Bayesian density estimation with Dirichlet mixtures of normal distributions as the prior. The true density is assumed to be twice continuously differentiable. The bandwidth is given a sequence of priors which is obtained by scaling ..."
Abstract

Cited by 45 (7 self)
 Add to MetaCart
(Show Context)
We study the rates of convergence of the posterior distribution for Bayesian density estimation with Dirichlet mixtures of normal distributions as the prior. The true density is assumed to be twice continuously differentiable. The bandwidth is given a sequence of priors which is obtained by scaling a single prior by an appropriate order. In order to handle this problem, we derive a new general rate theorem by considering a countable covering of the parameter space whose prior probabilities satisfy a summability condition together with certain individual bounds on the Hellinger metric entropy. We apply this new general theorem on posterior convergence rates by computing bounds for Hellinger (bracketing) entropy numbers for the involved class of densities, the error in the approximation of a smooth density by normal mixtures and the concentration rate of the prior. The best obtainable rate of convergence of the posterior turns out to be equivalent to the wellknown frequentist rate for integrated mean squared error n −2/5 up to a logarithmic factor. 1. Introduction. Kernel
Posterior consistency of Gaussian process prior for nonparametric binary regression
"... Consider binary observations whose response probability is an unknown smooth function of a set of covariates. Suppose that a prior on the response probability function is induced by a Gaussian process mapped to the unit interval through a link function. In this paper we study consistency of the resu ..."
Abstract

Cited by 34 (3 self)
 Add to MetaCart
(Show Context)
Consider binary observations whose response probability is an unknown smooth function of a set of covariates. Suppose that a prior on the response probability function is induced by a Gaussian process mapped to the unit interval through a link function. In this paper we study consistency of the resulting posterior distribution. If the covariance kernel has derivatives up to a desired order and the bandwidth parameter of the kernel is allowed to take arbitrarily small values, we show that the posterior distribution is consistent in the L1distance. As an auxiliary result to our proofs, we show that, under certain conditions, a Gaussian process assigns positive probabilities to the uniform neighborhoods of a continuous function. This result may be of independent interest in the literature for small ball probabilities of Gaussian processes. 1. Introduction. Consider
Dirichlet Process Mixtures of Generalized Linear Models
"... We propose Dirichlet Process mixtures of Generalized Linear Models (DPGLMs), a new method of nonparametric regression that accommodates continuous and categorical inputs, models a response variable locally by a generalized linear model. We give conditions for the existence and asymptotic unbiasedne ..."
Abstract

Cited by 32 (3 self)
 Add to MetaCart
We propose Dirichlet Process mixtures of Generalized Linear Models (DPGLMs), a new method of nonparametric regression that accommodates continuous and categorical inputs, models a response variable locally by a generalized linear model. We give conditions for the existence and asymptotic unbiasedness of the DPGLM regression mean function estimate; we then give a practical example for when those conditions hold. We evaluate DPGLM on several data sets, comparing it to modern methods of nonparametric regression including regression trees and Gaussian processes. 1
Adaptive Bayesian estimation using a Gaussian random field with inverse Gamma bandwidth. The Annals of Statistics 37
, 2009
"... We consider nonparametric Bayesian estimation inference using a rescaled smooth Gaussian field as a prior for a multidimensional function. The rescaling is achieved using a Gamma variable and the procedure can be viewed as choosing an inverse Gamma bandwidth. The procedure is studied from a frequent ..."
Abstract

Cited by 32 (2 self)
 Add to MetaCart
We consider nonparametric Bayesian estimation inference using a rescaled smooth Gaussian field as a prior for a multidimensional function. The rescaling is achieved using a Gamma variable and the procedure can be viewed as choosing an inverse Gamma bandwidth. The procedure is studied from a frequentist perspective in three statistical settings involving replicated observations (density estimation, regression and classification). We prove that the resulting posterior distribution shrinks to the distribution that generates the data at a speed which is minimaxoptimal up to a logarithmic factor, whatever the regularity level of the datagenerating distribution. Thus the hierachical Bayesian procedure, with a fixed prior, is shown to be fully adaptive. 1. Introduction. The
On rates of convergence for posterior distributions in infinitedimensional
 Ann. Statist
, 2007
"... This paper introduces a new approach to the study of rates of convergence for posterior distributions. It is a natural extension of a recent approach to the study of Bayesian consistency. In particular, we improve on current rates of convergence for models including the mixture of Dirichlet process ..."
Abstract

Cited by 27 (2 self)
 Add to MetaCart
(Show Context)
This paper introduces a new approach to the study of rates of convergence for posterior distributions. It is a natural extension of a recent approach to the study of Bayesian consistency. In particular, we improve on current rates of convergence for models including the mixture of Dirichlet process model and the random Bernstein polynomial model. 1. Introduction. Recently
On the consistency of Bayesian variable selection for high dimensional binary regression and classification
 Neural Comput
, 2006
"... Bayesian variable selection has gained much empirical success recently in a variety of applications when the number K of explanatory variables (x1,...,xK) is possibly much larger than the sample size n. For generalized linear models, if most of the xj’s have very small effects on the response y, we ..."
Abstract

Cited by 25 (1 self)
 Add to MetaCart
(Show Context)
Bayesian variable selection has gained much empirical success recently in a variety of applications when the number K of explanatory variables (x1,...,xK) is possibly much larger than the sample size n. For generalized linear models, if most of the xj’s have very small effects on the response y, we show that it is possible to use Bayesian variable selection to reduce overfitting caused by the curse of dimensionality K ≫ n. In this approach a suitable prior can be used to choose a few out of the many xj’s to model y, so that the posterior will propose probability densities p that are “often close ” to the true density p ∗ in some sense. The closeness can be described by a Hellinger distance between p and p ∗ that scales at a power very close to n −1/2, which is the “finitedimensional rate ” corresponding to a lowdimensional situation. These findings extend some recent work of Jiang [Technical Report 0502 (2005) Dept. Statistics, Northwestern Univ.] on consistency of Bayesian variable selection for binary classification.