Results 1–10 of 31
Minimum complexity density estimation
IEEE Trans. Inf. Theory
, 1991
Cited by 248 (7 self)
Abstract
The minimum complexity or minimum description-length criterion developed by Kolmogorov, Rissanen, Wallace, Sorkin, and others leads to consistent probability density estimators. These density estimators are defined to achieve the best compromise between likelihood and simplicity. A related issue is the compromise between accuracy of approximations and complexity relative to the sample size. An index of resolvability is studied which is shown to bound the statistical accuracy of the density estimators, as well as the information-theoretic redundancy.
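The two-part codelength idea above can be sketched for histogram density estimation: pick the number of bins that minimizes negative log-likelihood plus a model-complexity penalty. A minimal illustrative sketch, assuming a simple 0.5·k·log n penalty as a proxy for the codelength of k cell probabilities (the `mdl_histogram` helper and `max_bins` cap are our own simplifications, not the paper's exact criterion):

```python
import numpy as np

def mdl_histogram(x, max_bins=50):
    """Choose the histogram bin count by a two-part MDL-style score:
    negative log-likelihood of the data plus a complexity penalty.
    Illustrative sketch; assumes non-degenerate data (x.max() > x.min())."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    lo, hi = x.min(), x.max()
    best = None
    for k in range(1, max_bins + 1):
        counts, _ = np.histogram(x, bins=k, range=(lo, hi))
        width = (hi - lo) / k
        p = counts / (n * width)          # histogram density in each bin
        occupied = counts > 0             # empty bins contribute nothing
        neg_loglik = -np.sum(counts[occupied] * np.log(p[occupied]))
        penalty = 0.5 * k * np.log(n)     # crude codelength for k parameters
        score = neg_loglik + penalty
        if best is None or score < best[0]:
            best = (score, k)
    return best[1]
```

The penalty term is what keeps the estimator from overfitting: with no penalty, the score always prefers more bins.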
Information-Theoretic Determination of Minimax Rates of Convergence
Ann. Statist.
, 1997
Cited by 158 (24 self)
Abstract
In this paper, we present some general results determining minimax bounds on statistical risk for density estimation based on certain information-theoretic considerations. These bounds depend only on metric entropy conditions and are used to identify the minimax rates of convergence.
Smooth Discrimination Analysis
Ann. Statist.
, 1998
Cited by 154 (3 self)
Abstract
Discriminant analysis for two data sets in R^d with probability densities f and g can be based on the estimation of the set G = {x : f(x) ≥ g(x)}. We consider applications where it is appropriate to assume that the region G has a smooth boundary. In particular, this assumption makes sense if discriminant analysis is used as a data-analytic tool. We discuss optimal rates for estimation of G. AMS 1991 subject classifications: primary 62G05, secondary 62G20. Keywords and phrases: discrimination analysis, minimax rates, Bayes risk. Short title: Smooth discrimination analysis. This research was supported by the Deutsche Forschungsgemeinschaft, Sonderforschungsbereich 373 "Quantifikation und Simulation ökonomischer Prozesse", Humboldt-Universität zu Berlin.
1 Introduction. Assume that one observes two independent samples X = (X_1, ..., X_n) and Y = (Y_1, ..., Y_m) of R^d-valued i.i.d. observations with densities f and g, respectively. The densities f and g are unknown. An additional random variabl...
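A plug-in route to estimating G = {x : f(x) ≥ g(x)} can be sketched in one dimension: estimate both densities and compare them pointwise. A minimal sketch, assuming Gaussian kernel density estimates with a fixed bandwidth (the `gaussian_kde` and `in_G_hat` helpers and the bandwidth h are hypothetical choices, not the paper's procedure, which targets the set directly):

```python
import numpy as np

def gaussian_kde(sample, h):
    """One-dimensional Gaussian kernel density estimate with bandwidth h."""
    sample = np.asarray(sample, dtype=float)
    def fhat(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        u = (x[:, None] - sample[None, :]) / h
        return np.exp(-0.5 * u ** 2).mean(axis=1) / (h * np.sqrt(2 * np.pi))
    return fhat

def in_G_hat(x, sample_f, sample_g, h=0.3):
    """Plug-in estimate of G = {x : f(x) >= g(x)}: compare the two KDEs."""
    fhat = gaussian_kde(sample_f, h)
    ghat = gaussian_kde(sample_g, h)
    return fhat(x) >= ghat(x)
```

The abstract's point is that such plug-in rules can be suboptimal when G itself has a smooth boundary; estimating the set directly can achieve better rates.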
Optimal pointwise adaptive methods in nonparametric estimation
Ann. Statist.
, 1997
Cited by 67 (11 self)
Abstract
The problem of optimal adaptive estimation of a function at a given point from noisy data is considered. Two procedures are proved to be asymptotically optimal for different settings. First we study the problem of bandwidth selection for nonparametric pointwise kernel estimation with a given kernel. We propose a bandwidth selection procedure and prove its optimality in the asymptotic sense. Moreover, this optimality holds not only among kernel estimators with a variable bandwidth: the resulting estimator is asymptotically optimal among all feasible estimators. The important feature of this procedure is that it is fully adaptive and it “works” for a very wide class of functions obeying a mild regularity restriction. With it, the attainable accuracy of estimation depends on the function itself and is expressed in terms of the “ideal adaptive bandwidth” corresponding to this function and a given kernel. The second procedure can be considered as a specialization of the first one under the qualitative assumption that the function to be estimated belongs to some Hölder class Σ(β, L) with unknown parameters β, L. This assumption allows us to choose a family of kernels in an optimal way, and the resulting procedure appears to be asymptotically optimal in the adaptive sense in any range of adaptation with β ≤ 2.
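A Lepski-style pointwise bandwidth selection of the kind described above can be sketched as: scan candidate bandwidths from largest to smallest and return the largest one whose estimate agrees, within a noise margin, with the estimates at every smaller bandwidth. An illustrative sketch, assuming a Nadaraya-Watson kernel estimator; the function names, the constant `kappa`, and the n·h noise proxy are our own simplifications of the paper's calibrated thresholds:

```python
import numpy as np

def kernel_estimate(x0, x, y, h):
    """Nadaraya-Watson estimate of the regression function at x0
    with a Gaussian kernel of bandwidth h."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    return np.sum(w * y) / np.sum(w)

def lepski_bandwidth(x0, x, y, sigma, bandwidths, kappa=2.0):
    """Lepski-style pointwise bandwidth choice (illustrative sketch).
    sigma is the noise standard deviation; sigma / sqrt(n*h) is a crude
    stand-in for the stochastic error of the estimate at bandwidth h."""
    hs = sorted(bandwidths, reverse=True)
    ests = [kernel_estimate(x0, x, y, h) for h in hs]
    n = len(x)
    noise = [sigma / np.sqrt(n * h) for h in hs]
    for i, h in enumerate(hs):
        # accept h if its estimate is consistent with all finer bandwidths
        if all(abs(ests[i] - ests[j]) <= kappa * noise[j]
               for j in range(i + 1, len(hs))):
            return h
    return hs[-1]
```

The selected bandwidth adapts to the local smoothness of the function at x0, which is exactly the "ideal adaptive bandwidth" behavior the abstract describes.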
Fast learning rates in statistical inference through aggregation
Submitted to Ann. Statist.
, 2008
Cited by 42 (8 self)
Abstract
We develop minimax optimal risk bounds for the general learning task consisting in predicting as well as the best function in a reference set G up to the smallest possible additive term, called the convergence rate. When the reference set is finite and when n denotes the size of the training data, we provide minimax convergence rates of the form C (log|G| / n)^v, with tight evaluation of the positive constant C and with exact 0 < v ≤ 1, the latter value depending on the convexity of the loss function and on the level of noise in the output distribution. The risk upper bounds are based on a sequential randomized algorithm, which at each step concentrates on functions having both low risk and low variance with respect to the previous step's prediction function. Our analysis puts forward the links between the probabilistic and worst-case viewpoints, and allows us to obtain risk bounds unachievable with the standard statistical learning approach. One of the key ideas of this work is to use probabilistic inequalities with respect to appropriate (Gibbs) distributions on the prediction function space instead of using them with respect to the distribution generating the data. The risk lower bounds are based on refinements of the Assouad lemma taking particular account of the properties of the loss function. Our key example to illustrate the upper and lower bounds is the L_q-regression setting, for which an exhaustive analysis of the convergence rates is given as q ranges over [1, +∞).
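The sequential randomized aggregation over a finite reference set can be sketched with exponential (Gibbs) weights under squared loss: each round, predict with the weight-averaged experts, then downweight each expert by its loss. A minimal sketch, assuming a fixed temperature `eta` and squared loss (the paper's algorithm is more refined; the function name is ours):

```python
import numpy as np

def exp_weights_aggregate(preds, y, eta=1.0):
    """Sequential exponentially weighted (Gibbs) aggregation.
    preds: array (n, M) of M experts' predictions over n rounds;
    y: the n observed outputs. Returns the aggregated predictions
    and the final weight vector over the experts."""
    n, M = preds.shape
    log_w = np.zeros(M)                     # uniform prior over the set G
    agg = np.empty(n)
    for t in range(n):
        w = np.exp(log_w - log_w.max())     # normalize in log-space for stability
        w /= w.sum()
        agg[t] = w @ preds[t]               # predict with the Gibbs mixture
        log_w -= eta * (preds[t] - y[t]) ** 2   # reweight by squared loss
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return agg, w
```

The weights concentrate exponentially fast on the experts with low cumulative loss, which is the mechanism behind the fast (log|G|/n)^v rates for suitably convex losses.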
On minimax density estimation on R
 Bernoulli
, 2004
Cited by 21 (0 self)
Abstract
Abstract: We consider the problem of density estimation on R from an independent sample X_1, ..., X_N with common density f. The behavior of the minimax L_p-risk, 1 ≤ p ≤ ∞, is studied when f belongs to a Hölder class of regularity s on the real line. A lower bound for the minimax risk is provided. We show that the linear estimator is not efficient in this setting and construct a wavelet adaptive estimator which attains (up to a logarithmic factor in N) the lower bounds involved. We show that the minimax risk depends on the parameter p when p < 2 + 1/s. Key words: nonparametric density estimation, minimax estimation, adaptive estimation.
Minimax nonparametric classification, Part I: Rates of convergence

, 1998
Cited by 20 (2 self)
Abstract
This paper studies minimax aspects of nonparametric classification. We first study minimax estimation of the conditional probability of a class label, given the feature variable. This function, say f, is assumed to be in a general nonparametric class. We show the minimax rate of convergence under squared L_2 loss is determined by the massiveness of the class as measured by metric entropy. The second part of the paper studies minimax classification. The loss of interest is the difference between the probability of misclassification of a classifier and that of the Bayes decision. As is well-known, an upper bound on risk for estimating f gives an upper bound on the risk for classification, but the rate is known to be suboptimal for the class of monotone functions. This suggests that one does not have to estimate f well in order to classify well. However, we show that the two problems are in fact of the same difficulty in terms of rates of convergence under a sufficient condition, which is satisfied by many function classes including Besov (Sobolev), Lipschitz, and bounded variation. This is somewhat surprising in view of a result of Devroye, Györfi, and Lugosi (1996).
Adaptive nonparametric confidence sets
, 2006
Cited by 15 (0 self)
Abstract
We construct honest confidence regions for a Hilbert spacevalued parameter in various statistical models. The confidence sets can be centered at arbitrary adaptive estimators, and have diameter which adapts optimally to a given selection of models. The latter adaptation is necessarily limited in scope. We review the notion of adaptive confidence regions, and relate the optimal rates of the diameter of adaptive confidence regions to the minimax rates for testing and estimation. Applications include the finite normal mean model, the white noise model, density estimation and regression with random design.
Near optimal thresholding estimation of a Poisson intensity on the real line
, 810
Cited by 11 (4 self)
Abstract
Abstract: The purpose of this paper is to estimate the intensity of a Poisson process N by using thresholding rules. In this paper, the intensity, defined as the derivative of the mean measure of N with respect to n dx where n is a fixed parameter, is assumed to be non-compactly supported. The estimator f̃_{n,γ} based on random thresholds is proved to achieve the same performance as the oracle estimator up to a possible logarithmic term. Then, minimax properties of f̃_{n,γ} on Besov spaces B^α_{p,q} are established. Under mild assumptions, we prove that sup_{f ∈ B^α_{p,q} ∩ L∞} E(‖f̃_{n,γ} − f‖²_2) ≤ C log n ...
Maxisets for density estimation on R
Math. Methods Statist.
, 2006
Cited by 9 (3 self)
Abstract
The problem of density estimation on R is considered. Adopting the maxiset point of view, we focus on the performance of adaptive procedures. Any rule which consists in neglecting the wavelet empirical coefficients smaller than a sequence of thresholds v_n will be called an elitist rule. We prove that for such a procedure the maximal space for the rate v_n^{αp}, with 0 < α < 1, is always contained in the intersection of a Besov space and a weak Besov space. With no assumption on compactness of the support of the target density f, we show that the hard thresholding rule is the best procedure among elitist rules when taking the classical choice of thresholds v_n = µ (n⁻¹ log n)^{1/2}, with µ > 0. Then, we point out the significance of data-driven thresholds in density estimation by comparing the maxiset of the hard thresholding rule with that of Juditsky and Lambert-Lacroix's procedure.
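The hard thresholding rule with the classical thresholds v_n = µ(n⁻¹ log n)^{1/2} is easy to state in code: zero out every empirical coefficient below v_n in magnitude and keep the rest unchanged. A minimal sketch (the `hard_threshold` name and the coefficient-vector interface are our own framing):

```python
import numpy as np

def hard_threshold(coeffs, n, mu=1.0):
    """Hard thresholding of empirical wavelet coefficients:
    kill every coefficient with |c| < v_n = mu * sqrt(log(n)/n),
    keep the rest unchanged. mu > 0 is the tuning constant."""
    coeffs = np.asarray(coeffs, dtype=float)
    v_n = mu * np.sqrt(np.log(n) / n)
    out = np.where(np.abs(coeffs) >= v_n, coeffs, 0.0)
    return out, v_n
```

A data-driven alternative, as the abstract notes, replaces the fixed v_n with thresholds estimated from the sample, which enlarges the maxiset of the procedure.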