Results 1–10 of 41
Minimax Estimation via Wavelet Shrinkage
, 1992
Abstract

Cited by 321 (29 self)
We attempt to recover an unknown function from noisy, sampled data. Using orthonormal bases of compactly supported wavelets we develop a nonlinear method which works in the wavelet domain by simple nonlinear shrinkage of the empirical wavelet coefficients. The shrinkage can be tuned to be nearly minimax over any member of a wide range of Triebel- and Besov-type smoothness constraints, and asymptotically minimax over Besov bodies with p ≤ q. Linear estimates cannot achieve even the minimax rates over Triebel and Besov classes with p < 2, so our method can significantly outperform every linear method (kernel, smoothing spline, sieve, ...) in a minimax sense. Variants of our method based on simple threshold nonlinearities are nearly minimax. Our method possesses the interpretation of spatial adaptivity: it reconstructs using a kernel which may vary in shape and bandwidth from point to point, depending on the data. Least favorable distributions for certain of the Triebel and Besov scales generate objects with sparse wavelet transforms. Many real objects have similarly sparse transforms, which suggests that these minimax results are relevant for practical problems. Sequels to this paper discuss practical implementation, spatial adaptation properties and applications to inverse problems.
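The wavelet-domain shrinkage mechanism described above can be sketched concretely. The snippet below uses a hand-rolled orthonormal Haar transform for illustration (the paper works with general compactly supported wavelets, e.g. Daubechies families); the function names are ours, and the soft-threshold rule shown is one instance of the "simple threshold nonlinearities" the abstract mentions:

```python
import numpy as np

def haar_transform(x):
    """Full orthonormal Haar wavelet transform of a length-2^J signal."""
    coeffs = []
    approx = np.asarray(x, dtype=float)
    while len(approx) > 1:
        even, odd = approx[0::2], approx[1::2]
        coeffs.append((even - odd) / np.sqrt(2))  # detail coefficients
        approx = (even + odd) / np.sqrt(2)        # approximation coefficients
    return approx, coeffs[::-1]                   # coarsest level first

def inverse_haar(approx, coeffs):
    """Invert haar_transform exactly (orthonormal transform)."""
    a = approx
    for d in coeffs:
        even = (a + d) / np.sqrt(2)
        odd = (a - d) / np.sqrt(2)
        a = np.empty(2 * len(d))
        a[0::2], a[1::2] = even, odd
    return a

def soft_threshold(c, t):
    """Shrink coefficients toward zero by t, zeroing anything smaller."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)
```

To denoise, one would transform the data, apply `soft_threshold` to the detail coefficients, and invert; because small coefficients are zeroed, the effective reconstruction kernel adapts from point to point as described above.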
Wavelet shrinkage: asymptopia
 Journal of the Royal Statistical Society, Ser. B
, 1995
Abstract

Cited by 295 (36 self)
Considerable effort has been directed recently to develop asymptotically minimax methods in problems of recovering infinite-dimensional objects (curves, densities, spectral densities, images) from noisy data. A rich and complex body of work has evolved, with nearly or exactly minimax estimators being obtained for a variety of interesting problems. Unfortunately, the results have often not been translated into practice, for a variety of reasons: sometimes similarity to known methods, sometimes computational intractability, and sometimes lack of spatial adaptivity. We discuss a method for curve estimation based on n noisy data; one translates the empirical wavelet coefficients towards the origin by an amount √(2 log n)/√n. The method is different from methods in common use today, is computationally practical, and is spatially adaptive; thus it avoids a number of previous objections to minimax estimators. At the same time, the method is nearly minimax for a wide variety of loss functions (e.g. pointwise error, global error measured in L^p norms, pointwise and global error in estimation of derivatives) and for a wide range of smoothness classes, including standard Hölder classes, Sobolev classes, and Bounded Variation. This is a much broader near-optimality than anything previously proposed in the minimax literature. Finally, the theory underlying the method is interesting, as it exploits a correspondence between statistical questions and questions of optimal recovery and information-based complexity.
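The √(2 log n)/√n translation rule can be sketched directly in the sequence model, where each empirical wavelet coefficient is the true coefficient plus noise of scale ε = σ/√n; the signal values and noise level below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024
eps = 0.1                                  # per-coefficient noise scale (sigma / sqrt(n))
theta = np.zeros(n)                        # sparse truth: three large coefficients
theta[[3, 17, 100]] = [2.0, -1.5, 1.0]
y = theta + eps * rng.standard_normal(n)   # empirical wavelet coefficients

t = eps * np.sqrt(2 * np.log(n))           # translate toward the origin by this amount
theta_hat = np.sign(y) * np.maximum(np.abs(y) - t, 0.0)
```

With high probability every pure-noise coefficient falls below the threshold and is zeroed, so the reconstruction keeps only the genuinely large coefficients; this is the mechanism behind the spatial adaptivity claimed above.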
Minimax Bayes, asymptotic minimax and sparse wavelet priors, in
 Sciences Paris (A
, 1994
Abstract

Cited by 45 (9 self)
Pinsker (1980) gave a precise asymptotic evaluation of the minimax mean squared error of estimation of a signal in Gaussian noise when the signal is known a priori to lie in a compact ellipsoid in Hilbert space. This 'Minimax Bayes' method can be applied to a variety of global nonparametric estimation settings with parameter spaces far from ellipsoidal. For example it leads to a theory of exact asymptotic minimax estimation over norm balls in Besov and Triebel spaces using simple coordinatewise estimators and wavelet bases. This paper outlines some features of the method common to several applications. In particular, we derive new results on the exact asymptotic minimax risk over weak ℓ_p balls in R^n as n → ∞, and also for a class of 'local' estimators on the Triebel scale. By its very nature, the method reveals the structure of asymptotically least favorable distributions. Thus we may simulate 'least favorable' sample paths. We illustrate this for estimation of a signal in Gaussian white noise over norm balls in certain Besov spaces. In wavelet bases, when p < 2, the least favorable priors are sparse, and the resulting sample paths strikingly different from those observed in Pinsker's ellipsoidal setting (p = 2).
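The sparsity of such least favorable priors can be illustrated by sampling level-wise two-point (spike) priors for wavelet coefficients; the per-level spike probabilities and amplitudes below are illustrative schedules, not the paper's exact calibration:

```python
import numpy as np

rng = np.random.default_rng(1)
levels = 8
coeffs = []
for j in range(levels):
    n_j = 2 ** j                          # number of coefficients at level j
    pi_j = min(1.0, 2.0 ** (-0.5 * j))    # spike probability, decaying with level
    mu_j = 2.0 ** (-0.5 * j)              # spike amplitude, decaying with level
    spikes = rng.random(n_j) < pi_j       # which coefficients are nonzero
    signs = rng.choice([-1.0, 1.0], size=n_j)
    coeffs.append(np.where(spikes, signs * mu_j, 0.0))

# overall fraction of nonzero coefficients: small when p < 2 priors are sparse
sparsity = sum(np.count_nonzero(c) for c in coeffs) / sum(len(c) for c in coeffs)
```

Sample paths synthesized from such coefficients are spatially inhomogeneous, with isolated bursts, quite unlike the stationary-looking paths of Pinsker's ellipsoidal (p = 2) setting.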
On the Estimation of Quadratic Functionals
Abstract

Cited by 38 (10 self)
We discuss the difficulties of estimating quadratic functionals based on observations Y(t) from the white noise model Y(t) = ∫₀ᵗ f(u) du + σ W(t), t ∈ [0, 1], where W(t) is a standard Wiener process on [0, 1]. The optimal rates of convergence (as σ → 0) for estimating quadratic functionals under certain geometric constraints are found. Specifically, the optimal rates of estimating ∫₀¹ [f⁽ᵏ⁾(x)]² dx under hyperrectangular constraints Γ = {f : |x_j(f)| ≤ C j^(−ρ)} and weighted ℓ_p-body constraints Γ_p = {f : Σ_j a_j |x_j(f)|^p ≤ C} are computed explicitly, where x_j(f) is the jth Fourier-Bessel coefficient of the unknown function f. We invent a new method for developing lower bounds based on testing two highly composite hypercubes, and address its advantages. The attainable lower bounds are found by applying the hardest one-dimensional approach as well as the hypercube method. We demonstrate that for estimating regular quadratic functionals (i.e., the functionals which can be estimated at rate O(σ²)), the difficulties of the estimation are captured by the hardest one-dimensional subproblems, and for estimating nonregular quadratic functionals (i.e., no O(σ²)-consistent estimator exists), the difficulties are captured at certain finite-dimensional (the dimension goes to infinity as σ → 0) hypercube subproblems.
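The regular case can be illustrated in coefficient space: to estimate the quadratic functional Σ_j x_j(f)², square the noisy coefficients and subtract the noise variance to remove the bias. The coefficient decay, noise level, and truncation point below are illustrative choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
sigma = 0.05
theta = 1.0 / (1.0 + np.arange(n)) ** 2     # decaying true coefficients (illustrative)
y = theta + sigma * rng.standard_normal(n)  # noisy observations of the coefficients

m = 50                                      # truncation level: bias/variance trade-off
Q_hat = np.sum(y[:m] ** 2 - sigma ** 2)     # bias-corrected estimate of sum(theta_j^2)
Q_true = np.sum(theta ** 2)
```

Each term y_j² has expectation θ_j² + σ², so subtracting σ² per retained coordinate removes the noise-induced bias; the truncation at m trades residual tail bias against accumulated variance.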
Consistency of cross validation for comparing regression procedures. Annals of Statistics, Accepted paper
Abstract

Cited by 27 (4 self)
Theoretical developments on cross validation (CV) have mainly focused on selecting one among a list of finite-dimensional models (e.g., subset or order selection in linear regression) or selecting a smoothing parameter (e.g., bandwidth for kernel smoothing). However, little is known about the consistency of cross validation when it is applied to compare parametric with nonparametric methods, or one nonparametric method with another. We show that under some conditions, with an appropriate choice of data splitting ratio, cross validation is consistent in the sense of selecting the better procedure with probability approaching 1. Our results reveal interesting behavior of cross validation. When comparing two models (procedures) converging at the same nonparametric rate, in contrast to the parametric case, it turns out that the proportion of data used for evaluation in CV does not need to be dominating in size. Furthermore, it can even be of a smaller order than the proportion for estimation while not affecting the consistency property.
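The parametric-versus-nonparametric comparison can be illustrated with a single data split: fit both procedures on one half and compare validation errors on the other. The data-generating model, sample size, and choice of k are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
x = rng.uniform(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(n)  # truth is nonlinear

split = n // 2                                  # half for fitting, half for evaluation
xt, yt, xv, yv = x[:split], y[:split], x[split:], y[split:]

# procedure A: parametric (simple linear regression)
coef = np.polyfit(xt, yt, 1)
err_lin = np.mean((np.polyval(coef, xv) - yv) ** 2)

# procedure B: nonparametric (k-nearest-neighbour average)
k = 15
pred = np.array([yt[np.argsort(np.abs(xt - v))[:k]].mean() for v in xv])
err_knn = np.mean((pred - yv) ** 2)

# CV selects whichever procedure has the smaller validation error
better = "knn" if err_knn < err_lin else "linear"
```

Here the truth is nonlinear, so the split reliably selects the nonparametric fit; the paper's result concerns how the fitting/evaluation proportions may be chosen while preserving this consistency.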
Rethinking biased estimation: Improving maximum likelihood and the Cramér–Rao bound
 TRENDS IN SIGNAL PROCESS
, 2007
Abstract

Cited by 21 (12 self)
One of the prime goals of statistical estimation theory is the development of performance bounds when estimating parameters of interest in a given model, as well as constructing estimators that achieve these limits. When the parameters to be estimated are deterministic, a popular approach is to bound the mean-squared error (MSE) achievable within the class of unbiased estimators. Although it is well-known that lower MSE can be obtained by allowing for a bias, in applications it is typically unclear how to choose an appropriate bias. In this survey we introduce MSE bounds that are lower than the unbiased Cramér–Rao bound (CRB) for all values of the unknowns. We then present a general framework for constructing biased estimators with smaller MSE than the standard maximum-likelihood (ML) approach, regardless of the true unknown values. Specializing the results to the linear Gaussian model, we derive a class of estimators that dominate least-squares in terms of MSE. We also introduce methods for choosing regularization parameters in penalized ML estimators that outperform standard techniques such as cross validation.
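The classical James–Stein estimator is the textbook instance of a biased estimator dominating ML in MSE for a Gaussian mean in p ≥ 3 dimensions, and is the kind of phenomenon the survey's framework generalizes. A minimal simulation (the true mean and replication count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
p, reps = 10, 2000
theta = np.ones(p)                               # true mean vector
z = theta + rng.standard_normal((reps, p))       # y ~ N(theta, I), many replications

ml = z                                           # maximum-likelihood estimate: y itself
norm2 = np.sum(z ** 2, axis=1, keepdims=True)
js = (1 - (p - 2) / norm2) * z                   # James-Stein shrinkage toward 0

mse_ml = np.mean(np.sum((ml - theta) ** 2, axis=1))
mse_js = np.mean(np.sum((js - theta) ** 2, axis=1))
```

Despite its bias, the shrinkage estimator has strictly smaller risk for every value of theta, which is exactly the "dominates least-squares in terms of MSE" behavior described above (here in the identity-design special case).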
Optimal smoothing in nonparametric mixedeffect models
 Ann. Statist
, 2005
Sharp analysis of lowrank kernel matrix approximations
 JMLR: WORKSHOP AND CONFERENCE PROCEEDINGS VOL 30 (2013) 1–25
, 2013
Abstract

Cited by 13 (1 self)
We consider supervised learning problems within the positive-definite kernel framework, such as kernel ridge regression, kernel logistic regression or the support vector machine. With kernels leading to infinite-dimensional feature spaces, a common practical limiting difficulty is the necessity of computing the kernel matrix, which most frequently leads to algorithms with running time at least quadratic in the number of observations n, i.e., O(n²). Low-rank approximations of the kernel matrix are often considered as they allow the reduction of running time complexities to O(p²n), where p is the rank of the approximation. The practicality of such methods thus depends on the required rank p. In this paper, we show that in the context of kernel ridge regression, for approximations based on a random subset of columns of the original kernel matrix, the rank p may be chosen to be linear in the degrees of freedom associated with the problem, a quantity which is classically used in the statistical analysis of such methods, and is often seen as the implicit number of parameters of nonparametric estimators. This result enables simple algorithms that have subquadratic running time complexity, but provably exhibit the same predictive performance as existing algorithms, for any given problem instance, and not only in worst-case situations.
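A minimal sketch of column-sampling (Nyström-style) kernel ridge regression in the subset-of-regressors form: only an n × p block and a p × p block of the kernel matrix are formed, so the solve costs O(p²n) rather than O(n³). The RBF kernel, γ, λ, and rank below are illustrative choices, not the paper's prescription:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 1000, 50                          # n observations, rank-p approximation
X = rng.uniform(-1.0, 1.0, (n, 1))
f_true = np.sin(3 * X[:, 0])
y = f_true + 0.1 * rng.standard_normal(n)

def rbf(A, B, gamma=10.0):
    """Gaussian RBF kernel matrix between row sets A and B."""
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

idx = rng.choice(n, p, replace=False)    # random subset of p columns
Knp = rbf(X, X[idx])                     # n x p block of the kernel matrix
Kpp = rbf(X[idx], X[idx])                # p x p block

lam = 1e-3
# reduced-rank KRR: minimize ||y - Knp b||^2 + n*lam * b' Kpp b
b = np.linalg.solve(Knp.T @ Knp + n * lam * Kpp + 1e-10 * np.eye(p), Knp.T @ y)
f_hat = Knp @ b
mse = np.mean((f_hat - f_true) ** 2)
```

The paper's contribution is the statistical guarantee: p of the order of the problem's degrees of freedom already suffices for this estimator to match full kernel ridge regression.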
Asymptotically Minimax Nonparametric Regression in L2
 in L 2 . Statistics
, 1996
Abstract

Cited by 10 (0 self)
Introduction. In the nonparametric regression context, the notion of asymptotic optimality is usually associated with the "optimal rate of convergence". Minimax rates of convergence have been extensively studied (Ibragimov and Hasminskii (1980), (1982); Stone (1980), (1982) and many others). Different estimators turn out to be optimal in the sense of the best rate of convergence. We mention only some of them: kernel estimators (Ibragimov and Hasminskii (1980), Korostelev (1993)), projection estimators (Ibragimov and Hasminskii (1981)), spline estimators (Speckman (1985), Nussbaum (1985)), wavelets (Donoho and Johnstone (1992)). From the practical point of view stochastic approximation estimators considered in Belitser and Korostelev (1992) are also of interest. However, comparing estimators on the basis of their rates of convergence does not make it possible to distinguish among estimators optimal in that sense. Also, from a more practical point of view, such an approach does not give