Results 1  10
of
48
Ideal spatial adaptation by wavelet shrinkage
 Biometrika
, 1994
"... With ideal spatial adaptation, an oracle furnishes information about how best to adapt a spatially variable estimator, whether piecewise constant, piecewise polynomial, variable knot spline, or variable bandwidth kernel, to the unknown function. Estimation with the aid of an oracle o ers dramatic ad ..."
Abstract

Cited by 1251 (5 self)
 Add to MetaCart
With ideal spatial adaptation, an oracle furnishes information about how best to adapt a spatially variable estimator, whether piecewise constant, piecewise polynomial, variable knot spline, or variable bandwidth kernel, to the unknown function. Estimation with the aid of an oracle o ers dramatic advantages over traditional linear estimation by nonadaptive kernels � however, it is a priori unclear whether such performance can be obtained by a procedure relying on the data alone. We describe a new principle for spatiallyadaptive estimation: selective wavelet reconstruction. Weshowthatvariableknot spline ts and piecewisepolynomial ts, when equipped with an oracle to select the knots, are not dramatically more powerful than selective wavelet reconstruction with an oracle. We develop a practical spatially adaptive method, RiskShrink, which works by shrinkage of empirical wavelet coe cients. RiskShrink mimics the performance of an oracle for selective wavelet reconstruction as well as it is possible to do so. A new inequality inmultivariate normal decision theory which wecallthe oracle inequality shows that attained performance di ers from ideal performance by at most a factor 2logn, where n is the sample size. Moreover no estimator can give a better guarantee than this. Within the class of spatially adaptive procedures, RiskShrink is essentially optimal. Relying only on the data, it comes within a factor log 2 n of the performance of piecewise polynomial and variableknot spline methods equipped with an oracle. In contrast, it is unknown how or if piecewise polynomial methods could be made to function this well when denied access to an oracle and forced to rely on data alone.
DeNoising By SoftThresholding
, 1992
"... Donoho and Johnstone (1992a) proposed a method for reconstructing an unknown function f on [0; 1] from noisy data di = f(ti)+ zi, iid i =0;:::;n 1, ti = i=n, zi N(0; 1). The reconstruction fn ^ is de ned in the wavelet domain by translating all the empirical wavelet coe cients of d towards 0 by an a ..."
Abstract

Cited by 1249 (14 self)
 Add to MetaCart
Donoho and Johnstone (1992a) proposed a method for reconstructing an unknown function f on [0; 1] from noisy data di = f(ti)+ zi, iid i =0;:::;n 1, ti = i=n, zi N(0; 1). The reconstruction fn ^ is de ned in the wavelet domain by translating all the empirical wavelet coe cients of d towards 0 by an amount p 2 log(n) = p n. We prove two results about that estimator. [Smooth]: With high probability ^ fn is at least as smooth as f, in any of a wide variety of smoothness measures. [Adapt]: The estimator comes nearly as close in mean square to f as any measurable estimator can come, uniformly over balls in each of two broad scales of smoothness classes. These two properties are unprecedented in several ways. Our proof of these results develops new facts about abstract statistical inference and its connection with an optimal recovery model.
Adapting to unknown smoothness via wavelet shrinkage
 JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
, 1995
"... We attempt to recover a function of unknown smoothness from noisy, sampled data. We introduce a procedure, SureShrink, which suppresses noise by thresholding the empirical wavelet coefficients. The thresholding is adaptive: a threshold level is assigned to each dyadic resolution level by the princip ..."
Abstract

Cited by 990 (20 self)
 Add to MetaCart
We attempt to recover a function of unknown smoothness from noisy, sampled data. We introduce a procedure, SureShrink, which suppresses noise by thresholding the empirical wavelet coefficients. The thresholding is adaptive: a threshold level is assigned to each dyadic resolution level by the principle of minimizing the Stein Unbiased Estimate of Risk (Sure) for threshold estimates. The computational effort of the overall procedure is order N log(N) as a function of the sample size N. SureShrink is smoothnessadaptive: if the unknown function contains jumps, the reconstruction (essentially) does also; if the unknown function has a smooth piece, the reconstruction is (essentially) as smooth as the mother wavelet will allow. The procedure is in a sense optimally smoothnessadaptive: it is nearminimax simultaneously over a whole interval of the Besov scale; the size of this interval depends on the choice of mother wavelet. We know from a previous paper by the authors that traditional smoothing methods  kernels, splines, and orthogonal series estimates  even with optimal choices of the smoothing parameter, would be unable to perform
Wavelet shrinkage: asymptopia
 Journal of the Royal Statistical Society, Ser. B
, 1995
"... Considerable e ort has been directed recently to develop asymptotically minimax methods in problems of recovering in nitedimensional objects (curves, densities, spectral densities, images) from noisy data. A rich and complex body of work has evolved, with nearly or exactly minimax estimators bein ..."
Abstract

Cited by 297 (36 self)
 Add to MetaCart
Considerable e ort has been directed recently to develop asymptotically minimax methods in problems of recovering in nitedimensional objects (curves, densities, spectral densities, images) from noisy data. A rich and complex body of work has evolved, with nearly or exactly minimax estimators being obtained for a variety of interesting problems. Unfortunately, the results have often not been translated into practice, for a variety of reasons { sometimes, similarity to known methods, sometimes, computational intractability, and sometimes, lack of spatial adaptivity. We discuss a method for curve estimation based on n noisy data; one translates the empirical wavelet coe cients towards the origin by an amount p p 2 log(n) = n. The method is di erent from methods in common use today, is computationally practical, and is spatially adaptive; thus it avoids a number of previous objections to minimax estimators. At the same time, the method is nearly minimax for a wide variety of loss functions { e.g. pointwise error, global error measured in L p norms, pointwise and global error in estimation of derivatives { and for a wide range of smoothness classes, including standard Holder classes, Sobolev classes, and Bounded Variation. This is amuch broader nearoptimality than anything previously proposed in the minimax literature. Finally, the theory underlying the method is interesting, as it exploits a correspondence between statistical questions and questions of optimal recovery and informationbased complexity.
Adaptive hypothesis testing using wavelets
 Annals of Statistics
, 1996
"... Let a function f be observed with a noise. We wish to test the null hypothesis that the function is identically zero, against a composite nonparametric alternative: functions from the alternative set are separated away from zero in an integral Ž e.g., L. 2 norm and also possess some smoothness prope ..."
Abstract

Cited by 91 (10 self)
 Add to MetaCart
Let a function f be observed with a noise. We wish to test the null hypothesis that the function is identically zero, against a composite nonparametric alternative: functions from the alternative set are separated away from zero in an integral Ž e.g., L. 2 norm and also possess some smoothness properties. The minimax rate of testing for this problem was evaluated in earlier papers by Ingster and by Lepski and Spokoiny under different kinds of smoothness assumptions. It was shown that both the optimal rate of testing and the structure of optimal Ž in rate. tests depend on smoothness parameters which are usually unknown in practical applications. In this paper the problem of adaptive Ž assumption free. testing is considered. It is shown that adaptive testing without loss of efficiency is impossible. An extra log logfactor is inessential but unavoidable payment for the adaptation. A simple adaptive test based on wavelet technique is constructed which is nearly minimax for a wide range of Besov classes. 1. Introduction. Suppose
Sharp Adaptation for Inverse Problems With Random Noise
, 2000
"... We consider a heteroscedastic sequence space setup with polynomially increasing variances of observations that allows to treat a number of inverse problems, in particular multivariate ones. We propose an adaptive estimator that attains simultaneously exact asymptotic minimax constants on every ellip ..."
Abstract

Cited by 84 (8 self)
 Add to MetaCart
We consider a heteroscedastic sequence space setup with polynomially increasing variances of observations that allows to treat a number of inverse problems, in particular multivariate ones. We propose an adaptive estimator that attains simultaneously exact asymptotic minimax constants on every ellipsoid of functions within a wide scale (that includes ellipoids with polynomially and exponentially decreasing axes) and, at the same time, satisfies asymptotically exact oracle inequalities within any class of linear estimates having monotone nondecreasing weights. As application, we construct sharp adaptive estimators in the problems of deconvolution and tomography.
On spatial adaptive estimation of nonparametric regression
 Math. Meth. Statistics
, 1997
"... The paper is devoted to developing spatial adaptive estimates for restoring functions from noisy observations. We show that the traditional least square (piecewise polynomial) estimate equipped with adaptively adjusted window possesses simultaneously many attractive adaptive properties, namely, 1) i ..."
Abstract

Cited by 83 (4 self)
 Add to MetaCart
(Show Context)
The paper is devoted to developing spatial adaptive estimates for restoring functions from noisy observations. We show that the traditional least square (piecewise polynomial) estimate equipped with adaptively adjusted window possesses simultaneously many attractive adaptive properties, namely, 1) it is near– optimal within ln n–factor for estimating a function (or its derivative) at a single point; 2) it is spatial adaptive in the sense that its quality is close to that one which could be achieved if smoothness of the underlying function was known in advance; 3) it is optimal in order (in the case of “strong ” accuracy measure) or near–optimal within ln n–factor (in the case of “weak ” accuracy measure) for estimating whole function (or its derivative) over wide range of the classes and global loss functions. We demonstrate that the “spatial adaptive abilities ” of our estimate are, in a sense, the best possible. Besides this, our adaptive estimate is computationally efficient and demonstrates reasonable practical behavior. 1
A.: Functional aggregation for nonparametric regression
 Ann. Stat
, 2000
"... We consider the problem of estimating an unknown function f from N noisy observations on a random grid. In this paper we address the following aggregation problem: given M functions f 1�����f M find an “aggregated” estimator which approximates f nearly as well as the best convex combination f ∗ of f ..."
Abstract

Cited by 50 (4 self)
 Add to MetaCart
(Show Context)
We consider the problem of estimating an unknown function f from N noisy observations on a random grid. In this paper we address the following aggregation problem: given M functions f 1�����f M find an “aggregated” estimator which approximates f nearly as well as the best convex combination f ∗ of f 1�����f M. We propose algorithms which provide approximations of f ∗ with expected L 2 accuracy O�N −1/4 ln 1/4 M�. We show that this approximation rate cannot be significantly improved. We discuss two specific applications: nonparametric prediction for a dynamic system with output nonlinearity and reconstruction in the Jones– Barron class. 1. Introduction. Consider
Nonlinear BlackBox Models in System Identification: Mathematical Foundations
, 1995
"... In this paper we discuss several aspects of the mathematical foundations of nonlinear blackbox identification problem. As we shall see that the quality of the identification procedure is always a result of a certain tradeoff between the expressive power of the model we try to identify (the larger ..."
Abstract

Cited by 44 (6 self)
 Add to MetaCart
In this paper we discuss several aspects of the mathematical foundations of nonlinear blackbox identification problem. As we shall see that the quality of the identification procedure is always a result of a certain tradeoff between the expressive power of the model we try to identify (the larger is the number of parameters used to describe the model, more flexible would be the approximation), and the stochastic error (which is proportional to the number of parameters). A consequence of this tradeoff is a simple fact that good approximation technique can be a basis of good identification algorithm. From this point of view we consider different approximation methods, and pay special attention to spatially adaptive approximants. We introduce wavelet and "neuron" approximations and show that they are spatially adaptive. Then we apply the acquired approximation experience to estimation problems. Finally, we consider some implications of these theoretic developments for the practically...
Adaptive Bayesian estimation using a Gaussian random field with inverse Gamma bandwidth. The Annals of Statistics 37
, 2009
"... We consider nonparametric Bayesian estimation inference using a rescaled smooth Gaussian field as a prior for a multidimensional function. The rescaling is achieved using a Gamma variable and the procedure can be viewed as choosing an inverse Gamma bandwidth. The procedure is studied from a frequent ..."
Abstract

Cited by 32 (2 self)
 Add to MetaCart
We consider nonparametric Bayesian estimation inference using a rescaled smooth Gaussian field as a prior for a multidimensional function. The rescaling is achieved using a Gamma variable and the procedure can be viewed as choosing an inverse Gamma bandwidth. The procedure is studied from a frequentist perspective in three statistical settings involving replicated observations (density estimation, regression and classification). We prove that the resulting posterior distribution shrinks to the distribution that generates the data at a speed which is minimaxoptimal up to a logarithmic factor, whatever the regularity level of the datagenerating distribution. Thus the hierachical Bayesian procedure, with a fixed prior, is shown to be fully adaptive. 1. Introduction. The