ATOMIC DECOMPOSITION BY BASIS PURSUIT
, 1995
Abstract

Cited by 2731 (61 self)
The Time-Frequency and Time-Scale communities have recently developed a large number of overcomplete waveform dictionaries: stationary wavelets, wavelet packets, cosine packets, chirplets, and warplets, to name a few. Decomposition into overcomplete systems is not unique, and several methods for decomposition have been proposed, including the Method of Frames (MOF), Matching Pursuit (MP), and, for special dictionaries, the Best Orthogonal Basis (BOB). Basis Pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest ℓ1 norm of coefficients among all such decompositions. We give examples exhibiting several advantages over MOF, MP and BOB, including better sparsity and super-resolution. BP has interesting relations to ideas in areas as diverse as ill-posed problems, abstract harmonic analysis, total variation denoising, and multiscale edge denoising. Basis Pursuit in highly overcomplete dictionaries leads to large-scale optimization problems. With signals of length 8192 and a wavelet packet dictionary, one gets an equivalent linear program of size 8192 by 212,992. Such problems can be attacked successfully only because of recent advances in linear programming by interior-point methods. We obtain reasonable success with a primal-dual logarithmic barrier method and conjugate-gradient solver.
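The step from an ℓ1 minimization to a linear program can be sketched as follows. The symbols are notation assumed here, not taken from the abstract: s is the signal of length n, Φ is the n × p dictionary matrix, α is the coefficient vector, and u, v are its positive and negative parts.

```latex
\begin{align*}
\text{(BP)}\quad & \min_{\alpha \in \mathbb{R}^{p}} \|\alpha\|_{1}
  \quad \text{subject to} \quad \Phi\alpha = s, \\
\text{(LP)}\quad & \min_{u,\,v \in \mathbb{R}^{p}} \mathbf{1}^{\top}(u + v)
  \quad \text{subject to} \quad
  \begin{bmatrix} \Phi & -\Phi \end{bmatrix}
  \begin{bmatrix} u \\ v \end{bmatrix} = s,
  \quad u \ge 0,\ v \ge 0,
  \qquad \alpha = u - v .
\end{align*}
```

The constraint matrix of (LP) is n × 2p. Assuming the wavelet packet dictionary holds p = n log2(n) atoms, n = 8192 gives p = 8192 · 13 = 106,496 and 2p = 212,992, which is consistent with the problem size quoted in the abstract.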
Adapting to unknown smoothness via wavelet shrinkage
 JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
, 1995
Abstract

Cited by 990 (20 self)
We attempt to recover a function of unknown smoothness from noisy, sampled data. We introduce a procedure, SureShrink, which suppresses noise by thresholding the empirical wavelet coefficients. The thresholding is adaptive: a threshold level is assigned to each dyadic resolution level by the principle of minimizing the Stein Unbiased Estimate of Risk (SURE) for threshold estimates. The computational effort of the overall procedure is order N log(N) as a function of the sample size N. SureShrink is smoothness-adaptive: if the unknown function contains jumps, the reconstruction (essentially) does also; if the unknown function has a smooth piece, the reconstruction is (essentially) as smooth as the mother wavelet will allow. The procedure is in a sense optimally smoothness-adaptive: it is near-minimax simultaneously over a whole interval of the Besov scale; the size of this interval depends on the choice of mother wavelet. We know from a previous paper by the authors that traditional smoothing methods (kernels, splines, and orthogonal series estimates), even with optimal choices of the smoothing parameter, would be unable to perform
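The per-level threshold choice described above can be illustrated with a minimal sketch. It assumes unit noise variance, soft thresholding, and a cap at sqrt(2 log n); the function names are hypothetical, and the wavelet transform that would supply the coefficients of one resolution level is left out. It is not the authors' full procedure.

```python
import math


def soft_threshold(x, t):
    """Shrink x toward zero by t; values with |x| <= t become zero."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def sure_risk(coeffs, t):
    """Stein's unbiased risk estimate for soft thresholding at t.

    Assumes the coefficients carry noise of unit variance.
    """
    n = len(coeffs)
    small = sum(1 for c in coeffs if abs(c) <= t)
    return n - 2 * small + sum(min(abs(c), t) ** 2 for c in coeffs)


def sure_threshold(coeffs):
    """Pick the candidate threshold with the smallest SURE value.

    Candidates are 0, the coefficient magnitudes below the cap, and the
    cap sqrt(2 log n) itself (the cap is an assumption of this sketch).
    """
    cap = math.sqrt(2.0 * math.log(len(coeffs)))
    candidates = [0.0] + [abs(c) for c in coeffs if abs(c) <= cap] + [cap]
    return min(candidates, key=lambda t: sure_risk(coeffs, t))


def sure_shrink_level(coeffs):
    """Soft-threshold one resolution level at its SURE-minimizing threshold."""
    t = sure_threshold(coeffs)
    return [soft_threshold(c, t) for c in coeffs]
```

Calling `sure_shrink_level` once per dyadic level, each with its own coefficients, gives a level-dependent threshold in the spirit of the abstract. This version evaluates every candidate from scratch and is quadratic in the level size; sorting the magnitudes first would bring the selection down to the N log(N) order the abstract mentions.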
Independent component analysis: algorithms and applications
 NEURAL NETWORKS
, 2000
From Sparse Solutions of Systems of Equations to Sparse Modeling of Signals and Images
, 2007
Abstract

Cited by 423 (37 self)
A full-rank matrix A ∈ ℝ^{n×m} with n < m generates an underdetermined system of linear equations Ax = b having infinitely many solutions. Suppose we seek the sparsest solution, i.e., the one with the fewest nonzero entries: can it ever be unique? If so, when? As optimization of sparsity is combinatorial in nature, are there efficient methods for finding the sparsest solution? These questions have been answered positively and constructively in recent years, exposing a wide variety of surprising phenomena; in particular, the existence of easily verifiable conditions under which optimally sparse solutions can be found by concrete, effective computational methods. Such theoretical results inspire a bold perspective on some important practical problems in signal and image processing. Several well-known signal and image processing problems can be cast as demanding solutions of underdetermined systems of equations. Such problems have previously seemed, to many, intractable. There is considerable evidence that these problems often have sparse solutions. Hence, advances in finding sparse solutions to underdetermined systems energize research on such signal and image processing problems, to striking effect. In this paper we review the theoretical results on sparse solutions of linear systems, empirical
Translation-invariant de-noising
, 1995
Abstract

Cited by 307 (8 self)
De-Noising with the traditional (orthogonal, maximally decimated) wavelet transform sometimes exhibits visual artifacts; we attribute some of these – for example, Gibbs phenomena in the neighborhood of discontinuities – to the lack of translation invariance of the wavelet basis. One method to suppress such artifacts, termed “cycle spinning” by Coifman, is to “average out” the translation dependence. For a range of shifts, one shifts the data (right or left as the case may be), De-Noises the shifted data, and then unshifts the de-noised data. Doing this for each of a range of shifts, and averaging the several results so obtained, produces a reconstruction subject to far weaker Gibbs phenomena than thresholding-based De-Noising using the traditional orthogonal wavelet transform. Cycle-Spinning over the range of all circulant shifts can be accomplished in order n log2(n) time; it is equivalent to de-noising using the undecimated or stationary wavelet transform. Cycle-spinning exhibits benefits outside of wavelet de-noising, for example in cosine packet de-noising, where it helps suppress ‘clicks’. It also has a counterpart in frequency domain de-noising, where the goal of translation-invariance is replaced by modulation invariance, and the central shift-De-Noise-unshift operation is replaced by modulate-De-Noise-demodulate. We illustrate these concepts with extensive computational examples; all figures presented here are reproducible using the WaveLab software package.
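The shift, de-noise, unshift, average loop described above can be sketched in a few lines. This is an illustration under stated assumptions: the function names are hypothetical, the de-noiser is passed in as a callable, and `haar_hard_threshold` is a toy one-level, unnormalized Haar de-noiser standing in for a real wavelet shrinkage routine. It averages over the shifts explicitly rather than using the fast n log2(n) algorithm the abstract refers to.

```python
def circular_shift(signal, k):
    """Rotate a list right by k positions (left when k is negative)."""
    n = len(signal)
    k %= n
    return signal[-k:] + signal[:-k] if k else list(signal)


def cycle_spin(signal, denoise, shifts):
    """Average unshift(denoise(shift(signal))) over the given shifts."""
    total = [0.0] * len(signal)
    for k in shifts:
        shifted = circular_shift(signal, k)
        restored = circular_shift(denoise(shifted), -k)
        total = [a + b for a, b in zip(total, restored)]
    return [a / len(shifts) for a in total]


def haar_hard_threshold(signal, t=1.0):
    """Toy de-noiser: one-level Haar split, zero small details, invert.

    Assumes an even-length signal; not translation invariant on its own.
    """
    out = []
    for i in range(0, len(signal), 2):
        avg = (signal[i] + signal[i + 1]) / 2.0
        diff = (signal[i] - signal[i + 1]) / 2.0
        if abs(diff) < t:
            diff = 0.0
        out += [avg + diff, avg - diff]
    return out
```

For example, `cycle_spin(data, haar_hard_threshold, range(len(data)))` averages over all circulant shifts of a list `data`. The pairing of samples in the toy de-noiser depends on where the signal starts, which is exactly the translation dependence that the averaging removes.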
An interior-point method for large-scale ℓ1-regularized logistic regression
 Journal of Machine Learning Research
, 2007
Abstract

Cited by 284 (8 self)
Logistic regression with ℓ1 regularization has been proposed as a promising method for feature selection in classification problems. In this paper we describe an efficient interior-point method for solving large-scale ℓ1-regularized logistic regression problems. Small problems with up to a thousand or so features and examples can be solved in seconds on a PC; medium-sized problems, with tens of thousands of features and examples, can be solved in tens of seconds (assuming some sparsity in the data). A variation on the basic method, which uses a preconditioned conjugate gradient method to compute the search step, can solve very large problems, with a million features and examples (e.g., the 20 Newsgroups data set), in a few minutes, on a PC. Using warm-start techniques, a good approximation of the entire regularization path can be computed much more efficiently than by solving a family of problems independently.
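For orientation, the optimization problem usually meant by ℓ1-regularized logistic regression can be written as below. The notation is assumed here rather than taken from the abstract: x_i are the m feature vectors in ℝ^n, b_i ∈ {−1, +1} are the labels, w is the weight vector, v is the intercept, and λ > 0 is the regularization parameter.

```latex
\min_{v \in \mathbb{R},\; w \in \mathbb{R}^{n}} \;
\frac{1}{m} \sum_{i=1}^{m}
  \log\!\Bigl(1 + \exp\bigl(-b_i\,(w^{\top} x_i + v)\bigr)\Bigr)
\;+\; \lambda \,\|w\|_{1}
```

The ℓ1 term drives many entries of w to exactly zero, which is why the method serves as feature selection. Sweeping λ from large to small traces out the regularization path mentioned at the end of the abstract.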
Adapting to unknown sparsity by controlling the false discovery rate
, 2000
Abstract

Cited by 182 (23 self)
We attempt to recover a high-dimensional vector observed in white noise, where the vector is known to be sparse, but the degree of sparsity is unknown. We consider three different ways of defining sparsity of a vector: using the fraction of nonzero terms; imposing power-law decay bounds on the ordered entries; and controlling the ℓp norm for p small. We obtain a procedure which is asymptotically minimax for ℓr loss, simultaneously throughout a range of such sparsity classes. The optimal procedure is a data-adaptive thresholding scheme, driven by control of the False Discovery Rate (FDR). FDR control is a recent innovation in simultaneous testing, in which one seeks to ensure that at most a certain fraction of the rejected null hypotheses will correspond to false rejections. In our treatment, the FDR control parameter q also plays a controlling role in asymptotic minimaxity. Our results say that letting q = qn → 0 with problem size n is sufficient for asymptotic minimaxity, while keeping fixed q > 1/2 prevents asymptotic minimaxity. To our knowledge, this relation between ideas in simultaneous inference and asymptotic decision theory is new. Our work provides a new perspective on a class of model selection rules which has been introduced recently by several authors. These new rules impose complexity penalization of the form 2·log(potential model size / actual model size). We exhibit a close connection with FDR-controlling procedures having q tending to 0; this connection strongly supports a conjecture of simultaneous asymptotic minimaxity for such model selection rules.
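A data-adaptive threshold driven by FDR control can be sketched with a step-up rule of the Benjamini-Hochberg type applied to the ordered magnitudes. This is a minimal sketch under assumptions not stated in the abstract: Gaussian noise with known standard deviation `sigma`, two-sided testing, and hard thresholding. The function names are hypothetical.

```python
from statistics import NormalDist


def fdr_threshold(y, q=0.1, sigma=1.0):
    """Return the FDR-driven threshold, or None if nothing is significant.

    The k-th largest magnitude is compared with
    t_k = sigma * z(q * k / (2 n)), where z(a) is the upper-a quantile of
    the standard normal. The threshold is t_k for the largest k that passes.
    """
    n = len(y)
    magnitudes = sorted((abs(v) for v in y), reverse=True)
    std_normal = NormalDist()
    chosen = None
    for k, mag in enumerate(magnitudes, start=1):
        t_k = sigma * std_normal.inv_cdf(1.0 - q * k / (2.0 * n))
        if mag >= t_k:
            chosen = t_k  # step-up rule: keep the last k that passes
    return chosen


def fdr_hard_threshold(y, q=0.1, sigma=1.0):
    """Keep entries at or above the FDR threshold and zero the rest."""
    t = fdr_threshold(y, q, sigma)
    if t is None:
        return [0.0] * len(y)
    return [v if abs(v) >= t else 0.0 for v in y]
```

The threshold falls as more large entries appear, so the rule adapts to the unknown sparsity level: a very sparse vector is thresholded near the most stringent level t_1, while a denser one gets a lower cut-off.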
Large Sample Sieve Estimation of Semi-Nonparametric Models
 Handbook of Econometrics
, 2007
Abstract

Cited by 181 (19 self)
Often researchers find parametric models restrictive and sensitive to deviations from the parametric specifications; semi-nonparametric models are more flexible and robust, but lead to other complications such as introducing infinite dimensional parameter spaces that may not be compact. The method of sieves provides one way to tackle such complexities by optimizing an empirical criterion function over a sequence of approximating parameter spaces, called sieves, which are significantly less complex than the original parameter space. With different choices of criteria and sieves, the method of sieves is very flexible in estimating complicated econometric models. For example, it can simultaneously estimate the parametric and nonparametric components in semi-nonparametric models with or without constraints. It can easily incorporate prior information, often derived from economic theory, such as monotonicity, convexity, additivity, multiplicity, exclusion and non-negativity. This chapter describes estimation of semi-nonparametric econometric models via the method of sieves. We present some general results on the large sample properties of the sieve estimates, including consistency of the sieve extremum estimates, convergence rates of the sieve M-estimates, pointwise normality of series estimates of regression functions, root-n asymptotic normality and efficiency of sieve estimates of smooth functionals of infinite dimensional parameters. Examples are used to illustrate the general results.
Adaptive wavelet estimation: A block thresholding and oracle inequality approach
 Ann. Statist
, 1999
Abstract

Cited by 146 (20 self)
We study wavelet function estimation via the approach of block thresholding and ideal adaptation with oracle. Oracle inequalities are derived and serve as guides for the selection of smoothing parameters. Based on an oracle inequality and motivated by the data compression and localization properties of wavelets, an adaptive wavelet estimator for nonparametric regression is proposed and the optimality of the procedure is investigated. We show that the estimator achieves simultaneously three objectives: adaptivity, spatial adaptivity and computational efficiency. Specifically, it is proved that the estimator attains the exact optimal rates of convergence over a range of Besov classes and the estimator achieves adaptive local minimax rate for estimating functions at a point. The estimator is easy to implement, at the computational cost of O(n). Simulation shows that the estimator has excellent numerical performance relative to more traditional wavelet estimators.
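Block thresholding can be illustrated with a James-Stein style shrinkage rule applied to consecutive blocks of coefficients. The abstract does not give the rule, so everything below is an assumption: the shrinkage factor, the block length `block_len`, the constant `lam`, and the known noise level `sigma` are illustrative parameters, and `block_shrink` is a hypothetical name.

```python
def block_shrink(coeffs, block_len, lam, sigma=1.0):
    """Shrink coefficients block by block with a James-Stein style factor.

    Each block with energy S2 (its sum of squares) is multiplied by
    max(0, 1 - lam * L * sigma^2 / S2), where L is the block's length.
    Low-energy blocks are zeroed as a whole; high-energy blocks are kept
    almost unchanged.
    """
    out = []
    for start in range(0, len(coeffs), block_len):
        block = coeffs[start:start + block_len]
        energy = sum(c * c for c in block)
        if energy == 0:
            factor = 0.0
        else:
            factor = max(0.0, 1.0 - lam * len(block) * sigma ** 2 / energy)
        out.extend(factor * c for c in block)
    return out
```

For instance, `block_shrink([3.0, 4.0, 0.1, 0.1], block_len=2, lam=2.0)` keeps the first block, scaled by 0.84, and zeroes the second. The rule touches each coefficient a constant number of times, so the shrinkage step is linear in the number of coefficients, in line with the O(n) cost quoted in the abstract.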