Results 1 - 10 of 295
Atomic Decomposition by Basis Pursuit
- 1995
"... The Time-Frequency and Time-Scale communities have recently developed a large number of overcomplete waveform dictionaries -- stationary wavelets, wavelet packets, cosine packets, chirplets, and warplets, to name a few. Decomposition into overcomplete systems is not unique, and several methods for d ..."
Abstract
-
Cited by 2728 (61 self)
The Time-Frequency and Time-Scale communities have recently developed a large number of overcomplete waveform dictionaries -- stationary wavelets, wavelet packets, cosine packets, chirplets, and warplets, to name a few. Decomposition into overcomplete systems is not unique, and several methods for decomposition have been proposed, including the Method of Frames (MOF), Matching Pursuit (MP), and, for special dictionaries, the Best Orthogonal Basis (BOB). Basis Pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest ℓ1 norm of coefficients among all such decompositions. We give examples exhibiting several advantages over MOF, MP and BOB, including better sparsity and super-resolution. BP has interesting relations to ideas in areas as diverse as ill-posed problems, abstract harmonic analysis, total variation de-noising, and multi-scale edge de-noising. Basis Pursuit in highly overcomplete dictionaries leads to large-scale optimization problems. With signals of length 8192 and a wavelet packet dictionary, one gets an equivalent linear program of size 8192 by 212,992. Such problems can be attacked successfully only because of recent advances in linear programming by interior-point methods. We obtain reasonable success with a primal-dual logarithmic barrier method and conjugate-gradient solver.
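To make the ℓ1 principle concrete, here is a minimal sketch of basis pursuit posed as a linear program, using a small random dictionary and SciPy's generic LP solver rather than the wavelet-packet dictionaries and primal-dual log-barrier solver the abstract describes. The substitution x = u - v with u, v ≥ 0 turns "minimize ‖x‖1 subject to Φx = b" into a standard-form LP.

```python
# Minimal basis-pursuit sketch: smallest-l1 exact decomposition via an LP.
# Generic solver and random dictionary are stand-ins, not the paper's setup.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m = 20, 60                      # signal length n, dictionary size m (overcomplete: m > n)
Phi = rng.standard_normal((n, m))
x_true = np.zeros(m)
x_true[rng.choice(m, 3, replace=False)] = rng.standard_normal(3)   # sparse ground truth
b = Phi @ x_true                   # observed signal

# min ||x||_1  s.t.  Phi x = b, rewritten with x = u - v, u, v >= 0:
# minimize 1'u + 1'v  subject to  [Phi, -Phi] [u; v] = b.
c = np.ones(2 * m)
A_eq = np.hstack([Phi, -Phi])
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None), method="highs")
x_bp = res.x[:m] - res.x[m:]

print("residual:", np.linalg.norm(Phi @ x_bp - b))
print("nonzeros recovered:", int(np.sum(np.abs(x_bp) > 1e-8)))
```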
Adapting to unknown smoothness via wavelet shrinkage
- Journal of the American Statistical Association, 1995
"... We attempt to recover a function of unknown smoothness from noisy, sampled data. We introduce a procedure, SureShrink, which suppresses noise by thresholding the empirical wavelet coefficients. The thresholding is adaptive: a threshold level is assigned to each dyadic resolution level by the princip ..."
Abstract
-
Cited by 1006 (18 self)
We attempt to recover a function of unknown smoothness from noisy, sampled data. We introduce a procedure, SureShrink, which suppresses noise by thresholding the empirical wavelet coefficients. The thresholding is adaptive: a threshold level is assigned to each dyadic resolution level by the principle of minimizing the Stein Unbiased Estimate of Risk (SURE) for threshold estimates. The computational effort of the overall procedure is order N log(N) as a function of the sample size N. SureShrink is smoothness-adaptive: if the unknown function contains jumps, the reconstruction (essentially) does also; if the unknown function has a smooth piece, the reconstruction is (essentially) as smooth as the mother wavelet will allow. The procedure is in a sense optimally smoothness-adaptive: it is near-minimax simultaneously over a whole interval of the Besov scale; the size of this interval depends on the choice of mother wavelet. We know from a previous paper by the authors that traditional smoothing methods -- kernels, splines, and orthogonal series estimates -- even with optimal choices of the smoothing parameter, would be unable to perform …
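As an illustration of the level-wise threshold choice, here is a minimal sketch that minimizes Stein's Unbiased Risk Estimate over candidate soft thresholds for one block of coefficients with noise standardized to unit variance; the full SureShrink procedure (its hybrid rule and its use inside a wavelet transform) is not reproduced here.

```python
# SURE-driven soft-threshold choice for one level of standardized coefficients.
import numpy as np

def sure_soft_threshold(y):
    """Pick the soft threshold minimizing Stein's Unbiased Risk Estimate."""
    y = np.asarray(y, dtype=float)
    n = y.size
    candidates = np.concatenate(([0.0], np.abs(y)))   # minimizer lies at 0 or at some |y_i|
    risks = []
    for t in candidates:
        # SURE(t; y) = n - 2 * #{ |y_i| <= t } + sum_i min(y_i^2, t^2)
        risks.append(n - 2.0 * np.sum(np.abs(y) <= t) + np.sum(np.minimum(y**2, t**2)))
    t_star = candidates[int(np.argmin(risks))]
    return np.sign(y) * np.maximum(np.abs(y) - t_star, 0.0), t_star

# Example: a few large coefficients plus unit-variance noise.
rng = np.random.default_rng(1)
theta = np.zeros(256); theta[:8] = 5.0
noisy = theta + rng.standard_normal(256)
denoised, t = sure_soft_threshold(noisy)
print("chosen threshold:", round(float(t), 3))
```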
Independent component analysis: algorithms and applications
- Neural Networks, 2000
"... ..."
(Show Context)
From Sparse Solutions of Systems of Equations to Sparse Modeling of Signals and Images
- 2007
"... A full-rank matrix A ∈ IR n×m with n < m generates an underdetermined system of linear equations Ax = b having infinitely many solutions. Suppose we seek the sparsest solution, i.e., the one with the fewest nonzero entries: can it ever be unique? If so, when? As optimization of sparsity is combin ..."
Abstract
-
Cited by 427 (36 self)
A full-rank matrix A ∈ ℝ^{n×m} with n < m generates an underdetermined system of linear equations Ax = b having infinitely many solutions. Suppose we seek the sparsest solution, i.e., the one with the fewest nonzero entries: can it ever be unique? If so, when? As optimization of sparsity is combinatorial in nature, are there efficient methods for finding the sparsest solution? These questions have been answered positively and constructively in recent years, exposing a wide variety of surprising phenomena; in particular, the existence of easily-verifiable conditions under which optimally-sparse solutions can be found by concrete, effective computational methods. Such theoretical results inspire a bold perspective on some important practical problems in signal and image processing. Several well-known signal and image processing problems can be cast as demanding solutions of underdetermined systems of equations. Such problems have previously seemed, to many, intractable. There is considerable evidence that these problems often have sparse solutions. Hence, advances in finding sparse solutions to underdetermined systems energize research on such signal and image processing problems – to striking effect. In this paper we review the theoretical results on sparse solutions of linear systems, empirical …
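One family of the "concrete, effective computational methods" mentioned above is greedy pursuit. The sketch below implements a basic orthogonal matching pursuit for an underdetermined system Ax = b; it is an illustrative stand-in with toy data, not a summary of the algorithms and uniqueness conditions the paper actually surveys.

```python
# Orthogonal matching pursuit: greedily build a small support, then least-squares fit.
import numpy as np

def omp(A, b, k, tol=1e-10):
    """Select up to k columns of A greedily and fit b on the selected support."""
    n, m = A.shape
    residual, support = b.copy(), []
    x = np.zeros(m)
    for _ in range(k):
        # pick the column most correlated with the current residual
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        coeffs, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)
        x[:] = 0.0
        x[support] = coeffs
        residual = b - A @ x
        if np.linalg.norm(residual) < tol:
            break
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 100))
x_true = np.zeros(100); x_true[[5, 40, 77]] = [1.0, -2.0, 0.5]
b = A @ x_true
x_hat = omp(A, b, k=3)
print("support recovered:", np.flatnonzero(np.abs(x_hat) > 1e-6))
```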
Translation-invariant de-noising
- 1995
"... De-Noising with the traditional (orthogonal, maximally-decimated) wavelet transform sometimes exhibits visual artifacts; we attribute some of these – for example, Gibbs phenomena in the neighborhood of discontinuities – to the lack of translation invariance of the wavelet basis. One method to suppre ..."
Abstract
-
Cited by 307 (7 self)
De-Noising with the traditional (orthogonal, maximally-decimated) wavelet transform sometimes exhibits visual artifacts; we attribute some of these – for example, Gibbs phenomena in the neighborhood of discontinuities – to the lack of translation invariance of the wavelet basis. One method to suppress such artifacts, termed “cycle spinning” by Coifman, is to “average out” the translation dependence. For a range of shifts, one shifts the data (right or left as the case may be), De-Noises the shifted data, and then unshifts the de-noised data. Doing this for each of a range of shifts, and averaging the several results so obtained, produces a reconstruction subject to far weaker Gibbs phenomena than thresholding-based De-Noising using the traditional orthogonal wavelet transform. Cycle-spinning over the range of all circulant shifts can be accomplished in order n log2(n) time; it is equivalent to de-noising using the undecimated or stationary wavelet transform. Cycle-spinning exhibits benefits outside of wavelet de-noising, for example in cosine packet de-noising, where it helps suppress ‘clicks’. It also has a counterpart in frequency-domain de-noising, where the goal of translation invariance is replaced by modulation invariance, and the central shift-De-Noise-unshift operation is replaced by modulate-De-Noise-demodulate. We illustrate these concepts with extensive computational examples; all figures presented here are reproducible using the WaveLab software package.
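A minimal sketch of the shift / De-Noise / unshift averaging is given below. The per-shift de-noiser is passed in as a callable (a toy hard-threshold stands in for a real wavelet thresholder), and the loop over all circulant shifts is the direct form, not the fast n log2(n) stationary-wavelet implementation the abstract mentions.

```python
# Cycle spinning: average the de-noised results over circulant shifts of the data.
import numpy as np

def cycle_spin(x, denoise, shifts=None):
    """Average denoise(shift(x)) over circulant shifts, then unshift."""
    n = len(x)
    shifts = list(range(n)) if shifts is None else list(shifts)
    out = np.zeros(n)
    for s in shifts:
        shifted = np.roll(x, s)            # shift the data
        cleaned = denoise(shifted)         # de-noise the shifted data
        out += np.roll(cleaned, -s)        # unshift the de-noised data
    return out / len(shifts)

# Toy example: hard-threshold small values in place of a real wavelet de-noiser.
toy_denoise = lambda v: np.where(np.abs(v) > 0.5, v, 0.0)
rng = np.random.default_rng(3)
signal = np.sign(np.linspace(-1, 1, 64)) + 0.2 * rng.standard_normal(64)
smoothed = cycle_spin(signal, toy_denoise)
```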
An interior-point method for large-scale l1-regularized logistic regression
- Journal of Machine Learning Research, 2007
"... Logistic regression with ℓ1 regularization has been proposed as a promising method for feature selection in classification problems. In this paper we describe an efficient interior-point method for solving large-scale ℓ1-regularized logistic regression problems. Small problems with up to a thousand ..."
Abstract
-
Cited by 290 (9 self)
Logistic regression with ℓ1 regularization has been proposed as a promising method for feature selection in classification problems. In this paper we describe an efficient interior-point method for solving large-scale ℓ1-regularized logistic regression problems. Small problems with up to a thousand or so features and examples can be solved in seconds on a PC; medium-sized problems, with tens of thousands of features and examples, can be solved in tens of seconds (assuming some sparsity in the data). A variation on the basic method, which uses a preconditioned conjugate gradient method to compute the search step, can solve very large problems with a million features and examples (e.g., the 20 Newsgroups data set) in a few minutes on a PC. Using warm-start techniques, a good approximation of the entire regularization path can be computed much more efficiently than by solving a family of problems independently.
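For reference, here is a minimal sketch of the model being solved, fitted with scikit-learn's off-the-shelf ℓ1-penalized solver on synthetic data; it illustrates ℓ1-regularized logistic regression as a feature selector, not the paper's interior-point or preconditioned conjugate-gradient method.

```python
# l1-regularized logistic regression with an off-the-shelf solver (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n, d = 500, 200
X = rng.standard_normal((n, d))
w_true = np.zeros(d); w_true[:5] = [2, -2, 1.5, -1.5, 1]      # only 5 relevant features
y = (X @ w_true + 0.5 * rng.standard_normal(n) > 0).astype(int)

# scikit-learn's C is the inverse of the regularization strength lambda.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X, y)
print("nonzero coefficients:", int(np.sum(clf.coef_ != 0)), "of", d)
```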
Large Sample Sieve Estimation of Semi-Nonparametric Models
- Handbook of Econometrics, 2007
"... Often researchers find parametric models restrictive and sensitive to deviations from the parametric specifications; semi-nonparametric models are more flexible and robust, but lead to other complications such as introducing infinite dimensional parameter spaces that may not be compact. The method o ..."
Abstract
-
Cited by 185 (19 self)
Often researchers find parametric models restrictive and sensitive to deviations from the parametric specifications; semi-nonparametric models are more flexible and robust, but lead to other complications such as introducing infinite dimensional parameter spaces that may not be compact. The method of sieves provides one way to tackle such complexities by optimizing an empirical criterion function over a sequence of approximating parameter spaces, called sieves, which are significantly less complex than the original parameter space. With different choices of criteria and sieves, the method of sieves is very flexible in estimating complicated econometric models. For example, it can simultaneously estimate the parametric and nonparametric components in semi-nonparametric models with or without constraints. It can easily incorporate prior information, often derived from economic theory, such as monotonicity, convexity, additivity, multiplicity, exclusion and non-negativity. This chapter describes estimation of semi-nonparametric econometric models via the method of sieves. We present some general results on the large sample properties of the sieve estimates, including consistency of the sieve extremum estimates, convergence rates of the sieve M-estimates, pointwise normality of series estimates of regression functions, root-n asymptotic normality and efficiency of sieve estimates of smooth functionals of infinite dimensional parameters. Examples are used to illustrate the general results.
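A minimal sketch of the sieve idea in its simplest form follows: nonparametric regression estimated by least squares over a polynomial sieve whose dimension grows slowly with the sample size. The growth rule and the example data are illustrative assumptions; the chapter covers far more general criteria, sieve spaces, and semi-nonparametric models.

```python
# Series least squares over a polynomial sieve whose dimension grows with n.
import numpy as np

def sieve_regression(x, y, degree=None):
    """Least squares over the sieve of polynomials of the given degree."""
    n = len(y)
    if degree is None:
        degree = max(1, int(np.floor(n ** (1 / 3))))   # sieve dimension grows slowly with n
    coefs = np.polynomial.polynomial.polyfit(x, y, degree)
    return lambda t: np.polynomial.polynomial.polyval(t, coefs)

rng = np.random.default_rng(5)
x = np.sort(rng.uniform(0, 1, 400))
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(400)   # unknown regression function plus noise
f_hat = sieve_regression(x, y)
print("fit at 0.25:", round(float(f_hat(0.25)), 3), "truth:", round(float(np.sin(np.pi / 2)), 3))
```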
Adapting to unknown sparsity by controlling the false discovery rate
- 2000
"... We attempt to recover a high-dimensional vector observed in white noise, where the vector is known to be sparse, but the degree of sparsity is unknown. We consider three different ways of defining sparsity of a vector: using the fraction of nonzero terms; imposing power-law decay bounds on the order ..."
Abstract
-
Cited by 183 (23 self)
We attempt to recover a high-dimensional vector observed in white noise, where the vector is known to be sparse, but the degree of sparsity is unknown. We consider three different ways of defining sparsity of a vector: using the fraction of nonzero terms; imposing power-law decay bounds on the ordered entries; and controlling the ℓp norm for p small. We obtain a procedure which is asymptotically minimax for ℓr loss, simultaneously throughout a range of such sparsity classes. The optimal procedure is a data-adaptive thresholding scheme, driven by control of the False Discovery Rate (FDR). FDR control is a recent innovation in simultaneous testing, in which one seeks to ensure that at most a certain fraction of the rejected null hypotheses will correspond to false rejections. In our treatment, the FDR control parameter q also plays a controlling role in asymptotic minimaxity. Our results say that letting q = q_n → 0 with problem size n is sufficient for asymptotic minimaxity, while keeping q > 1/2 fixed prevents asymptotic minimaxity. To our knowledge, this relation between ideas in simultaneous inference and asymptotic decision theory is new. Our work provides a new perspective on a class of model selection rules which has been introduced recently by several authors. These new rules impose complexity penalization of the form 2·log(potential model size / actual model size). We exhibit a close connection with FDR-controlling procedures having q tending to 0; this connection strongly supports a conjecture of simultaneous asymptotic minimaxity for such model selection rules.
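To show the thresholding scheme in code, here is a minimal sketch that converts standardized observations to two-sided p-values, applies the Benjamini-Hochberg step-up rule at level q, and hard-thresholds at the last rejected observation; the noise level is assumed known, and none of the asymptotic-minimaxity analysis is reproduced here.

```python
# FDR-driven hard thresholding of a sparse mean vector observed in Gaussian noise.
import numpy as np
from scipy.stats import norm

def fdr_threshold(y, q=0.1, sigma=1.0):
    n = len(y)
    z = np.abs(y) / sigma
    p = 2 * norm.sf(z)                           # two-sided p-values
    order = np.argsort(p)                        # most significant first
    passed = p[order] <= q * (np.arange(1, n + 1) / n)
    if not passed.any():
        return np.zeros_like(y)                  # nothing rejected: estimate the zero vector
    k = int(np.max(np.flatnonzero(passed))) + 1  # largest k with p_(k) <= q k / n
    t = z[order][k - 1] * sigma                  # threshold at the k-th most significant |y|
    return np.where(np.abs(y) >= t, y, 0.0)

rng = np.random.default_rng(6)
mu = np.zeros(1000); mu[:20] = 4.0               # sparse mean vector
y = mu + rng.standard_normal(1000)
est = fdr_threshold(y, q=0.1)
print("coordinates kept:", int(np.sum(est != 0)))
```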
Basis pursuit
- In Proceedings of the Twenty-Eighth Asilomar Conference on Signals, Systems and Computers (IEEE), 1994
"... ..."
Adaptive wavelet estimation: A block thresholding and oracle inequality approach
- Annals of Statistics, 1999
"... We study wavelet function estimation via the approach of block thresholding and ideal adaptation with oracle. Oracle inequalities are derived and serve as guides for the selection of smoothing parameters. Based on an oracle inequality and motivated by the data compression and localization properties ..."
Abstract
-
Cited by 147 (20 self)
We study wavelet function estimation via the approach of block thresholding and ideal adaptation with oracle. Oracle inequalities are derived and serve as guides for the selection of smoothing parameters. Based on an oracle inequality and motivated by the data compression and localization properties of wavelets, an adaptive wavelet estimator for nonparametric regression is proposed and the optimality of the procedure is investigated. We show that the estimator achieves simultaneously three objectives: adaptivity, spatial adaptivity and computational efficiency. Specifically, it is proved that the estimator attains the exact optimal rates of convergence over a range of Besov classes and the estimator achieves the adaptive local minimax rate for estimating functions at a point. The estimator is easy to implement, at a computational cost of O(n). Simulation shows that the estimator has excellent numerical performance relative to more traditional wavelet estimators.
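As a rough illustration of blockwise shrinkage, the sketch below groups standardized coefficients at one level into blocks and applies a James-Stein-type keep-or-shrink rule based on each block's energy; the block length and threshold constant are left as free parameters here, whereas the paper derives specific choices and embeds the rule in a full wavelet transform.

```python
# Blockwise James-Stein-type shrinkage of noisy, standardized coefficients.
import numpy as np

def block_shrink(coeffs, block_len, lam, sigma=1.0):
    out = np.array(coeffs, dtype=float)
    for start in range(0, len(out), block_len):
        block = out[start:start + block_len]
        s2 = np.sum(block ** 2)                          # block energy
        L = len(block)
        shrink = max(0.0, 1.0 - lam * L * sigma ** 2 / s2) if s2 > 0 else 0.0
        out[start:start + block_len] = shrink * block    # shrink or zero the whole block
    return out

rng = np.random.default_rng(7)
theta = np.zeros(512); theta[100:110] = 6.0              # a localized burst of signal
noisy = theta + rng.standard_normal(512)
cleaned = block_shrink(noisy, block_len=int(np.log(512)), lam=4.5)   # illustrative parameter choices
print("coefficients kept:", int(np.sum(cleaned != 0)))
```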