Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties
, 2001
Cited by 914 (61 self)
Variable selection is fundamental to high-dimensional statistical modeling, including nonparametric regression. Many approaches in use are stepwise selection procedures, which can be computationally expensive and ignore stochastic errors in the variable selection process. In this article, penalized likelihood approaches are proposed to handle these kinds of problems. The proposed methods select variables and estimate coefficients simultaneously, which makes it possible to construct confidence intervals for the estimated parameters. The proposed approaches are distinguished from others in that the penalty functions are symmetric, nonconcave on (0, ∞), and have singularities at the origin to produce sparse solutions. Furthermore, the penalty functions should be bounded by a constant to reduce bias and satisfy certain conditions to yield continuous solutions. A new algorithm is proposed for optimizing penalized likelihood functions. The proposed ideas are widely applicable: they are readily applied to a variety of parametric models, such as generalized linear models and robust regression models, and they can also be applied easily to nonparametric modeling by using wavelets and splines. Rates of convergence of the proposed penalized likelihood estimators are established. Furthermore, with proper choice of regularization parameters, we show that the proposed estimators perform as well as the oracle procedure in variable selection; namely, they work as well as if the correct submodel were known. Our simulation shows that the newly proposed methods compare favorably with other variable selection techniques. Furthermore, tests show that the standard error formulas are accurate enough for practical applications.
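The abstract lists the properties a penalty should have (singular at the origin for sparsity, bounded for low bias, continuous solutions) without writing one down. One penalty known to satisfy these conditions is the smoothly clipped absolute deviation (SCAD) penalty. The minimal sketch below assumes an orthonormal design, where penalized least squares decouples into thresholding each least-squares coefficient `z` separately, and uses `a = 3.7` as a default; the function name is ours, not an API from the paper.

```python
import math


def scad_threshold(z: float, lam: float, a: float = 3.7) -> float:
    """SCAD thresholding of one least-squares coefficient z (orthonormal design assumed)."""
    if a <= 2:
        raise ValueError("SCAD requires a > 2")
    abs_z = abs(z)
    sign = math.copysign(1.0, z)
    if abs_z <= 2 * lam:
        # Soft-thresholding zone: small coefficients are set exactly to zero (sparsity).
        return sign * max(abs_z - lam, 0.0)
    if abs_z <= a * lam:
        # Linear transition that joins the soft and identity zones continuously.
        return ((a - 1) * z - sign * a * lam) / (a - 2)
    # Large coefficients are left unshrunk, which keeps their bias low.
    return z


if __name__ == "__main__":
    for z in (0.5, 1.5, 3.0, 5.0):
        print(z, round(scad_threshold(z, lam=1.0), 3))
```

The three branches mirror the listed requirements: the soft-threshold zone gives sparsity, the identity zone for `|z| > a·lam` reflects the bounded penalty, and the middle segment keeps the rule continuous at both breakpoints.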
A Practical Guide to Wavelet Analysis
, 1998
Cited by 833 (3 self)
A practical step-by-step guide to wavelet analysis is given, with examples taken from time series of the El Niño Southern Oscillation (ENSO). The guide includes a comparison to the windowed Fourier transform, the choice of an appropriate wavelet basis function, edge effects due to finite-length time series, and the relationship between wavelet scale and Fourier frequency. New statistical significance tests for wavelet power spectra are developed by deriving theoretical wavelet spectra for white and red noise processes and using these to establish significance levels and confidence intervals. It is shown that smoothing in time or scale can be used to increase the confidence of the wavelet spectrum. Empirical formulas are given for the effect of smoothing on significance levels and confidence intervals. Extensions to wavelet analysis such as filtering, the power Hovmöller, cross-wavelet spectra, and coherence are described. The statistical significance tests are used to give a qu...
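The significance tests compare wavelet power against a theoretical background spectrum. A minimal sketch, assuming a lag-1 autoregressive (red-noise) background with coefficient `alpha` and a complex Morlet wavelet with ω0 = 6, computes that background, the 95% level for 2 degrees of freedom, and the conversion from wavelet scale to Fourier period. The formulas follow our reading of the guide and the function names are hypothetical; check both against the paper before relying on them.

```python
import math


def red_noise_spectrum(alpha, n):
    """Normalized AR(1) background power at Fourier indices k = 0..n//2."""
    return [
        (1 - alpha**2) / (1 + alpha**2 - 2 * alpha * math.cos(2 * math.pi * k / n))
        for k in range(n // 2 + 1)
    ]


def significance_level_95(background, variance):
    """95% threshold for complex-wavelet power: variance * background * chi2_2(0.95) / 2."""
    # With 2 degrees of freedom the chi-square quantile has the closed form -2 ln(1 - p).
    chi2_95 = -2 * math.log(1 - 0.95)
    return [variance * p * chi2_95 / 2 for p in background]


def morlet_fourier_period(scale, omega0=6.0):
    """Fourier period that corresponds to a Morlet wavelet scale."""
    return 4 * math.pi * scale / (omega0 + math.sqrt(2 + omega0**2))


if __name__ == "__main__":
    bg = red_noise_spectrum(alpha=0.7, n=16)
    print([round(v, 3) for v in significance_level_95(bg, variance=1.0)])
    print(round(morlet_fourier_period(1.0), 4))
```

For ω0 = 6 the Fourier period is only about 3% larger than the scale, which is why scale and period are often treated as nearly interchangeable for this wavelet.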
Regularization paths for generalized linear models via coordinate descent
, 2009
Cited by 698 (14 self)
We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, two-class logistic regression, and multinomial regression problems while the penalties include ℓ1 (the lasso), ℓ2 (ridge regression) and mixtures of the two (the elastic net). The algorithms use cyclical coordinate descent, computed along a regularization path. The methods can handle large problems and can also deal efficiently with sparse features. In comparative timings we find that the new algorithms are considerably faster than competing methods.
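Cyclical coordinate descent updates one coefficient at a time with a closed-form soft-thresholding step, and a path is traced by warm-starting each fit from the previous, larger λ. The sketch below covers only the Gaussian (linear regression) case without an intercept, assumes centered data, and minimizes (1/2N)‖y − Xβ‖² + λ(α‖β‖₁ + (1 − α)/2 ‖β‖²); it is an illustration of the idea, not the authors' implementation, and all names are hypothetical.

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_cd(X, y, lam, alpha=1.0, beta=None, max_sweeps=1000, tol=1e-10):
    """Cyclical coordinate descent for the Gaussian elastic net (no intercept).

    X is a list of N rows with p entries each; X and y are assumed centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p if beta is None else list(beta)
    residual = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            # (1/N) * sum_i x_ij * (residual with feature j's contribution added back)
            z = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            denom = col_sq[j] + lam * (1 - alpha)
            new = soft_threshold(z, lam * alpha) / denom if denom > 0 else 0.0
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta  # keep the residual in sync
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def regularization_path(X, y, lambdas, alpha=1.0):
    """Fit along decreasing lambda values, warm-starting each fit from the previous one."""
    path, beta = [], None
    for lam in sorted(lambdas, reverse=True):
        beta = elastic_net_cd(X, y, lam, alpha, beta)
        path.append((lam, beta))
    return path
```

Maintaining the residual incrementally makes each coordinate update cost O(N), which is part of why this scheme scales to many features.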
Image denoising using a scale mixture of Gaussians in the wavelet domain
 IEEE TRANS IMAGE PROCESSING
, 2003
Cited by 514 (17 self)
We describe a method for removing noise from digital images, based on a statistical model of the coefficients of an overcomplete multiscale oriented basis. Neighborhoods of coefficients at adjacent positions and scales are modeled as the product of two independent random variables: a Gaussian vector and a hidden positive scalar multiplier. The latter modulates the local variance of the coefficients in the neighborhood, and is thus able to account for the empirically observed correlation between the coefficient amplitudes. Under this model, the Bayesian least squares estimate of each coefficient reduces to a weighted average of the local linear estimates over all possible values of the hidden multiplier variable. We demonstrate through simulations with images contaminated by additive white Gaussian noise that the performance of this method substantially surpasses that of previously published methods, both visually and in terms of mean squared error.
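The key step, a Bayesian least squares estimate formed as a weighted average of local linear (Wiener) estimates over the hidden multiplier, can be illustrated in one dimension. The sketch assumes a scalar neighborhood (the method itself uses vector neighborhoods with covariance matrices), a discretized grid of multiplier values `z_grid` with prior weights `z_prior`, and known signal and noise variances; all of these, and the function name, are our simplifications.

```python
import math


def gsm_bls_estimate(y, sigma_u2, sigma_w2, z_grid, z_prior):
    """Scalar sketch of y = sqrt(z)*u + w with u ~ N(0, sigma_u2), w ~ N(0, sigma_w2).

    Returns E[x | y] for x = sqrt(z)*u by averaging the Wiener estimate over z.
    """
    weights, estimates = [], []
    for z, prior in zip(z_grid, z_prior):
        var_y = z * sigma_u2 + sigma_w2  # variance of y given z
        likelihood = math.exp(-0.5 * y * y / var_y) / math.sqrt(2 * math.pi * var_y)
        weights.append(prior * likelihood)  # unnormalized posterior p(z | y)
        estimates.append(z * sigma_u2 / var_y * y)  # local linear (Wiener) estimate
    total = sum(weights)
    return sum(w * e for w, e in zip(weights, estimates)) / total


if __name__ == "__main__":
    grid, prior = [0.1, 1.0, 10.0], [1 / 3] * 3
    for y in (1.0, 10.0):
        print(y, round(gsm_bls_estimate(y, 1.0, 1.0, grid, prior), 3))
```

Large observations push the posterior toward large multipliers, so they are shrunk less than small ones; this adaptivity is what the hidden multiplier contributes.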
From Sparse Solutions of Systems of Equations to Sparse Modeling of Signals and Images
, 2007
Cited by 423 (37 self)
A full-rank matrix A ∈ ℝ^{n×m} with n < m generates an underdetermined system of linear equations Ax = b having infinitely many solutions. Suppose we seek the sparsest solution, i.e., the one with the fewest nonzero entries: can it ever be unique? If so, when? As optimization of sparsity is combinatorial in nature, are there efficient methods for finding the sparsest solution? These questions have been answered positively and constructively in recent years, exposing a wide variety of surprising phenomena; in particular, the existence of easily verifiable conditions under which optimally sparse solutions can be found by concrete, effective computational methods. Such theoretical results inspire a bold perspective on some important practical problems in signal and image processing. Several well-known signal and image processing problems can be cast as demanding solutions of underdetermined systems of equations. Such problems have previously seemed, to many, intractable. There is considerable evidence that these problems often have sparse solutions. Hence, advances in finding sparse solutions to underdetermined systems energize research on such signal and image processing problems, to striking effect. In this paper we review the theoretical results on sparse solutions of linear systems, empirical
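Greedy pursuit is one family of effective methods for finding sparse solutions of an underdetermined system Ax = b. The sketch below implements orthogonal matching pursuit (OMP) in plain Python: at each step it adds the column most correlated with the residual and refits by least squares on the chosen support. It is an illustrative sketch under our own naming, not code from the review; the small Gaussian-elimination helper assumes the chosen columns are linearly independent.

```python
import math


def solve_small(M, v):
    """Solve the square system M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def omp(A, b, max_nonzeros, tol=1e-10):
    """Orthogonal matching pursuit for an underdetermined system A x = b (A is n x m)."""
    n, m = len(A), len(A[0])
    cols = [[A[i][j] for i in range(n)] for j in range(m)]
    norms = [math.sqrt(sum(a * a for a in col)) for col in cols]
    support, coef = [], []
    residual = list(b)
    while len(support) < min(max_nonzeros, n) and math.sqrt(sum(r * r for r in residual)) > tol:
        # Greedy step: unused column with the largest normalized correlation to the residual.
        scores = [
            abs(sum(cols[j][i] * residual[i] for i in range(n))) / norms[j]
            if j not in support and norms[j] > 0 else -1.0
            for j in range(m)
        ]
        support.append(max(range(m), key=scores.__getitem__))
        # Least-squares refit on the current support via the normal equations.
        gram = [[sum(cols[p][i] * cols[q][i] for i in range(n)) for q in support] for p in support]
        rhs = [sum(cols[p][i] * b[i] for i in range(n)) for p in support]
        coef = solve_small(gram, rhs)
        residual = [b[i] - sum(c * cols[j][i] for c, j in zip(coef, support)) for i in range(n)]
    x = [0.0] * m
    for c, j in zip(coef, support):
        x[j] = c
    return x
```

For example, with A = [[1, 0, 1], [0, 1, 1]] and b = [1, 1], the system has infinitely many solutions, but OMP returns the one-nonzero solution [0, 0, 1].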
Sure independence screening for ultrahigh dimensional feature space
, 2006
Cited by 279 (27 self)
Variable selection plays an important role in high-dimensional statistical modeling, which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, estimation accuracy and computational cost are two top concerns. In a recent paper, Candes and Tao (2007) propose the Dantzig selector using L1 regularization and show that it achieves the ideal risk up to a logarithmic factor log p. Their innovative procedure and remarkable result are challenged when the dimensionality is ultra high, as the factor log p can be large and their uniform uncertainty principle can fail. Motivated by these concerns, we introduce the concept of sure screening and propose a sure screening method based on correlation learning, called Sure Independence Screening (SIS), to reduce dimensionality from high to a moderate scale that is below sample size. In a fairly general asymptotic framework, SIS is shown to have the sure screening property even for exponentially growing dimensionality. As a methodological extension, an iterative SIS (ISIS) is also proposed to enhance its finite sample performance. With dimension reduced accurately from high to below sample size, variable selection can be improved in both speed and accuracy, and can then be ac...
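Screening by correlation learning ranks each feature by the magnitude of its marginal correlation with the response and keeps the top d. A minimal sketch, assuming plain Python lists for the data; the submodel size d ≈ n / log n used as a default here is one choice associated with SIS, but treat it as an assumption. Function names are hypothetical.

```python
import math


def sis(X, y, d):
    """Keep the d columns of X with the largest |marginal correlation| with y."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    y_norm = math.sqrt(sum(v * v for v in yc))
    scores = []
    for j in range(p):
        col = [X[i][j] for i in range(n)]
        m = sum(col) / n
        xc = [v - m for v in col]
        x_norm = math.sqrt(sum(v * v for v in xc))
        if x_norm > 0 and y_norm > 0:
            scores.append(abs(sum(a * b for a, b in zip(xc, yc))) / (x_norm * y_norm))
        else:
            scores.append(0.0)  # constant column or response: no marginal signal
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:d])


def default_submodel_size(n):
    """Assumed default: about n / log n features survive screening."""
    return max(1, int(n / math.log(n)))
```

Because each score needs only one pass over a column, the cost is O(np), which is what makes the approach feasible when p is far larger than n; a penalized method can then be run on the surviving columns.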
Wavelet Thresholding via a Bayesian Approach
 J. R. STATIST. SOC. B
, 1996
Cited by 259 (32 self)
We discuss a Bayesian formalism which gives rise to a type of wavelet threshold estimation in nonparametric regression. A prior distribution is imposed on the wavelet coefficients of the unknown response function, designed to capture the sparseness of wavelet expansion common to most applications. For the prior specified, the posterior median yields a thresholding procedure. Our prior model for the underlying function can be adjusted to give functions falling in any specific Besov space. We establish a relation between the hyperparameters of the prior model and the parameters of those Besov spaces within which realizations from the prior will fall. Such a relation gives insight into the meaning of the Besov space parameters. Moreover, the established relation makes it possible in principle to incorporate prior knowledge about the function's regularity properties into the prior model for its wavelet coefficients. However, prior knowledge about a function's regularity properties might b...
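To see how a posterior median produces a thresholding rule, consider one noisy coefficient d ~ N(β, σ²). The abstract does not state the prior, so the sketch assumes a mixture of a point mass at zero and a normal N(0, τ²) with mixing weight `pi_nonzero`; the median is exactly zero whenever the atom at zero straddles the 50% point of the posterior. Names are hypothetical.

```python
import math
from statistics import NormalDist


def _log_normal_pdf(x, sd):
    return -0.5 * (x / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def posterior_median(d, sigma, tau, pi_nonzero):
    """Posterior median of beta given d ~ N(beta, sigma^2).

    Assumed prior: beta ~ pi_nonzero * N(0, tau^2) + (1 - pi_nonzero) * (point mass at 0),
    with 0 < pi_nonzero < 1.
    """
    # Posterior probability w that beta == 0, computed in log space to avoid underflow.
    log_null = math.log(1 - pi_nonzero) + _log_normal_pdf(d, sigma)
    log_alt = math.log(pi_nonzero) + _log_normal_pdf(d, math.sqrt(sigma**2 + tau**2))
    w = 1.0 / (1.0 + math.exp(min(log_alt - log_null, 700.0)))
    # Given beta != 0, the posterior is normal with a shrunken mean.
    shrink = tau**2 / (sigma**2 + tau**2)
    slab = NormalDist(shrink * d, math.sqrt(shrink) * sigma)
    below = (1 - w) * slab.cdf(0.0)  # posterior mass strictly below zero
    if below > 0.5:
        return slab.inv_cdf(0.5 / (1 - w))
    if below + w >= 0.5:
        return 0.0  # the atom at zero holds the median: the coefficient is thresholded
    return slab.inv_cdf((0.5 - w) / (1 - w))
```

Small observations return exactly 0.0, while large ones are shrunk only slightly toward zero, so the rule behaves like a threshold followed by mild shrinkage.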
A Sparse Signal Reconstruction Perspective for Source Localization With Sensor Arrays
, 2005
Cited by 222 (6 self)
We present a source localization method based on a sparse representation of sensor measurements with an overcomplete basis composed of samples from the array manifold. We enforce sparsity by imposing penalties based on the ℓ1-norm. A number of recent theoretical results on sparsifying properties of ℓ1 penalties justify this choice. Explicitly enforcing the sparsity of the representation is motivated by a desire to obtain a sharp estimate of the spatial spectrum that exhibits super-resolution. We propose to use the singular value decomposition (SVD) of the data matrix to summarize multiple time or frequency samples. Our formulation leads to an optimization problem, which we solve efficiently in a second-order cone (SOC) programming framework by an interior point implementation. We propose a grid refinement method to mitigate the effects of limiting estimates to a grid of spatial locations and introduce an automatic selection criterion for the regularization parameter involved in our approach. We demonstrate the effectiveness of the method on simulated data by plots of spatial spectra and by comparing the estimator variance to the Cramér–Rao bound (CRB). We observe that our approach has a number of advantages over other source localization techniques, including increased resolution, improved robustness to noise, limitations in data quantity, and correlation of the sources, as well as not requiring an accurate initialization.
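The core idea, representing the measurement as a sparse combination of array-manifold samples on a grid of directions, can be sketched with a simpler solver. The paper uses second-order cone programming with an interior point method; the sketch below swaps in iterative soft-thresholding (ISTA) for min ½‖b − Ax‖² + λ‖x‖₁, assumes a uniform linear array with half-wavelength spacing, uses a single noiseless snapshot (no SVD summarization, no grid refinement), and picks λ by hand. Function names are hypothetical.

```python
import cmath
import math


def steering_vector(theta_deg, n_sensors):
    """Array-manifold sample for a uniform linear array with half-wavelength spacing (assumed)."""
    s = math.sin(math.radians(theta_deg))
    return [cmath.exp(-1j * math.pi * m * s) for m in range(n_sensors)]


def ista_l1(columns, b, lam, n_iter=2000):
    """Minimize 0.5*||b - A x||^2 + lam*||x||_1 over complex x by iterative soft-thresholding.

    `columns` holds the columns of A (one steering vector per grid direction).
    """
    n, m = len(b), len(columns)
    # Step 1/L with L = ||A||_F^2, a safe but loose upper bound on ||A||_2^2.
    step = 1.0 / sum(abs(a) ** 2 for col in columns for a in col)
    x = [0j] * m
    for _ in range(n_iter):
        residual = [b[i] - sum(columns[j][i] * x[j] for j in range(m)) for i in range(n)]
        new_x = []
        for j in range(m):
            # Gradient step on the data-fit term: x_j + step * a_j^H r.
            g = x[j] + step * sum(columns[j][i].conjugate() * residual[i] for i in range(n))
            mag = abs(g)
            # Complex soft-thresholding shrinks the magnitude and keeps the phase.
            new_x.append(g * max(0.0, 1.0 - step * lam / mag) if mag > 0 else 0j)
        x = new_x
    return x


if __name__ == "__main__":
    grid = list(range(-90, 91, 5))  # candidate directions in degrees
    columns = [steering_vector(t, 8) for t in grid]
    b = steering_vector(20, 8)  # one noiseless source at 20 degrees
    spectrum = [abs(v) for v in ista_l1(columns, b, lam=1.0)]
    print(grid[spectrum.index(max(spectrum))])
```

In this noiseless single-source case the ℓ1 solution puts all its weight on the true grid direction, whereas the plain beamformer output |A^H b| also has sizable sidelobes at neighboring angles; that contrast is the sharpening the abstract refers to.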
Bivariate Shrinkage Functions for Wavelet-Based Denoising Exploiting Interscale Dependency
, 2002
Cited by 202 (8 self)
Most simple nonlinear thresholding rules for wavelet-based denoising assume that the wavelet coefficients are independent. However, wavelet coefficients of natural images have significant dependencies. In this paper, we consider in detail only the dependencies between the coefficients and their parents. For this purpose, new non-Gaussian bivariate distributions are proposed, and corresponding nonlinear threshold functions (shrinkage functions) are derived from the models using Bayesian estimation theory. The new shrinkage functions do not assume the independence of wavelet coefficients. We present three image denoising examples to show the performance of these new bivariate shrinkage rules. In the second example, a simple subband-dependent, data-driven image denoising system is described and compared with effective data-driven techniques in the literature, namely VisuShrink, SureShrink, BayesShrink, and hidden Markov models. In the third example, the same idea is applied to the dual-tree complex wavelet coefficients.
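The abstract gives no formula, so the sketch below uses the bivariate shrinkage rule as we recall it for one of these models: the child coefficient y1 is shrunk according to the joint magnitude of (y1, y2), where y2 is its parent at the next coarser scale, σn is the noise standard deviation and σ the local signal standard deviation. The helper estimators (local variance minus noise variance, and the median-absolute-deviation noise estimate) are common companions to such rules; treat all of it as an assumption rather than the paper's exact system.

```python
import math
from statistics import median


def bivariate_shrink(y1, y2, sigma_n, sigma):
    """Estimate a clean child coefficient from noisy child y1 and noisy parent y2."""
    magnitude = math.hypot(y1, y2)
    if sigma <= 0 or magnitude == 0:
        return 0.0
    # Jointly small child/parent pairs are zeroed; a large parent protects a small child.
    gain = max(magnitude - math.sqrt(3) * sigma_n**2 / sigma, 0.0) / magnitude
    return gain * y1


def local_signal_std(neighborhood, sigma_n):
    """Signal standard deviation from nearby noisy coefficients (noise variance subtracted)."""
    mean_square = sum(v * v for v in neighborhood) / len(neighborhood)
    return math.sqrt(max(mean_square - sigma_n**2, 0.0))


def noise_std_mad(finest_subband):
    """Robust noise estimate from the finest diagonal subband: median(|d|) / 0.6745."""
    return median(abs(v) for v in finest_subband) / 0.6745
```

Unlike a scalar threshold, the same child value y1 = 1 is zeroed when its parent is small but kept (shrunk) when its parent is large, which is how the rule exploits interscale dependency.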
Calibration and Empirical Bayes Variable Selection
 Biometrika
, 1997
Cited by 190 (20 self)
this paper, is that with F = 2 log p. This choice was proposed by Foster & George (1994), where it was called the Risk Inflation Criterion (RIC) because it asymptotically minimises the maximum predictive risk inflation due to selection when X is orthogonal. This choice and its minimax property were also discovered independently by Donoho & Johnstone (1994) in the wavelet regression context, where they refer to it as the universal hard thresholding rule
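With an orthogonal design and known σ, a criterion that charges F per included variable reduces to a coordinate-wise rule: include variable j when its squared t-statistic exceeds F. With F = 2 log p this is hard thresholding at σ√(2 log p), the same cut-off as the universal threshold. The sketch below illustrates that reduction under those assumptions; the reduction and function names are ours.

```python
import math


def ric_penalty(p):
    """F = 2 log p, the per-variable penalty discussed above."""
    return 2 * math.log(p)


def ric_select(t_stats):
    """Orthogonal design, known sigma: keep variable j when t_j^2 > F = 2 log p."""
    F = ric_penalty(len(t_stats))
    return [j for j, t in enumerate(t_stats) if t * t > F]


def universal_hard_threshold(coefs, sigma):
    """Hard thresholding at sigma * sqrt(2 log n), the wavelet-regression counterpart."""
    lam = sigma * math.sqrt(2 * math.log(len(coefs)))
    return [c if abs(c) > lam else 0.0 for c in coefs]
```

For p = 100 the cut-off is √(2 ln 100) ≈ 3.03 standard errors, noticeably stricter than the roughly 2 used for a single conventional test, which reflects the price of searching over many candidate variables.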