The adaptive LASSO and its oracle properties
 Journal of the American Statistical Association
Cited by 660 (10 self)
The lasso is a popular technique for simultaneous estimation and variable selection. Lasso variable selection has been shown to be consistent under certain conditions. In this work we derive a necessary condition for the lasso variable selection to be consistent. Consequently, there exist certain scenarios where the lasso is inconsistent for variable selection. We then propose a new version of the lasso, called the adaptive lasso, where adaptive weights are used for penalizing different coefficients in the ℓ1 penalty. We show that the adaptive lasso enjoys the oracle properties; namely, it performs as well as if the true underlying model were given in advance. Similar to the lasso, the adaptive lasso is shown to be near-minimax optimal. Furthermore, the adaptive lasso can be solved by the same efficient algorithm for solving the lasso. We also discuss the extension of the adaptive lasso in generalized linear models and show that the oracle properties still hold under mild regularity conditions. As a byproduct of our theory, the nonnegative garotte is shown to be consistent for variable selection.
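To illustrate how adaptive weights change the ℓ1 penalty, the following is a minimal sketch restricted to the special case of an orthonormal design (X^T X = I), where both estimators reduce to coordinate-wise soft-thresholding of the least-squares coefficients. The function names, the objective scaling 0.5 * ||y - Xb||^2 + lam * sum_j w_j |b_j|, and the weight choice w_j = 1 / |b_ols_j|^gamma are assumptions made for this illustration; it is not the general algorithm referred to in the abstract.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_orthonormal(beta_ols, lam):
    """Lasso under an orthonormal design: the same threshold for every coefficient."""
    return [soft_threshold(b, lam) for b in beta_ols]


def adaptive_lasso_orthonormal(beta_ols, lam, gamma=1.0):
    """Adaptive lasso under an orthonormal design (illustrative special case).

    Each coefficient gets its own threshold lam * w_j, with the assumed
    weight w_j = 1 / |beta_ols_j| ** gamma, so large least-squares
    coefficients are shrunk less and small ones are shrunk more.
    """
    result = []
    for b in beta_ols:
        if b == 0.0:
            # The weight is infinite, so the coefficient stays at zero.
            result.append(0.0)
            continue
        weight = 1.0 / abs(b) ** gamma
        result.append(soft_threshold(b, lam * weight))
    return result


beta_ols = [3.0, -2.0, 0.5]
lasso_fit = lasso_orthonormal(beta_ols, lam=0.5)
adaptive_fit = adaptive_lasso_orthonormal(beta_ols, lam=0.5)
```

In this toy setting the largest coefficient is shrunk by less than the plain lasso would shrink it, while the smallest one is still set exactly to zero.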
From Sparse Solutions of Systems of Equations to Sparse Modeling of Signals and Images
, 2007
Cited by 423 (37 self)
A full-rank matrix A ∈ ℝ^{n×m} with n < m generates an underdetermined system of linear equations Ax = b having infinitely many solutions. Suppose we seek the sparsest solution, i.e., the one with the fewest nonzero entries: can it ever be unique? If so, when? As optimization of sparsity is combinatorial in nature, are there efficient methods for finding the sparsest solution? These questions have been answered positively and constructively in recent years, exposing a wide variety of surprising phenomena; in particular, the existence of easily verifiable conditions under which optimally sparse solutions can be found by concrete, effective computational methods. Such theoretical results inspire a bold perspective on some important practical problems in signal and image processing. Several well-known signal and image processing problems can be cast as demanding solutions of undetermined systems of equations. Such problems have previously seemed, to many, intractable. There is considerable evidence that these problems often have sparse solutions. Hence, advances in finding sparse solutions to underdetermined systems energizes research on such signal and image processing problems – to striking effect. In this paper we review the theoretical results on sparse solutions of linear systems, empirical
The bayesian lasso
, 2005
Cited by 277 (0 self)
The Lasso estimate for linear regression parameters can be interpreted as a Bayesian posterior mode estimate when the regression parameters have independent Laplace (double-exponential) priors. Gibbs sampling from this posterior is possible using an expanded hierarchy with conjugate normal priors for the regression parameters and independent exponential priors on their variances. A connection with the inverse Gaussian distribution provides tractable full conditional distributions. The Bayesian Lasso provides interval estimates (Bayesian credible intervals) that can guide variable selection. Moreover, the structure of the hierarchical model provides both Bayesian and likelihood methods for selecting the Lasso parameter. Slight modifications lead to Bayesian versions of other Lasso-related estimation methods like bridge regression and a robust variant.
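The posterior-mode interpretation can be made concrete with a short derivation. This is a simplified sketch that treats the noise variance σ² as fixed and known, which is an assumption made here for illustration; the symbols λ, σ², and p are notation introduced for this sketch, and the hierarchical model summarized above is richer.

```latex
% Gaussian likelihood and independent Laplace priors (sigma^2 assumed fixed):
y \mid \beta \sim \mathcal{N}(X\beta,\ \sigma^2 I),
\qquad
\pi(\beta) = \prod_{j=1}^{p} \frac{\lambda}{2}\, e^{-\lambda |\beta_j|}

% Negative log-posterior, up to a constant that does not depend on beta:
-\log p(\beta \mid y)
  = \frac{1}{2\sigma^2}\, \lVert y - X\beta \rVert_2^2
  + \lambda \sum_{j=1}^{p} |\beta_j|
  + \text{const}

% Multiplying by 2 sigma^2 shows that the posterior mode is a lasso estimate:
\hat{\beta}_{\text{mode}}
  = \arg\min_{\beta}\ \lVert y - X\beta \rVert_2^2
  + 2\sigma^2 \lambda \sum_{j=1}^{p} |\beta_j|
```

Under these assumptions the lasso penalty parameter corresponds to 2σ²λ, so a larger prior rate λ or a larger noise variance both imply stronger shrinkage.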
Sparse Principal Component Analysis
 Journal of Computational and Graphical Statistics
, 2004
Cited by 270 (6 self)
Principal component analysis (PCA) is widely used in data processing and dimensionality reduction. However, PCA suffers from the fact that each principal component is a linear combination of all the original variables, thus it is often difficult to interpret the results. We introduce a new method called sparse principal component analysis (SPCA) using the lasso (elastic net) to produce modified principal components with sparse loadings. We show that PCA can be formulated as a regression-type optimization problem, then sparse loadings are obtained by imposing the lasso (elastic net) constraint on the regression coefficients. Efficient algorithms are proposed to realize SPCA for both regular multivariate data and gene expression arrays. We also give a new formula to compute the total variance of modified principal components. As illustrations, SPCA is applied to real and simulated data, and the results are encouraging.
Scalable training of L1-regularized log-linear models
 In ICML ’07
, 2007
Cited by 169 (4 self)
The L-BFGS limited-memory quasi-Newton method is the algorithm of choice for optimizing the parameters of large-scale log-linear models with L2 regularization, but it cannot be used for an L1-regularized loss due to its non-differentiability whenever some parameter is zero. Efficient algorithms have been proposed for this task, but they are impractical when the number of parameters is very large. We present an algorithm, Orthant-Wise Limited-memory Quasi-Newton (OWL-QN), based on L-BFGS, that can efficiently optimize the L1-regularized log-likelihood of log-linear models with millions of parameters. In our experiments on a parse reranking task, our algorithm was several orders of magnitude faster than an alternative algorithm, and substantially faster than L-BFGS on the analogous L2-regularized problem. We also present a proof that OWL-QN is guaranteed to converge to a globally optimal parameter vector.
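The non-differentiability at zero can be handled with a pseudo-gradient built from one-sided derivatives. The sketch below shows that single ingredient for an objective of the form loss(x) + c * sum_i |x_i|. The function name, argument names, and this exact formulation are assumptions for illustration, and the code is not the full orthant-wise algorithm (it omits the quasi-Newton direction, orthant projection, and line search).

```python
def pseudo_gradient(x, grad, c):
    """Pseudo-gradient of f(x) = loss(x) + c * sum(|x_i|).

    `grad` is the gradient of the smooth loss at x, and c >= 0 is the
    L1 regularization strength (both names are illustrative).
    """
    pg = []
    for xi, gi in zip(x, grad):
        if xi > 0:
            pg.append(gi + c)  # |x_i| has derivative +1 here
        elif xi < 0:
            pg.append(gi - c)  # |x_i| has derivative -1 here
        elif gi + c < 0:
            # At zero, the right-hand derivative is negative:
            # increasing x_i decreases the objective.
            pg.append(gi + c)
        elif gi - c > 0:
            # At zero, the left-hand derivative is positive:
            # decreasing x_i decreases the objective.
            pg.append(gi - c)
        else:
            # Zero is a coordinate-wise minimizer; stay there.
            pg.append(0.0)
    return pg


example = pseudo_gradient(
    x=[1.0, -1.0, 0.0, 0.0, 0.0],
    grad=[0.5, 0.5, -2.0, 2.0, 0.3],
    c=1.0,
)
```

A coordinate at zero only receives a nonzero entry when the loss gradient is large enough to overcome the penalty, which is what keeps small parameters exactly at zero.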
Hyperspectral unmixing overview: Geometrical, statistical, and sparse regression-based approaches
 IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens
, 2012
Cited by 104 (34 self)
Imaging spectrometers measure electromagnetic energy scattered in their instantaneous field view in hundreds or thousands of spectral channels with higher spectral resolution than multispectral cameras. Imaging spectrometers are therefore often referred to as hyperspectral cameras (HSCs). Higher spectral resolution enables material identification via spectroscopic analysis, which facilitates countless applications that require identifying materials in scenarios unsuitable for classical spectroscopic analysis. Due to low spatial resolution of HSCs, microscopic material mixing, and multiple scattering, spectra measured by HSCs are mixtures of spectra of materials in a scene. Thus, accurate estimation requires unmixing. Pixels are assumed to be mixtures of a few materials, called endmembers. Unmixing involves estimating all or some of: the number of endmembers, their spectral signatures, and their abundances at each pixel. Unmixing is a challenging, ill-posed
Prediction by supervised principal components
 Journal of the American Statistical Association
, 2006
Cited by 98 (9 self)
In regression problems where the number of predictors greatly exceeds the number of observations, conventional regression techniques may produce unsatisfactory results. We describe a technique called supervised principal components that can be applied to this type of problem. Supervised principal components is similar to conventional principal components analysis except that it uses a subset of the predictors selected based on their association with the outcome. Supervised principal components can be applied to regression and generalized regression problems, such as survival analysis. It compares favorably to other techniques for this type of problem, and can also account for the effects of other covariates and help identify which predictor variables are most important. We also provide asymptotic consistency results to help support our empirical findings. These methods could become important tools for DNA microarray data, where they may be used to more accurately diagnose and treat cancer. KEY WORDS: Gene expression; Microarray; Regression; Survival analysis.
On the Asymptotic Properties of The Group Lasso Estimator in Least Squares Problems
Cited by 62 (0 self)
We derive conditions guaranteeing estimation and model selection consistency, oracle properties and persistence for the group-lasso estimator and model selector proposed by Yuan and Lin (2006) for least squares problems when the covariates have a natural grouping structure. We study both the case of a fixed-dimensional parameter space with increasing sample size and the case when the model complexity changes with the sample size.
Sparse inverse covariance matrix estimation via linear programming
, 2010
Cited by 54 (4 self)
This paper considers the problem of estimating a high dimensional inverse covariance matrix that can be well approximated by “sparse” matrices. Taking advantage of the connection between multivariate linear regression and entries of the inverse covariance matrix, we propose an estimating procedure that can effectively exploit such “sparsity”. The proposed method can be computed using linear programming and therefore has the potential to be used in very high dimensional problems. Oracle inequalities are established for the estimation error in terms of several operator norms, showing that the method is adaptive to different types of sparsity of the problem.
Tracking the flu pandemic by monitoring the Social Web
 In: 2nd IAPR Workshop on Cognitive Information Processing
, 2010
Cited by 45 (7 self)
Tracking the spread of an epidemic disease like seasonal or pandemic influenza is an important task that can reduce its impact and help authorities plan their response. In particular, early detection and geolocation of an outbreak are important aspects of this monitoring activity. Various methods are routinely employed for this monitoring, such as counting the consultation rates of general practitioners. We report on a monitoring tool to measure the prevalence of disease in a population by analysing the contents of social networking tools, such as Twitter. Our method is based on the analysis of hundreds of thousands of tweets per day, searching for symptom-related statements, and turning statistical information into a flu-score. We have tested it in the United Kingdom for 24 weeks during the H1N1 flu pandemic. We compare our flu-score with data from the Health Protection Agency, obtaining on average a statistically significant linear correlation which is greater than 95%. This method uses completely independent data to that commonly used for these purposes, and can be used at close time intervals, hence providing inexpensive and timely information about the state of an epidemic.