Results 1–10 of 18
On Model Selection Consistency of Lasso
, 2006
Abstract

Cited by 462 (23 self)
Sparsity or parsimony of statistical models is crucial for their proper interpretations, as in sciences and social sciences. Model selection is a commonly used method to find such models, but usually involves a computationally heavy combinatorial search. Lasso (Tibshirani, 1996) is now being used as a computationally feasible alternative to model selection.
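The computational appeal referred to here is that the lasso objective is convex and can be minimized by very simple iterative schemes, in contrast to a combinatorial search over subsets. A minimal sketch of cyclic coordinate descent for (1/2n)‖y − Xβ‖² + λ‖β‖₁ (the choice of algorithm is illustrative; it is not the method analyzed in the cited paper):

```python
import numpy as np

def soft_threshold(z, t):
    """Scalar soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # per-column x_j^T x_j / n
    for _ in range(n_iter):
        for j in range(p):
            # partial residual: leave coordinate j out of the fit
            r_j = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r_j / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
    return b
```

With a large enough λ the estimate is exactly zero, which is the sparsity the abstract refers to; shrinking λ lets variables enter one by one.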
The group Lasso for logistic regression
 Journal of the Royal Statistical Society, Series B
, 2008
Abstract

Cited by 278 (11 self)
Summary. The group lasso is an extension of the lasso to do variable selection on (predefined) groups of variables in linear regression models. The estimates have the attractive property of being invariant under groupwise orthogonal reparameterizations. We extend the group lasso to logistic regression models and present an efficient algorithm, especially suitable for high-dimensional problems, which can also be applied to generalized linear models to solve the corresponding convex optimization problem. The group lasso estimator for logistic regression is shown to be statistically consistent even if the number of predictors is much larger than the sample size, provided the true underlying structure is sparse. We further use a two-stage procedure which aims for sparser models than the group lasso, leading to improved prediction performance in some cases. Moreover, owing to the two-stage nature, the estimates can be constructed to be hierarchical. The methods are applied to simulated and real data sets about splice site detection in DNA sequences.
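The invariance property mentioned in the abstract has a concrete computational face: the group-lasso penalty touches each group only through its Euclidean norm, so the associated block soft-thresholding (proximal) step commutes with any orthogonal reparameterization of a group. A small sketch of this generic building block (not the paper's specific logistic-regression algorithm):

```python
import numpy as np

def group_soft_threshold(z, lam):
    """Proximal step for the group-lasso penalty lam * ||z||_2:
    the whole coefficient group z is shrunk toward zero jointly,
    and is set exactly to zero when its norm falls below lam."""
    norm = np.linalg.norm(z)
    if norm <= lam:
        return np.zeros_like(z)
    return (1.0 - lam / norm) * z

# Invariance under a groupwise orthogonal reparameterization Q:
# group_soft_threshold(Q @ z, lam) == Q @ group_soft_threshold(z, lam),
# because ||Q z|| = ||z|| for orthogonal Q.
```

Either the entire group enters the model or none of it does, which is exactly the groupwise variable selection the abstract describes.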
High-dimensional additive modeling
 Annals of Statistics
Abstract

Cited by 81 (3 self)
We propose a new sparsity-smoothness penalty for high-dimensional generalized additive models. The combination of sparsity and smoothness is crucial for mathematical theory as well as performance for finite-sample data. We present a computationally efficient algorithm, with provable numerical convergence properties, for optimizing the penalized likelihood. Furthermore, we provide oracle results which yield asymptotic optimality of our estimator for high-dimensional but sparse additive models. Finally, an adaptive version of our sparsity-smoothness penalized approach yields large additional performance gains.
Statistical challenges with high dimensionality: feature selection in knowledge discovery
, 2006
Sparse Boosting
 Journal of Machine Learning Research
, 2006
Abstract

Cited by 26 (3 self)
We propose Sparse Boosting (the SparseL2Boost algorithm), a variant of boosting with the squared error loss. SparseL2Boost yields sparser solutions than the previously proposed L2Boosting by minimizing some penalized L2-loss functions, the FPE model selection criteria, through small-step gradient descent. Although boosting may already give relatively sparse solutions, for example corresponding to the soft-thresholding estimator in orthogonal linear models, there is sometimes a desire for more sparseness to increase prediction accuracy and improve variable selection: such goals can be achieved with SparseL2Boost.
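For intuition, plain componentwise L2Boosting, the baseline that SparseL2Boost sparsifies, repeatedly fits the current residual with the single best predictor and takes a small step of size ν; SparseL2Boost additionally charges each step a model-complexity price via an FPE-type criterion. A sketch of the baseline only, assuming a centered design (the penalized stopping rule is omitted):

```python
import numpy as np

def l2_boost(X, y, n_steps=100, nu=0.1):
    """Componentwise L2 boosting: small-step gradient descent on squared
    error, where each step fits the residual by one predictor at a time
    and updates only the coefficient that reduces the residual most."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.astype(float).copy()
    for _ in range(n_steps):
        # least-squares coefficient of the residual on each single column
        coefs = X.T @ r / (X ** 2).sum(axis=0)
        # residual sum of squares achieved by each candidate column
        sse = ((r[:, None] - X * coefs) ** 2).sum(axis=0)
        j = int(np.argmin(sse))
        b[j] += nu * coefs[j]          # small step on the winning coordinate
        r -= nu * coefs[j] * X[:, j]   # update the residual
    return b
```

Because only one coordinate moves per step and steps are small, coefficients of irrelevant predictors often stay exactly zero, which is the sparsity the abstract is about.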
“Preconditioning” for feature selection and regression in high-dimensional problems
 Annals of Statistics
, 2008
Abstract

Cited by 20 (5 self)
We consider regression problems where the number of predictors greatly exceeds the number of observations. We propose a method for variable selection that first estimates the regression function, yielding a “preconditioned” response variable. The primary method used for this initial regression is supervised principal components. Then we apply a standard procedure such as forward stepwise selection or the LASSO to the preconditioned response variable. In a number of simulated and real data examples, this two-step procedure outperforms forward stepwise selection or the usual LASSO (applied directly to the raw outcome). We also show that under a certain Gaussian latent variable model, application of the LASSO to the preconditioned response variable is consistent as the number of predictors and observations increases. Moreover, when the observational noise is rather large, the suggested procedure can give a more accurate estimate than the LASSO. We illustrate our method on some real problems, including survival analysis with microarray data.
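The first stage described above can be sketched as: screen columns by correlation with y, take the leading principal component of the screened block, and replace y by its fitted value on that component. This is only a rough sketch; the screening size k is a hypothetical tuning choice here, and the paper's supervised principal components procedure is more refined:

```python
import numpy as np

def precondition(X, y, k=10):
    """Supervised-PC style preconditioning sketch: replace y by its
    projection onto the leading principal component of the k columns
    most correlated with y (plus the mean of y)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # rank columns by absolute correlation with y
    corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    top = np.argsort(corr)[-k:]
    # leading left singular vector of the screened block = first PC direction
    U, _, _ = np.linalg.svd(Xc[:, top], full_matrices=False)
    u1 = U[:, 0]
    # "preconditioned" response: fitted values of y regressed on u1
    return u1 * (u1 @ yc) + y.mean()
```

The second stage then runs a standard selector (e.g. the lasso or forward stepwise) on X against this denoised response rather than the raw outcome.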
Discussion of “One-step sparse estimates in nonconcave penalized likelihood models”
, 2007
Abstract

Cited by 14 (3 self)
Hui Zou and Runze Li ought to be congratulated for their nice and interesting work, which presents a variety of ideas and insights in statistical methodology, computing and asymptotics. We agree with them that one- or even multi-step (or -stage) procedures are currently among the best for analyzing complex datasets. The focus of our discussion is mainly on high-dimensional problems where p ≫ n: we will illustrate, empirically and by describing some theory, that many of the ideas from the current paper are very useful for the p ≫ n setting as well.

1. Nonconvex objective function and multi-step convex optimization. The paper demonstrates a nice, and in a sense surprising, connection between difficult nonconvex optimization and computationally efficient Lasso-type methodology which involves one- (or multi-) step convex optimization. The SCAD penalty function [5] has often been criticized from a computational point of view, as it corresponds to a nonconvex objective function which is difficult to minimize; mainly in situations with many covariates, optimizing a SCAD-penalized likelihood becomes an awkward task. The usual way to optimize a SCAD-penalized likelihood is to use a local quadratic approximation. Zou and Li show here what happens if one uses a local linear approximation instead. In 2001, when Fan and Li [5] proposed the SCAD penalty, it was probably easier to work with a quadratic approximation. Nowadays, and because of the contribution of the current paper, a local linear approximation seems just as easy to use, thanks to the homotopy method [12] and the LARS algorithm [4]. While the latter is suited for linear models, more sophisticated algorithms have been proposed for generalized linear models; cf. [6, 8, 13]. In addition, and importantly, the local linear approximation yields sparse model fits where quite a few or even many of the coefficients in a linear or ...
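The local linear approximation discussed here replaces the SCAD penalty by its tangent at the current estimate, so one LLA step reduces to a weighted lasso with per-coefficient weights p'_λ(|β̃_j|). The SCAD derivative itself (Fan and Li, 2001, with the conventional a = 3.7) is simple to write down:

```python
import numpy as np

def scad_deriv(beta, lam, a=3.7):
    """Derivative p'_lam(|b|) of the SCAD penalty: constant lam near zero
    (lasso-like shrinkage of small coefficients), linearly decaying for
    lam < |b| <= a*lam, and zero beyond (no shrinkage of large coefficients).
    One LLA step solves a lasso with these values as coefficient weights."""
    b = np.abs(np.asarray(beta, dtype=float))
    return np.where(b <= lam, lam, np.maximum(a * lam - b, 0.0) / (a - 1.0))
```

Large current coefficients get weight zero and escape shrinkage, which is how the one-step estimator avoids the lasso's bias while staying a convex problem at each step.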
Smoothing ℓ1-penalized estimators for high-dimensional time-course data
 Electronic Journal of Statistics
, 2007
Abstract

Cited by 6 (2 self)
When a series of (related) linear models has to be estimated, it is often appropriate to combine the different datasets to construct more efficient estimators. We use ℓ1-penalized estimators like the Lasso or the Adaptive Lasso, which can simultaneously do parameter estimation and model selection. We show that for a time-course of high-dimensional linear models, the convergence rates of the Lasso and of the Adaptive Lasso can be improved by combining the different time-points in a suitable way. Moreover, the Adaptive Lasso still enjoys oracle properties and consistent variable selection. The finite-sample properties of the proposed methods are illustrated on simulated data and on a real problem of motif finding in DNA sequences.
A Comparative Framework for Preconditioned Lasso Algorithms
Abstract

Cited by 5 (1 self)
The Lasso is a cornerstone of modern multivariate data analysis, yet its performance suffers in the common situation in which covariates are correlated. This limitation has led to a growing number of Preconditioned Lasso algorithms that premultiply X and y by matrices PX, Py prior to running the standard Lasso. A direct comparison of these and similar Lasso-style algorithms to the original Lasso is difficult because the performance of all of these methods depends critically on an auxiliary penalty parameter λ. In this paper we propose an agnostic framework for comparing Preconditioned Lasso algorithms to the Lasso without having to choose λ. We apply our framework to three Preconditioned Lasso instances and highlight cases when they will outperform the Lasso. Additionally, our theory reveals fragilities of these algorithms to which we provide partial solutions.
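One concrete instance of such a premultiplication, shown here purely for illustration, projects y onto the span of X's top-d left singular vectors while leaving X itself untouched (PX = identity); d is a hypothetical tuning parameter, and this is only one of the simpler preconditioners in the family the abstract surveys:

```python
import numpy as np

def svd_preconditioner(X, d):
    """Build (PX, Py) for a rank-d SVD preconditioner: Py projects onto
    the span of the top-d left singular vectors of X; PX is the identity.
    The standard Lasso is then run on (PX @ X, Py @ y)."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    Ud = U[:, :d]
    return np.eye(X.shape[0]), Ud @ Ud.T
```

Projecting y this way discards components of the response orthogonal to the dominant directions of X, which is one route to damping the effect of correlated covariates before the Lasso sees the data.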