Results 1-10 of 62
Consistency of the group lasso and multiple kernel learning
- Journal of Machine Learning Research, 2007
Cited by 81 (14 self)
We consider the least-squares regression problem with regularization by a block 1-norm, i.e., a sum of Euclidean norms over spaces of dimension larger than one. This problem, referred to as the group Lasso, extends the usual regularization by the 1-norm, in which all spaces have dimension one and which is commonly referred to as the Lasso. In this paper, we study the asymptotic model consistency of the group Lasso. We derive necessary and sufficient conditions for the consistency of the group Lasso under practical assumptions, such as model misspecification. When the linear predictors and Euclidean norms are replaced by functions and reproducing kernel Hilbert space norms, the problem is usually referred to as multiple kernel learning and is commonly used for learning from heterogeneous data sources and for nonlinear variable selection. Using tools from functional analysis, in particular covariance operators, we extend the consistency results to this infinite-dimensional case, and we also propose an adaptive scheme that yields a consistent model estimate even when the necessary condition for the non-adaptive scheme is not satisfied.
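The abstract does not say how the group Lasso is solved. One standard approach, assumed here for illustration, is proximal gradient descent, whose key step is block soft-thresholding: each group of coefficients is shrunk toward zero as a unit and set exactly to zero when its norm falls below the threshold. The sketch assumes a (1/(2n)) scaling of the squared loss and groups given as lists of column indices; the function names are hypothetical.

```python
import numpy as np

# Illustrative sketch (assumption): proximal gradient for the group Lasso.
# Function names and the 1/(2n) loss scaling are hypothetical choices.


def block_soft_threshold(v, t):
    """Proximal operator of t * ||v||_2: shrink the whole block toward zero."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v


def group_lasso(X, y, groups, lam, n_iter=500):
    """Minimize (1/(2n)) * ||y - X w||^2 + lam * sum_g ||w_g||_2.

    `groups` is a list of column-index lists that partition the columns of X.
    """
    n, p = X.shape
    w = np.zeros(p)
    # Step size 1/L, where L = ||X||_2^2 / n bounds the gradient's Lipschitz constant.
    step = n / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        z = w - step * grad
        for g in groups:
            w[g] = block_soft_threshold(z[g], step * lam)
    return w


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
w_true = np.array([2.0, -1.0, 0.0, 0.0, 0.0, 0.0])
y = X @ w_true + 0.1 * rng.standard_normal(200)
w_hat = group_lasso(X, y, groups=[[0, 1], [2, 3], [4, 5]], lam=0.2)
print(np.round(w_hat, 3))  # inactive groups should come out exactly zero
```

Because the threshold acts on the Euclidean norm of each block, whole groups drop out together, which is the group-level sparsity the abstract describes.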
SpAM: Sparse additive models
- In Advances in Neural Information Processing Systems 20, 2007
Cited by 37 (9 self)
We present a new class of models for high-dimensional nonparametric regression and classification called sparse additive models (SpAM). Our methods combine ideas from sparse linear modeling and additive nonparametric regression. We derive a method for fitting the models that is effective even when the number of covariates is larger than the sample size. A statistical analysis of the properties of SpAM is given together with empirical results on synthetic and real data, showing that SpAM can be effective in fitting sparse nonparametric models to high-dimensional data.
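A natural way to fit such a model is a backfitting loop: smooth a partial residual against each covariate, then shrink the resulting component function as a whole so that irrelevant covariates receive exactly zero functions. The sketch below is an illustration under stated assumptions (a Gaussian-kernel Nadaraya-Watson smoother, a fixed bandwidth, a fixed number of sweeps, hypothetical function names), not the authors' code.

```python
import numpy as np

# Illustrative sketch (assumption): sparse backfitting with a Gaussian-kernel smoother.
# The smoother, bandwidth, sweep count, and names are hypothetical choices.


def kernel_smoother(x, r, bandwidth):
    """Nadaraya-Watson estimate of E[r | x] at every sample point."""
    d = (x[:, None] - x[None, :]) / bandwidth
    K = np.exp(-0.5 * d ** 2)
    return K @ r / K.sum(axis=1)


def spam_backfit(X, y, lam, bandwidth=0.2, n_sweeps=20):
    """Return f with f[:, j] = fitted component f_j evaluated at X[:, j]."""
    n, p = X.shape
    f = np.zeros((n, p))
    y_c = y - y.mean()
    for _ in range(n_sweeps):
        for j in range(p):
            # Partial residual: the response minus every component except f_j.
            R = y_c - f.sum(axis=1) + f[:, j]
            P = kernel_smoother(X[:, j], R, bandwidth)
            s = np.sqrt(np.mean(P ** 2))  # empirical norm of the smoothed residual
            # Shrink the whole function; it becomes exactly zero when s <= lam.
            scale = max(0.0, 1.0 - lam / s) if s > 0 else 0.0
            f[:, j] = scale * P
            f[:, j] -= f[:, j].mean()  # keep each component centered
    return f


rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 5))
y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.standard_normal(200)
f = spam_backfit(X, y, lam=0.15)
print(np.sqrt(np.mean(f ** 2, axis=0)))  # empirical norm of each fitted component
```

Only the first covariate affects the response here, so the remaining components are expected to be thresholded to zero.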
A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers
- 2010
Hiroshi Imai and Masao Iri. Polygonal approximations of a curve – formulations and algorithms
- Computational Morphology, 1988
Cited by 23 (5 self)
Regularization by the sum of singular values, also referred to as the trace norm, is a popular technique for estimating low-rank rectangular matrices. In this paper, we extend some of the consistency results of the Lasso to provide necessary and sufficient conditions for rank consistency of trace norm minimization with the square loss. We also provide an adaptive version that is rank consistent even when the necessary condition for the non-adaptive version is not fulfilled.
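In the simplest instance of trace norm minimization with the square loss, where the design is the identity (matrix denoising), the minimizer has a closed form: soft-threshold the singular values. The same operation is the proximal step used by iterative solvers for the general regression case. A minimal NumPy sketch, with a hypothetical function name:

```python
import numpy as np

# Illustrative sketch: closed-form trace norm denoising (function name is hypothetical).


def singular_value_threshold(Y, lam):
    """Minimizer of 0.5 * ||Y - W||_F^2 + lam * ||W||_* (trace norm)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)  # soft-threshold the singular values
    return (U * s_shrunk) @ Vt           # scales column i of U by s_shrunk[i]


rng = np.random.default_rng(0)
low_rank = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))
noisy = low_rank + 0.1 * rng.standard_normal((30, 20))
W = singular_value_threshold(noisy, lam=2.0)
print(np.linalg.matrix_rank(W))  # the small noise singular values are zeroed out
```

Singular values below `lam` are set exactly to zero, which is why the trace norm acts as a convex surrogate for rank.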
Optimal Solutions for Sparse Principal Component Analysis
Cited by 22 (4 self)
Given a sample covariance matrix, we examine the problem of maximizing the variance explained by a linear combination of the input variables while constraining the number of nonzero coefficients in this combination. This is known as sparse principal component analysis and has a wide array of applications in machine learning and engineering. We formulate a new semidefinite relaxation of this problem and derive a greedy algorithm that computes a full set of good solutions for all target numbers of nonzero coefficients, with total complexity O(n^3), where n is the number of variables. We then use the same relaxation to derive sufficient conditions for global optimality of a solution, which can be tested in O(n^3) time per pattern. We discuss applications in subset selection and sparse recovery and show, on artificial examples and biological data, that our algorithm does provide globally optimal solutions in many cases.
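The greedy idea can be sketched as forward selection: starting from the variable with the largest variance (an assumed starting rule), repeatedly add the variable that most increases the largest eigenvalue of the selected principal submatrix, and record the support and its explained variance for every cardinality. This plain version recomputes an eigenvalue for each candidate, so it costs more than the O(n^3) total quoted in the abstract, which implies a cheaper update rule; the function name and output format are hypothetical.

```python
import numpy as np

# Illustrative sketch (assumption): naive forward greedy search for sparse PCA.
# Slower than the O(n^3) scheme in the abstract; names are hypothetical.


def greedy_sparse_pca(Sigma, k_max):
    """Return [(support, explained_variance)] for supports of size 1..k_max."""
    n = Sigma.shape[0]
    support = [int(np.argmax(np.diag(Sigma)))]
    path = [(list(support), float(Sigma[support[0], support[0]]))]
    while len(support) < min(k_max, n):
        best_i, best_val = None, -np.inf
        for i in range(n):
            if i in support:
                continue
            idx = support + [i]
            # Largest eigenvalue of the candidate principal submatrix.
            val = np.linalg.eigvalsh(Sigma[np.ix_(idx, idx)])[-1]
            if val > best_val:
                best_i, best_val = i, val
        support.append(best_i)
        path.append((list(support), float(best_val)))
    return path


Sigma = np.eye(5)
Sigma[:3, :3] = 0.9 + 0.1 * np.eye(3)  # variables 0-2 are strongly correlated
for supp, variance in greedy_sparse_pca(Sigma, k_max=3):
    print(supp, round(variance, 3))
```

Each entry of the returned path is one candidate solution, matching the abstract's "full set of good solutions for all target numbers of nonzero coefficients".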
Some sharp performance bounds for least squares regression with L1 regularization
- Rutgers Univ., 2009
Cited by 21 (0 self)
We derive sharp performance bounds for least squares regression with L1 regularization from parameter estimation accuracy and feature selection quality perspectives. The main result proved for L1 regularization extends a similar result in [Ann. Statist. 35 (2007) 2313–2351] for the Dantzig selector. It gives an affirmative answer to an open question in [Ann. Statist. 35 (2007) 2358–2364]. Moreover, the result leads to an extended view of feature selection that allows less restrictive conditions than some recent work. Based on the theoretical insights, a novel two-stage L1-regularization procedure with selective penalization is analyzed. It is shown that if the target parameter vector can be decomposed as the sum of a sparse parameter vector with large coefficients and another less sparse vector with relatively small coefficients, then the two-stage procedure can lead to improved performance.
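The abstract does not give the two-stage procedure's details. One simple reading of "selective penalization", assumed here for illustration only, is: fit an ordinary Lasso, then refit while leaving unpenalized every coefficient whose first-stage magnitude exceeds a threshold, which removes the shrinkage bias on the large coefficients. The coordinate descent solver, the threshold rule, and all names below are hypothetical.

```python
import numpy as np

# Illustrative sketch (assumption): one possible two-stage L1 scheme; not the paper's exact method.


def weighted_lasso_cd(X, y, penalties, n_iter=200):
    """Coordinate descent for (1/(2n)) * ||y - X w||^2 + sum_j penalties[j] * |w_j|."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = np.asarray(y, dtype=float).copy()  # residual y - X w
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * w[j]  # take coordinate j out of the fit
            rho = X[:, j] @ r / n
            w[j] = np.sign(rho) * max(abs(rho) - penalties[j], 0.0) / col_sq[j]
            r -= X[:, j] * w[j]  # put the updated coordinate back
    return w


def two_stage_lasso(X, y, lam, threshold):
    """Hypothetical scheme: stage 2 leaves large stage-1 coefficients unpenalized."""
    p = X.shape[1]
    w1 = weighted_lasso_cd(X, y, np.full(p, lam))
    penalties = np.where(np.abs(w1) > threshold, 0.0, lam)
    w2 = weighted_lasso_cd(X, y, penalties)
    return w1, w2


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
w_true = np.zeros(10)
w_true[:2] = 3.0
y = X @ w_true + 0.1 * rng.standard_normal(100)
w1, w2 = two_stage_lasso(X, y, lam=0.5, threshold=1.0)
print(np.round(w1[:2], 2), np.round(w2[:2], 2))  # stage 2 is closer to 3.0
```

Small coefficients stay penalized in the second stage, so the sparsity of the first-stage fit is preserved while the large coefficients lose their bias.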
Information-theoretic limits on sparsity recovery in the high-dimensional and noisy setting
- 2007
Cited by 20 (1 self)
The problem of sparsity pattern or support set recovery refers to estimating the set of nonzero coefficients of an unknown vector β* ∈ ℝ^p based on a set of n noisy observations. It arises in a variety of settings, including subset selection in regression, graphical model selection, signal denoising, compressive sensing, and constructive approximation. The sample complexity of a given method for subset recovery refers to the scaling of the required sample size n as a function of the signal dimension p, the sparsity index k (number of nonzeros in β*), the minimum value β_min of β* over its support, and other parameters of the measurement matrix. This paper studies the information-theoretic limits of sparsity recovery: in particular, for a noisy linear observation model based on random measurement matrices drawn from general Gaussian ensembles, we derive both a set of sufficient conditions for exact support recovery using an exhaustive search decoder and a set of necessary conditions that any decoder, regardless of its computational complexity, must satisfy for exact support recovery. This analysis of fundamental limits complements our previous work on sharp thresholds for support set recovery over the same set of random measurement ensembles using the polynomial-time Lasso method (ℓ1-constrained quadratic programming). Index Terms: compressed sensing, ℓ1-relaxation, Fano's method, high-dimensional statistical inference, information-theoretic
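The exhaustive search decoder mentioned above can be read as: try every support of size k, fit least squares on it, and keep the support with the smallest residual sum of squares. A NumPy sketch under that reading (names are hypothetical); its cost grows with the binomial coefficient C(p, k), which is why the abstract contrasts it with the polynomial-time Lasso.

```python
import itertools

import numpy as np

# Illustrative sketch (assumption): exhaustive least-squares support search; names are hypothetical.


def exhaustive_support_decoder(X, y, k):
    """Return the size-k support whose least-squares fit has the smallest residual."""
    p = X.shape[1]
    best_support, best_rss = None, np.inf
    for support in itertools.combinations(range(p), k):
        Xs = X[:, list(support)]
        coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = float(np.sum((y - Xs @ coef) ** 2))
        if rss < best_rss:
            best_support, best_rss = support, rss
    return best_support


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
y = 1.5 * X[:, 2] - 2.0 * X[:, 5] + 0.1 * rng.standard_normal(50)
print(exhaustive_support_decoder(X, y, k=2))
```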
Bolasso: model consistent lasso estimation through the bootstrap
- In Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML), 2008
Cited by 19 (7 self)
We consider the least-squares linear regression problem with regularization by the ℓ1-norm, a problem usually referred to as the Lasso. In this paper, we present a detailed asymptotic analysis of model consistency of the Lasso. For various decays of the regularization parameter, we compute asymptotic equivalents of the probability of correct model selection (i.e., variable selection). For a specific decay rate, we show that the Lasso selects all the variables that should enter the model with probability tending to one exponentially fast, while it selects all other variables with strictly positive probability. We show that this property implies that if we run the Lasso for several bootstrapped replications of a given sample, then intersecting the supports of the Lasso bootstrap estimates leads to consistent model selection. This novel variable selection algorithm, referred to as the Bolasso, compares favorably with other linear regression methods on synthetic data and datasets from the UCI machine learning repository.
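The bootstrap-and-intersect idea can be sketched as: run the Lasso on bootstrap resamples and keep only the variables selected in every replication. The sketch assumes a fixed regularization parameter, a plain coordinate descent Lasso solver, and hypothetical names; the abstract's guarantees concern a specific decay of the regularization parameter with the sample size, which a single fixed value does not capture.

```python
import numpy as np

# Illustrative sketch (assumption): support intersection over bootstrap Lasso fits.
# Solver, fixed lam, and names are hypothetical choices.


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n)) * ||y - X w||^2 + lam * ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = np.asarray(y, dtype=float).copy()  # residual y - X w
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * w[j]
            rho = X[:, j] @ r / n
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * w[j]
    return w


def bolasso_support(X, y, lam, n_bootstrap=32, seed=0):
    """Intersect the Lasso supports obtained on bootstrap resamples of (X, y)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    keep = np.ones(p, dtype=bool)
    for _ in range(n_bootstrap):
        idx = rng.integers(0, n, size=n)  # draw n rows with replacement
        w = lasso_cd(X[idx], y[idx], lam)
        keep &= w != 0
    return np.flatnonzero(keep)


rng = np.random.default_rng(1)
X = rng.standard_normal((200, 8))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * rng.standard_normal(200)
print(bolasso_support(X, y, lam=0.1))
```

Relevant variables survive every replication, while a spurious variable only needs to be dropped once to be removed from the intersection.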
Stability selection
Cited by 18 (2 self)
On the Asymptotic Properties of The Group Lasso Estimator in Least Squares Problems
Cited by 14 (0 self)
We derive conditions guaranteeing estimation and model selection consistency, oracle properties and persistence for the group-lasso estimator and model selector proposed by Yuan and Lin (2006) for least squares problems when the covariates have a natural grouping structure. We study both the case of a fixed-dimensional parameter space with increasing sample size and the case when the model complexity changes with the sample size.

