Sure independence screening for ultrahigh dimensional feature space
2006
Cited by 283 (26 self)
Variable selection plays an important role in high dimensional statistical modeling, which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, estimation accuracy and computational cost are two top concerns. In a recent paper, Candes and Tao (2007) propose the Dantzig selector using L1 regularization and show that it achieves the ideal risk up to a logarithmic factor log p. Their innovative procedure and remarkable result are challenged when the dimensionality is ultra high, as the factor log p can be large and their uniform uncertainty principle can fail. Motivated by these concerns, we introduce the concept of sure screening and propose a sure screening method based on correlation learning, called Sure Independence Screening (SIS), to reduce dimensionality from high to a moderate scale that is below sample size. In a fairly general asymptotic framework, SIS is shown to have the sure screening property even for exponentially growing dimensionality. As a methodological extension, an iterative SIS (ISIS) is also proposed to enhance its finite sample performance. With dimension reduced accurately from high to below sample size, variable selection can be improved in both speed and accuracy, and can then be ac
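The screening step described in this abstract can be illustrated with a minimal sketch. It assumes that "correlation learning" means ranking features by the absolute marginal sample correlation with the response and keeping the top d, with d chosen below the sample size. The function name `sis_screen` and the toy data are hypothetical and not taken from the paper; the iterative variant (ISIS) is not shown.

```python
import math

def sis_screen(X, y, d):
    """Keep the d features with the largest absolute marginal correlation with y.

    X is a list of n rows (each a list of p floats), y is a list of n floats.
    Returns the sorted indices of the retained features.
    """
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    y_c = [v - y_mean for v in y]
    y_norm = math.sqrt(sum(v * v for v in y_c))
    scores = []
    for j in range(p):
        col = [X[i][j] for i in range(n)]
        col_mean = sum(col) / n
        col_c = [v - col_mean for v in col]
        col_norm = math.sqrt(sum(v * v for v in col_c))
        if col_norm == 0.0 or y_norm == 0.0:
            scores.append(0.0)  # a constant column carries no marginal signal
        else:
            dot = sum(a * b for a, b in zip(col_c, y_c))
            scores.append(abs(dot) / (col_norm * y_norm))
    # Rank features by score, largest first, and keep the first d.
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:d])

# Toy data (hypothetical): feature 0 tracks y exactly, feature 2 is
# negatively related, feature 1 is only weakly related.
X_toy = [[1.0, 0.0, 5.0], [2.0, 1.0, 3.0], [3.0, 0.0, 4.0], [4.0, 1.0, 1.0]]
y_toy = [1.0, 2.0, 3.0, 4.0]
kept = sis_screen(X_toy, y_toy, 2)
```

A second-stage selector would then be fitted on the retained columns only, which is where the reduction to below sample size pays off.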
Consistency of the group lasso and multiple kernel learning
JOURNAL OF MACHINE LEARNING RESEARCH, 2007
Cited by 274 (33 self)
We consider the least-square regression problem with regularization by a block 1-norm, i.e., a sum of Euclidean norms over spaces of dimensions larger than one. This problem, referred to as the group Lasso, extends the usual regularization by the 1-norm where all spaces have dimension one, where it is commonly referred to as the Lasso. In this paper, we study the asymptotic model consistency of the group Lasso. We derive necessary and sufficient conditions for the consistency of group Lasso under practical assumptions, such as model misspecification. When the linear predictors and Euclidean norms are replaced by functions and reproducing kernel Hilbert norms, the problem is usually referred to as multiple kernel learning and is commonly used for learning from heterogeneous data sources and for nonlinear variable selection. Using tools from functional analysis, and in particular covariance operators, we extend the consistency results to this infinite dimensional case and also propose an adaptive scheme to obtain a consistent model estimate, even when the necessary condition required for the non-adaptive scheme is not satisfied.
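The block norm in this abstract can be made concrete with a small sketch. It only evaluates the penalty term (a sum of Euclidean norms over groups of coefficients) and shows that singleton groups recover the ordinary Lasso penalty. The function name `group_lasso_penalty` and the example vector are hypothetical, and any per-group weights the paper may use are omitted here.

```python
import math

def group_lasso_penalty(w, groups):
    """Sum of Euclidean norms of the sub-vectors of w indexed by each group.

    w is a list of floats; groups is a list of index lists that partition
    the coordinates of w.
    """
    total = 0.0
    for idx in groups:
        total += math.sqrt(sum(w[i] * w[i] for i in idx))
    return total

w = [3.0, 4.0, 0.0, 0.0, -2.0]

# Three blocks: the penalty is ||(3, 4)|| + ||(0, 0)|| + ||(-2)||.
block_penalty = group_lasso_penalty(w, [[0, 1], [2, 3], [4]])

# Singleton blocks: the penalty reduces to the 1-norm used by the Lasso.
lasso_penalty = group_lasso_penalty(w, [[i] for i in range(len(w))])
```

Because each block contributes an unsquared norm, the penalty tends to zero out whole blocks at once rather than individual coordinates, which is what makes it a group-level selector.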
A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers
Sparsistency and rates of convergence in large covariance matrices estimation
2009
Cited by 110 (12 self)
This paper studies the sparsistency and rates of convergence for estimating sparse covariance and precision matrices based on penalized likelihood with nonconvex penalty functions. Here, sparsistency refers to the property that all parameters that are zero are actually estimated as zero with probability tending to one. Depending on the application, sparsity may occur a priori in the covariance matrix, its inverse, or its Cholesky decomposition. We study these three sparsity exploration problems under a unified framework with a general penalty function. We show that the rates of convergence for these problems under the Frobenius norm are of order $(s_n \log p_n / n)^{1/2}$, where $s_n$ is the number of nonzero elements, $p_n$ is the size of the covariance matrix, and $n$ is the sample size. This explicitly spells out that the contribution of high dimensionality is merely a logarithmic factor. The conditions on the rate at which the tuning parameter $\lambda_n$ goes to 0 have been made explicit and compared under different penalties. As a result, for the L1 penalty, to guarantee the sparsistency and optimal rate of convergence, the number of nonzero elements should be small: $s'_n = O(p_n)$ at most, among $O(p_n^2)$ parameters, for estimating a sparse covariance or correlation matrix, a sparse precision or inverse correlation matrix, or a sparse Cholesky factor, where $s'_n$ is the number of nonzero elements among the off-diagonal entries. On the other hand, using the SCAD or hard-thresholding penalty functions, there is no such restriction.
The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs
Cited by 92 (21 self)
Recent methods for estimating sparse undirected graphs for real-valued data in high dimensional problems rely heavily on the assumption of normality. We show how to use a semiparametric Gaussian copula—or “nonparanormal”—for high dimensional inference. Just as additive models extend linear models by replacing linear functions with a set of one-dimensional smooth functions, the nonparanormal extends the normal by transforming the variables by smooth functions. We derive a method for estimating the nonparanormal, study the method’s theoretical properties, and show that it works well in many examples.
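The idea of transforming each variable by a smooth monotone function before applying Gaussian methods can be sketched as follows. This is a simplified stand-in, not the estimator from the paper: it assumes a plain rank-based normal-score transform, in which each value is replaced by the standard normal quantile of its rank divided by n + 1, and it assumes there are no ties. The function name `normal_scores` is hypothetical.

```python
from statistics import NormalDist

def normal_scores(x):
    """Map a sample to approximate Gaussian scores using its ranks.

    Assumes the values in x are distinct (ties are not handled).
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    std_normal = NormalDist()
    z = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # rank / (n + 1) stays strictly inside (0, 1), so inv_cdf is defined.
        z[i] = std_normal.inv_cdf(rank / (n + 1))
    return z

# A skewed toy sample (hypothetical values); the transform depends only on ranks.
scores = normal_scores([30.0, 10.0, 20.0])
```

Applying such a transform to every column and then running a Gaussian graph estimator on the transformed data is one way to read the abstract's analogy with additive models.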
SUPPORT UNION RECOVERY IN HIGH-DIMENSIONAL MULTIVARIATE REGRESSION
SUBMITTED TO THE ANNALS OF STATISTICS, 2010
Cited by 78 (3 self)
In multivariate regression, a K-dimensional response vector is regressed upon a common set of p covariates, with a matrix $B^* \in \mathbb{R}^{p \times K}$ of regression coefficients. We study the behavior of the multivariate group Lasso, in which block regularization based on the ℓ1/ℓ2 norm is used for support union recovery, or recovery of the set of s rows for which $B^*$ is nonzero. Under high-dimensional scaling, we show that the multivariate group Lasso exhibits a threshold for the recovery of the exact row pattern with high probability over the random design and noise that is specified by the sample complexity parameter $\theta(n, p, s) := n / [2\psi(B^*) \log(p - s)]$. Here n is the sample size, and $\psi(B^*)$ is a sparsity-overlap function measuring a combination of the sparsities and overlaps of the K regression coefficient vectors that constitute the model. We prove that the multivariate group Lasso succeeds for problem sequences (n, p, s) such that $\theta(n, p, s)$ exceeds a critical level $\theta_u$, and fails for sequences such that $\theta(n, p, s)$ lies below a critical level $\theta_\ell$. For the special case of the standard Gaussian ensemble, we show that $\theta_\ell = \theta_u$ so that the characterization is sharp. The sparsity-overlap function $\psi(B^*)$ reveals that, if the design is uncorrelated on the active rows, ℓ1/ℓ2 regularization for multivariate regression never harms performance relative to an ordinary Lasso approach, and can yield substantial improvements in sample complexity (up to a factor of K) when the coefficient vectors are suitably orthogonal. For more general designs, it is possible for the ordinary Lasso to outperform the multivariate group Lasso. We complement our analysis with simulations that demonstrate the sharpness of our theoretical results, even for relatively small problems.
High-dimensional additive modeling
Annals of Statistics
Cited by 78 (3 self)
We propose a new sparsity-smoothness penalty for high-dimensional generalized additive models. The combination of sparsity and smoothness is crucial for mathematical theory as well as performance for finite-sample data. We present a computationally efficient algorithm, with provable numerical convergence properties, for optimizing the penalized likelihood. Furthermore, we provide oracle results which yield asymptotic optimality of our estimator for high-dimensional but sparse additive models. Finally, an adaptive version of our sparsity-smoothness penalized approach yields large additional performance gains.
A SELECTIVE OVERVIEW OF VARIABLE SELECTION IN HIGH DIMENSIONAL FEATURE SPACE
2010
Cited by 70 (6 self)
High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection methods, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality. They have been widely applied for simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of the recent developments of theory, methods, and implementations for high dimensional variable selection. Questions about the limits of dimensionality such methods can handle, the role of penalty functions, and the resulting statistical properties are rapidly driving advances in the field. The properties of nonconcave penalized likelihood and its roles in high dimensional statistical modeling are emphasized. We also review some recent advances in ultrahigh dimensional variable selection, with emphasis on independence screening and two-scale methods.
A Dirty Model for Multitask Learning
In NIPS, 2010
Cited by 67 (2 self)
We consider multitask learning in the setting of multiple linear regression, and where some relevant features could be shared across the tasks. Recent research has studied the use of ℓ1/ℓq norm block-regularizations with q > 1 for such block-sparse structured problems, establishing strong guarantees on recovery even under high-dimensional scaling where the number of features scales with the number of observations. However, these papers also caution that the performance of such block-regularized methods is very dependent on the extent to which the features are shared across tasks. Indeed they show [8] that if the extent of overlap is less than a threshold, or even if parameter values in the shared features are highly uneven, then block ℓ1/ℓq regularization could actually perform worse than simple separate elementwise ℓ1 regularization. Since these caveats depend on the unknown true parameters, we might not know when and which method to apply. Even otherwise, we are far away from a realistic multitask setting: not only do the set of relevant features have to be exactly the same across tasks, but their values