Results 1–10 of 283
The Dantzig selector: statistical estimation when p is much larger than n
2005. Cited by 879 (14 self).
In many important statistical applications, the number of variables or parameters p is much larger than the number of observations n. Suppose then that we have observations y = Ax + z, where x ∈ ℝᵖ is a parameter vector of interest, A is a data matrix with possibly far fewer rows than columns, n ≪ p, and the zᵢ's are i.i.d. N(0, σ²). Is it possible to estimate x reliably based on the noisy data y? To estimate x, we introduce a new estimator, which we call the Dantzig selector, defined as the solution to the ℓ1-regularization problem

min over x̃ ∈ ℝᵖ of ‖x̃‖ℓ1 subject to ‖Aᵀr‖ℓ∞ ≤ (1 + t⁻¹)·√(2 log p)·σ,

where r is the residual vector y − Ax̃ and t is a positive scalar. We show that if A obeys a uniform uncertainty principle (with unit-normed columns) and if the true parameter vector x is sufficiently sparse (which here roughly guarantees that the model is identifiable), then with very large probability

‖x̂ − x‖²ℓ2 ≤ C²·2 log p·(σ² + ∑ᵢ min(xᵢ², σ²)).

Our results are nonasymptotic and we give values for the constant C. In short, our estimator achieves a loss within a logarithmic factor of the ideal mean squared error one would achieve with an oracle which would supply perfect information about which coordinates are nonzero and which are above the noise level. In multivariate regression and from a model selection viewpoint, our result says that it is possible to nearly select the best subset of variables by solving a very simple convex program, which in fact can easily be recast as a convenient linear program (LP).
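Since both the objective and the constraint above are piecewise linear, the LP recast mentioned in the abstract is direct. Below is a minimal sketch using scipy.optimize.linprog and the standard variable split z = [x, u] with |xᵢ| ≤ uᵢ; the function name and defaults are illustrative, not from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(A, y, lam):
    """Solve  min ||x||_1  s.t.  ||A^T (y - A x)||_inf <= lam  as an LP.

    Uses the split z = [x, u] with |x_i| <= u_i, so the objective is sum(u).
    """
    n, p = A.shape
    G = A.T @ A
    Aty = A.T @ y
    I, Z = np.eye(p), np.zeros((p, p))
    A_ub = np.block([
        [ I, -I],   #  x - u <= 0
        [-I, -I],   # -x - u <= 0
        [-G,  Z],   #  A^T y - A^T A x <= lam
        [ G,  Z],   #  A^T A x - A^T y <= lam
    ])
    b_ub = np.concatenate([np.zeros(2 * p), lam - Aty, lam + Aty])
    c = np.concatenate([np.zeros(p), np.ones(p)])
    bounds = [(None, None)] * p + [(0, None)] * p
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:p]

# Setting lam = (1 + 1/t) * np.sqrt(2 * np.log(p)) * sigma reproduces the
# constraint level used in the abstract.
```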
Asymptotic properties of bridge estimators in sparse high-dimensional regression models
Ann. Statist., 2007. Cited by 99 (10 self).
We study the asymptotic properties of bridge estimators in sparse, high-dimensional linear regression models when the number of covariates may increase to infinity with the sample size. We are particularly interested in the use of bridge estimators to distinguish between covariates whose coefficients are zero and covariates whose coefficients are nonzero. We show that under appropriate conditions, bridge estimators correctly select covariates with nonzero coefficients with probability converging to one and that the estimators of nonzero coefficients have the same asymptotic distribution that they would have if the zero coefficients were known in advance. Thus, bridge estimators have an oracle property in the sense of Fan and Li [J. Amer. Statist. Assoc. 96 (2001) 1348–1360] and Fan and Peng [Ann. Statist. 32 (2004) 928–961]. In general, the oracle property holds only if the number of covariates is smaller than the sample size. However, under a partial orthogonality condition in which the covariates of the zero coefficients are uncorrelated or weakly correlated with the covariates of nonzero coefficients, we show that marginal bridge estimators can correctly distinguish between covariates with nonzero and zero coefficients with probability converging to one even when the number of covariates is greater than the sample size.
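The bridge penalty is λ·∑ⱼ|βⱼ|^γ with 0 < γ < 1, and the marginal bridge estimator applies it one covariate at a time. A crude sketch of that marginal selection step, assuming nothing from the paper beyond the objective itself (the grid search stands in for exact minimization of the nonconvex one-dimensional problem, and all names and defaults are illustrative):

```python
import numpy as np

def marginal_bridge_select(X, y, lam, gamma=0.5):
    """Keep covariate j iff the minimizer of the marginal bridge objective
    ||y - x_j b||^2 + lam * |b|^gamma is nonzero."""
    grid = np.linspace(-5.0, 5.0, 2001)          # includes b = 0 exactly
    selected = []
    for j in range(X.shape[1]):
        resid = y[:, None] - np.outer(X[:, j], grid)
        obj = (resid**2).sum(axis=0) + lam * np.abs(grid)**gamma
        if grid[np.argmin(obj)] != 0.0:
            selected.append(j)
    return selected
```

Because γ < 1 makes the penalty concave away from zero, small marginal effects are thresholded exactly to zero while large ones are barely shrunk, which is the mechanism behind the partial-orthogonality result above.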
Adaptive lasso for sparse high-dimensional regression models
Statistica Sinica, 2008. Cited by 98 (11 self).
We study the asymptotic properties of the adaptive Lasso estimators in sparse, high-dimensional linear regression models when the number of covariates may increase with the sample size. We consider variable selection using the adaptive Lasso, where the L1 norms in the penalty are reweighted by data-dependent weights. We show that, if a reasonable initial estimator is available, under appropriate conditions, the adaptive Lasso correctly selects covariates with nonzero coefficients with probability converging to one, and that the estimators of nonzero coefficients have the same asymptotic distribution they would have if the zero coefficients were known in advance. Thus, the adaptive Lasso has an oracle property in the sense of Fan and Li (2001).
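The weighted-L1 problem reduces to an ordinary lasso by rescaling the columns of X, which is the standard way to implement the adaptive Lasso. A minimal sketch with sklearn, assuming ridge as the initial estimator (the abstract only requires a "reasonable" one; all defaults here are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

def adaptive_lasso(X, y, alpha=0.1, tau=1.0, eps=1e-6):
    """Adaptive lasso via column rescaling: with weights w_j = 1/|b_init_j|^tau,
    a lasso fit on X / w followed by dividing the coefficients by w solves the
    reweighted-L1 problem."""
    b_init = Ridge(alpha=1.0).fit(X, y).coef_   # initial estimator
    w = 1.0 / (np.abs(b_init)**tau + eps)       # data-dependent weights
    lasso = Lasso(alpha=alpha).fit(X / w, y)    # ordinary lasso on scaled X
    return lasso.coef_ / w                      # map back to original scale
```

Large initial coefficients get small weights and are barely penalized, while near-zero initial coefficients get large weights and are shrunk to exactly zero, which is what drives the oracle property.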
High-dimensional classification using features annealed independence rules
Ann. Statist., 2008. Cited by 80 (18 self).
Classification using high-dimensional features arises frequently in many contemporary statistical studies, such as tumor classification using microarray or other high-throughput data. The impact of dimensionality on classification is poorly understood. In a seminal paper, Bickel and Levina (2004) show that the Fisher discriminant performs poorly due to diverging spectra, and they propose to use the independence rule to overcome the problem. We first demonstrate that even for the independence classification rule, classification using all the features can be as bad as random guessing, due to noise accumulation in estimating population centroids in high-dimensional feature space. In fact, we demonstrate further that almost all linear discriminants can perform as badly as random guessing. Thus, it is of paramount importance to select a subset of important features for high-dimensional classification, resulting in Features Annealed Independence Rules (FAIR). The conditions under which all the important features can be selected by the two-sample t-statistic are established. The choice of the optimal number of features, or equivalently, the threshold value of the test statistics, is proposed based on an upper bound of the classification error. Simulation studies and real data analysis support our theoretical results and demonstrate convincingly the advantage of our new classification procedure.
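A minimal sketch of the two ingredients named above: rank features by the absolute two-sample t-statistic, then classify with the independence (diagonal-covariance) rule on the top m features. The paper's error-bound criterion for choosing m is omitted, and all names here are illustrative:

```python
import numpy as np

def fair_classifier(X, labels, m):
    """Train a features-annealed independence rule on binary labels {0, 1}."""
    X0, X1 = X[labels == 0], X[labels == 1]
    n0, n1 = len(X0), len(X1)
    # Two-sample t-statistic per feature.
    se = np.sqrt(X0.var(axis=0, ddof=1) / n0 + X1.var(axis=0, ddof=1) / n1)
    t = (X1.mean(axis=0) - X0.mean(axis=0)) / se
    keep = np.argsort(-np.abs(t))[:m]           # top-m features only
    mu0, mu1 = X0[:, keep].mean(axis=0), X1[:, keep].mean(axis=0)
    s2 = X[:, keep].var(axis=0, ddof=1)         # diagonal variance estimates

    def predict(Xnew):
        # Independence rule: nearest centroid in variance-scaled distance.
        d0 = ((Xnew[:, keep] - mu0)**2 / s2).sum(axis=1)
        d1 = ((Xnew[:, keep] - mu1)**2 / s2).sum(axis=1)
        return (d1 < d0).astype(int)

    return predict
```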
High-dimensional variable selection
Ann. Statist., 2009. Cited by 80 (3 self).
This paper explores the following question: what kind of statistical guarantees can be given when doing variable selection in high-dimensional models? In particular, we look at the error rates and power of some multi-stage regression methods. In the first stage we fit a set of candidate models. In the second stage we select one model by cross-validation. In the third stage we use hypothesis testing to eliminate some variables. We refer to the first two stages as "screening" and the last stage as "cleaning." We consider three screening methods: the lasso, marginal regression, and forward stepwise regression. Our method gives consistent variable selection under certain conditions.
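A sketch of the screen-then-clean pipeline with the lasso as the screening method: fit the lasso with cross-validation on one half of the data, then run OLS t-tests on the other half to clean the survivors. The split, the test, and the Bonferroni correction here only loosely follow the paper and should be read as an assumption-laden illustration:

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LassoCV

def screen_and_clean(X, y, alpha_level=0.05, seed=0):
    """Return the indices that survive lasso screening plus t-test cleaning."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    half1, half2 = idx[: len(y) // 2], idx[len(y) // 2:]
    # Stages 1-2: screening, with the penalty chosen by cross-validation.
    screen = LassoCV(cv=5).fit(X[half1], y[half1])
    S = np.flatnonzero(screen.coef_)
    if S.size == 0:
        return S
    # Stage 3: cleaning via OLS t-tests on the held-out half.
    Xs, ys = X[half2][:, S], y[half2]
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    resid = ys - Xs @ beta
    df = len(ys) - Xs.shape[1]
    sigma2 = resid @ resid / df
    tvals = beta / np.sqrt(sigma2 * np.diag(np.linalg.inv(Xs.T @ Xs)))
    pvals = 2 * stats.t.sf(np.abs(tvals), df)
    return S[pvals < alpha_level / S.size]   # Bonferroni-cleaned selection
```

Data splitting is what makes the stage-three tests honest: the cleaning half never sees the model chosen on the screening half.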
High-dimensional additive modeling
Annals of Statistics. Cited by 78 (3 self).
We propose a new sparsity-smoothness penalty for high-dimensional generalized additive models. The combination of sparsity and smoothness is crucial for mathematical theory as well as for performance on finite-sample data. We present a computationally efficient algorithm, with provable numerical convergence properties, for optimizing the penalized likelihood. Furthermore, we provide oracle results which yield asymptotic optimality of our estimator for high-dimensional but sparse additive models. Finally, an adaptive version of our sparsity-smoothness penalized approach yields large additional performance gains.
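To make the structure concrete, here is a heavily reduced sketch of a sparse additive fit: expand each covariate in a small basis, then apply a group-lasso penalty per covariate via proximal gradient descent. The smoothness half of the paper's penalty is omitted, and polynomials stand in for a proper spline basis; everything here is an illustrative assumption, not the paper's algorithm:

```python
import numpy as np

def sparse_additive_fit(X, y, lam, degree=3, n_iter=500):
    """Fit y ~ sum_j f_j(x_j) with one coefficient group per covariate."""
    n, p = X.shape
    y = y - y.mean()
    # Columns ordered as [all x_j, then all x_j^2, ..., then all x_j^degree].
    B = np.concatenate([X**k for k in range(1, degree + 1)], axis=1)
    B = (B - B.mean(0)) / B.std(0)
    groups = [j + p * np.arange(degree) for j in range(p)]
    step = n / np.linalg.norm(B, 2)**2           # 1/L for the (1/2n) loss
    beta = np.zeros(B.shape[1])
    for _ in range(n_iter):
        beta -= step * B.T @ (B @ beta - y) / n  # gradient step
        for g in groups:                          # blockwise soft-threshold
            nrm = np.linalg.norm(beta[g])
            beta[g] *= max(0.0, 1.0 - step * lam / nrm) if nrm > 0 else 0.0
    return beta, groups
```

The group penalty zeroes out whole component functions f_j at once, which is the "sparsity" part; the paper adds a smoothness term on each retained f_j on top of this.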
A selective overview of variable selection in high dimensional feature space
2010. Cited by 70 (6 self).
High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection methods, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality. They have been widely applied for simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of the recent developments of theory, methods, and implementations for high dimensional variable selection. Questions of what limits of dimensionality such methods can handle, what the role of penalty functions is, and what the statistical properties are rapidly drive the advances of the field. The properties of nonconcave penalized likelihood and its roles in high dimensional statistical modeling are emphasized. We also review some recent advances in ultrahigh dimensional variable selection, with emphasis on independence screening and two-scale methods.
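The canonical nonconcave penalty emphasized in this line of work is SCAD, from Fan and Li (2001). For reference, here is that penalty as code; the formula and the suggested value a = 3.7 are from that paper, while the function name is ours:

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty p_lam(|t|): linear near zero (like the lasso), quadratic
    in a middle zone, and constant beyond a*lam (so large coefficients are
    not shrunk at all)."""
    t = np.abs(t)
    return np.where(
        t <= lam,
        lam * t,
        np.where(
            t <= a * lam,
            (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1)),
            lam**2 * (a + 1) / 2,
        ),
    )
```

The flat tail is what removes the bias that a convex penalty like the L1 norm imposes on large coefficients, at the price of a nonconvex optimization problem.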
Nonconcave penalized likelihood with NP-dimensionality
IEEE Trans. Inform. Theory, 2011. Cited by 55 (14 self).
Penalized likelihood methods are fundamental to ultrahigh dimensional variable selection. How high a dimensionality such methods can handle remains largely unknown. In this paper, we show that in the context of generalized linear models, such methods possess model selection consistency with oracle properties even for dimensionality of nonpolynomial (NP) order of sample size, for a class of penalized likelihood approaches using folded-concave penalty functions, which were introduced to ameliorate the bias problems of convex penalty functions. This fills a long-standing gap in the literature, where the dimensionality was previously allowed to grow only slowly with the sample size. Our results are also applicable to penalized likelihood with the L1 penalty, which is a convex function at the boundary of the class of folded-concave penalty functions under consideration. Coordinate optimization is implemented for finding the solution paths, whose performance is evaluated by a few simulation examples and a real data analysis. Index Terms: coordinate optimization, folded-concave penalty, high dimensionality, Lasso, nonconcave penalized likelihood, oracle property, SCAD, variable selection, weak oracle property.
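To illustrate the coordinate optimization for a folded-concave penalty, here is a bare-bones coordinate descent for the Gaussian linear model with SCAD. The univariate thresholding rule is the known closed form from Fan and Li (2001); the surrounding loop is a simplified stand-in for the paper's path algorithm:

```python
import numpy as np

def scad_threshold(z, lam, a=3.7):
    """Minimizer of (1/2)(z - theta)^2 + SCAD_lam(|theta|) (Fan and Li, 2001)."""
    az = abs(z)
    if az <= 2 * lam:
        return np.sign(z) * max(az - lam, 0.0)
    if az <= a * lam:
        return ((a - 1) * z - np.sign(z) * a * lam) / (a - 2)
    return z

def scad_coordinate_descent(X, y, lam, n_iter=100):
    """Cycle through coordinates; with standardized columns each update is a
    one-dimensional SCAD problem solved by scad_threshold."""
    n, p = X.shape
    X = (X - X.mean(0)) / X.std(0)
    y = y - y.mean()
    beta = np.zeros(p)
    r = y.copy()                            # running residual
    for _ in range(n_iter):
        for j in range(p):
            z = X[:, j] @ r / n + beta[j]   # univariate least-squares target
            new = scad_threshold(z, lam)
            r += X[:, j] * (beta[j] - new)  # update residual in place
            beta[j] = new
    return beta
```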
Nonparametric independence screening in sparse ultrahigh dimensional additive models
2010. Cited by 54 (8 self).
A variable screening procedure via correlation learning was proposed by Fan and Lv (2008) to reduce dimensionality in sparse ultrahigh dimensional models. Even when the true model is linear, the marginal regression can be highly nonlinear. To address this issue, we further extend the correlation learning to marginal nonparametric learning. Our nonparametric independence screening (NIS) is a specific type of sure independence screening. We propose several closely related variable screening procedures. We show that with general nonparametric models, under some mild technical conditions, the proposed independence screening methods have a sure screening property. The extent to which the dimensionality can be reduced by independence screening is also explicitly quantified. As a methodological extension, we also propose a data-driven thresholding and an iterative nonparametric independence screening (INIS) method to enhance the finite sample performance for fitting sparse additive models. The simulation results and a real data analysis demonstrate that the proposed procedure works well with moderate sample size and large dimension and performs better than competing methods.
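The core NIS step is to fit a marginal nonparametric regression of y on each covariate and rank covariates by how much variance that marginal fit explains. A minimal sketch, with a cubic polynomial standing in for the B-spline basis used in the paper and the retained dimension d supplied by the user rather than by the paper's data-driven threshold:

```python
import numpy as np

def nis_screen(X, y, d, degree=3):
    """Return the indices of the d covariates with the largest marginal R^2
    from a degree-`degree` polynomial regression of y on each covariate."""
    n, p = X.shape
    tss = ((y - y.mean())**2).sum()
    r2 = np.empty(p)
    for j in range(p):
        coefs = np.polyfit(X[:, j], y, degree)      # marginal fit for x_j
        resid = y - np.polyval(coefs, X[:, j])
        r2[j] = 1.0 - (resid @ resid) / tss
    return np.argsort(-r2)[:d]
```

The nonparametric marginal fit is what lets the screen catch covariates whose marginal relationship with y is strong but nonlinear, which pure correlation learning would miss.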