Results 1–10 of 80
Sure independence screening for ultrahigh dimensional feature space
, 2006
Abstract

Cited by 283 (26 self)
Variable selection plays an important role in high dimensional statistical modeling, which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, estimation accuracy and computational cost are two top concerns. In a recent paper, Candes and Tao (2007) propose the Dantzig selector using L1 regularization and show that it achieves the ideal risk up to a logarithmic factor log p. Their innovative procedure and remarkable result are challenged when the dimensionality is ultrahigh, as the factor log p can be large and their uniform uncertainty principle can fail. Motivated by these concerns, we introduce the concept of sure screening and propose a sure screening method based on correlation learning, called Sure Independence Screening (SIS), to reduce dimensionality from high to a moderate scale that is below the sample size. In a fairly general asymptotic framework, SIS is shown to have the sure screening property even for exponentially growing dimensionality. As a methodological extension, an iterative SIS (ISIS) is also proposed to enhance its finite sample performance. With dimension reduced accurately from high to below the sample size, variable selection can be improved in both speed and accuracy, and can then be accomplished …
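The correlation-learning step behind SIS, as described in this abstract, can be sketched in a few lines of Python. This is a plain reading of the abstract, not the authors' code; the function name `sis` and its interface are hypothetical.

```python
import numpy as np

def sis(X, y, d):
    """Sure Independence Screening (sketch): rank the p features by
    absolute marginal correlation with the response and keep the top d,
    reducing the dimensionality from p to below the sample size."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each feature
    ys = (y - y.mean()) / y.std()
    omega = np.abs(Xs.T @ ys) / len(y)          # componentwise marginal correlations
    return np.sort(np.argsort(omega)[-d:])      # indices of the d largest

# Example: n = 50 observations, p = 200 features, only the first three relevant.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 200))
y = 3 * X[:, 0] + 3 * X[:, 1] + 3 * X[:, 2] + 0.5 * rng.standard_normal(50)
kept = sis(X, y, 20)   # screening retains the true features with high probability
```

A full analysis would then run a penalized method on the d retained features; the iterative variant (ISIS) alternates this screening step with such a fit.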
A SELECTIVE OVERVIEW OF VARIABLE SELECTION IN HIGH DIMENSIONAL FEATURE SPACE
, 2010
Abstract

Cited by 70 (6 self)
High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection methods, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality. They have been widely applied for simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of recent developments in theory, methods, and implementations for high dimensional variable selection. Advances in the field are rapidly driven by three questions: what limits of dimensionality such methods can handle, what the role of penalty functions is, and what the statistical properties of these methods are. The properties of nonconcave penalized likelihood and its roles in high dimensional statistical modeling are emphasized. We also review some recent advances in ultrahigh dimensional variable selection, with emphasis on independence screening and two-scale methods.
A unified approach to model selection and sparse recovery using regularized least squares
 Ann. Statist.
, 2009
Abstract

Cited by 60 (7 self)
Model selection and sparse recovery are two important problems for which many regularization methods have been proposed. We study the properties of regularization methods in both problems under the unified framework of regularized least squares with concave penalties. For model selection, we establish conditions under which a regularized least squares estimator enjoys a nonasymptotic property, called the weak oracle property, where the dimensionality can grow exponentially with the sample size. For sparse recovery, we present a sufficient condition that ensures the recoverability of the sparsest solution. In particular, we approach both problems by considering a family of penalties that gives a smooth homotopy between the L0 and L1 penalties. We also propose the sequentially and iteratively reweighted squares (SIRS) algorithm for sparse recovery. Numerical studies support our theoretical results and demonstrate the advantage of our new methods for model selection and sparse recovery.
Nonconcave penalized likelihood with NP-dimensionality
 IEEE Trans. Inform. Theory
, 2011
Abstract

Cited by 55 (14 self)
Penalized likelihood methods are fundamental to ultrahigh dimensional variable selection. How high a dimensionality such methods can handle remains largely unknown. In this paper, we show that in the context of generalized linear models, such methods possess model selection consistency with oracle properties even for dimensionality of nonpolynomial (NP) order of sample size, for a class of penalized likelihood approaches using folded-concave penalty functions, which were introduced to ameliorate the bias problems of convex penalty functions. This fills a long-standing gap in the literature, where the dimensionality is allowed to grow only slowly with the sample size. Our results are also applicable to penalized likelihood with the L1 penalty, which is a convex function at the boundary of the class of folded-concave penalty functions under consideration. Coordinate optimization is implemented for finding the solution paths, whose performance is evaluated by a few simulation examples and a real data analysis. Index Terms—Coordinate optimization, folded-concave penalty, high dimensionality, Lasso, nonconcave penalized likelihood, oracle property, SCAD, variable selection, weak oracle property.
Network exploration via the adaptive LASSO and SCAD penalties
 Ann. Appl. Stat.
, 2009
Abstract

Cited by 54 (2 self)
Graphical models are frequently used to explore networks, such as genetic networks, among a set of variables. This is usually carried out by exploring the sparsity of the precision matrix of the variables under consideration. Penalized likelihood methods are often used in such explorations, yet the positive-definiteness constraint on precision matrices makes the optimization problem challenging. We introduce nonconcave penalties and the adaptive LASSO penalty to attenuate the bias problem in the network estimation. Through a local linear approximation to the nonconcave penalty functions, the problem of precision matrix estimation is recast as a sequence of penalized likelihood problems with a weighted L1 penalty and solved using the efficient algorithm of Friedman et al. [Biostatistics 9 (2008) 432–441]. Our estimation schemes are applied to two real datasets. Simulation experiments and asymptotic theory are used to justify our proposed methods.
Penalized classification using Fisher’s linear discriminant
 Journal of the Royal Statistical Society, Series B
, 2011
Abstract

Cited by 34 (1 self)
Summary. We consider the supervised classification setting, in which the data consist of p features measured on n observations, each of which belongs to one of K classes. Linear discriminant analysis (LDA) is a classical method for this problem. However, in the high dimensional setting where p ≫ n, LDA is not appropriate for two reasons. First, the standard estimate of the within-class covariance matrix is singular, so the usual discriminant rule cannot be applied. Second, when p is large, it is difficult to interpret the classification rule obtained from LDA, since it involves all p features. We propose penalized LDA, a general approach for penalizing the discriminant vectors in Fisher’s discriminant problem in a way that leads to greater interpretability. The discriminant problem is not convex, so we use a minorization–maximization approach to optimize it efficiently when convex penalties are applied to the discriminant vectors. In particular, we consider the use of L1 and fused lasso penalties. Our proposal is equivalent to recasting Fisher’s discriminant problem as a biconvex problem. We evaluate the performance of the resulting methods in a simulation study and on three gene expression data sets. We also survey past methods for extending LDA to the high dimensional setting and explore their relationships with our proposal.
Feature selection by higher criticism thresholding: Optimal phase diagram
, 2008
Abstract

Cited by 26 (6 self)
We consider two-class linear classification in a high-dimensional, low-sample-size setting. Only a small fraction of the features are useful, the useful features are unknown to us, and each useful feature contributes weakly to the classification decision; this setting was called the rare/weak model (RW model) in [11]. We select features by thresholding feature z-scores. The threshold is set by higher criticism (HC) [11]. Let π_i denote the P-value associated with the ith z-score and π_(i) denote the ith order statistic of the collection of P-values. The HC threshold (HCT) is the order statistic of the z-score corresponding to the index i maximizing (i/n − π_(i)) / √(π_(i)(1 − π_(i))). The ideal threshold optimizes the classification error. In [11] we showed that HCT is numerically close to the ideal threshold. We formalize an asymptotic framework for studying the RW model, considering a sequence of problems with increasingly many features and relatively fewer observations. We show that along this sequence, the limiting performance of ideal HCT is essentially as good as the limiting performance of ideal thresholding. Our results describe two-dimensional …
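The HC thresholding rule quoted in this abstract is mechanical enough to state as code. The sketch below assumes two-sided normal P-values and the common convention of searching only over the smaller half of the ordered P-values; the name `hc_threshold` is hypothetical, not the authors' implementation.

```python
import numpy as np
from scipy.stats import norm

def hc_threshold(z):
    """Higher-criticism threshold (sketch): find the index i maximizing
    (i/n - pi_(i)) / sqrt(pi_(i) * (1 - pi_(i))) over the ordered
    P-values, and return the corresponding |z| order statistic."""
    n = len(z)
    p = np.sort(2 * norm.sf(np.abs(z)))              # ordered two-sided P-values
    i = np.arange(1, n + 1)
    hc = (i / n - p) / np.sqrt(p * (1 - p) + 1e-12)  # HC objective; eps guards p = 0
    i_star = int(np.argmax(hc[: n // 2]))            # restrict to the smaller P-values
    return np.sort(np.abs(z))[::-1][i_star]          # matching |z| order statistic

# Example: 10 strong features (z = 6) among 990 null z-scores.
rng = np.random.default_rng(0)
z = rng.standard_normal(1000)
z[:10] = 6.0
t = hc_threshold(z)
```

Features whose |z-score| exceeds the returned threshold are then selected; in the rare/weak regime the HC objective peaks near the boundary between signal and null P-values.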
Sparse linear discriminant analysis by thresholding for high dimensional data
 Annals of Statistics
, 2012
Feature selection in omics prediction problems using cat scores and false nondiscovery rate control
 Ann. Appl. Stat.
, 2009
Abstract

Cited by 21 (11 self)
We revisit the problem of feature selection in linear discriminant analysis (LDA), i.e., when features are correlated. First, we introduce a pooled centroids formulation of the multiclass LDA predictor function, in which the relative weights of Mahalanobis-transformed predictors are given by correlation-adjusted t-scores (cat scores). Second, for feature selection we propose thresholding cat scores by controlling false non-discovery rates (FNDR). We show that, contrary to previous claims, this FNDR procedure performs very well, similarly to “higher criticism”. Third, training of the classifier is conducted by plug-in of James-Stein shrinkage estimates of correlations and variances, using analytic procedures for choosing the regularization parameters. Overall, this results in an effective and computationally inexpensive framework for high-dimensional prediction with natural feature selection. The proposed shrinkage discriminant procedures are implemented in the R package “sda” available from the R repository CRAN.
Gene ranking and biomarker discovery under correlation. Bioinformatics
Abstract

Cited by 16 (3 self)
Motivation: Biomarker discovery and gene ranking is a standard task in genomic high-throughput analysis. Typically, the ordering of markers is based on a stabilized variant of the t-score, such as the moderated t or the SAM statistic. However, these procedures ignore gene-gene correlations, which may have a profound impact on the gene orderings and on the power of the subsequent tests. Results: We propose a simple procedure that adjusts gene-wise t-statistics to take account of correlations among genes. The resulting correlation-adjusted t-scores (“cat” scores) are derived from a predictive perspective, i.e., as a score for variable selection to discriminate group membership in two-class linear discriminant analysis. In the absence of correlation, the cat score reduces to the standard t-score. Moreover, using the cat score it is straightforward to evaluate groups of features (i.e., gene sets). For computation of the cat score from small-sample data we propose a shrinkage procedure. In a comparative study comprising six different synthetic and empirical correlation structures, we show that the cat score improves estimation of gene orderings and leads to higher power for fixed true discovery rate, and vice versa. Finally, we also illustrate the cat score by analyzing metabolomic data. Availability: The shrinkage cat score is implemented in the R package “st” available from …
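The decorrelation idea in this abstract, adjusting ordinary t-scores by the inverse square root of the feature correlation matrix, can be sketched as follows. This is a minimal illustration via an eigendecomposition; `cat_scores` is a hypothetical name, not the `st` package's API, and it ignores the shrinkage estimation of R that the paper proposes for small samples.

```python
import numpy as np

def cat_scores(t, R):
    """Correlation-adjusted t-scores (sketch): multiply the vector of
    ordinary t-scores by R^(-1/2), the inverse square root of the
    feature correlation matrix R. With R = I this returns t unchanged."""
    vals, vecs = np.linalg.eigh(R)                      # R symmetric positive definite
    R_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T  # spectral inverse square root
    return R_inv_sqrt @ t
```

With uncorrelated features (R = I) the cat scores coincide with the usual t-scores, matching the reduction noted in the abstract; with correlated features the adjustment redistributes weight across the correlated group.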