Results 1–10 of 454
On Model Selection Consistency of Lasso
, 2006
Cited by 477 (20 self)
Abstract:
Sparsity or parsimony of statistical models is crucial for their proper interpretations, as in sciences and social sciences. Model selection is a commonly used method to find such models, but usually involves a computationally heavy combinatorial search. Lasso (Tibshirani, 1996) is now being used as a computationally feasible alternative to model selection.
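The abstract's point, that the Lasso replaces a combinatorial subset search with a convex program, can be illustrated with a minimal coordinate-descent sketch. The problem sizes, penalty level, and all names here are illustrative choices, not from the paper:

```python
import numpy as np

# Minimal Lasso via coordinate descent (illustrative sizes and penalty).
rng = np.random.default_rng(7)
n, p = 200, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]          # three truly active predictors
y = X @ beta_true + 0.5 * rng.standard_normal(n)

def lasso_cd(X, y, lam, sweeps=100):
    """Minimize (1/2n)||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(sweeps):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]            # partial residual
            z = X[:, j] @ r / n
            b[j] = np.sign(z) * max(abs(z) - lam, 0.0) / (X[:, j] @ X[:, j] / n)
    return b

b = lasso_cd(X, y, lam=0.1)
print(np.flatnonzero(np.abs(b) > 1e-8))   # ideally the active set [0, 1, 2]
```

The soft-thresholding update is what makes the search feasible: each coordinate step has a closed form, so no subset enumeration is needed.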
On the distribution of the largest eigenvalue in principal components analysis
Ann. Statist., 2001
Cited by 422 (4 self)
Abstract:
Let x(1) denote the square of the largest singular value of an n × p matrix X, all of whose entries are independent standard Gaussian variates. Equivalently, x(1) is the largest principal component variance of the covariance matrix X′X, or the largest eigenvalue of a p-variate Wishart distribution on n degrees of freedom with identity covariance. Consider the limit of large p and n with n/p = γ ≥ 1. When centered by µ_p = (√(n−1) + √p)² and scaled by σ_p = (√(n−1) + √p)(1/√(n−1) + 1/√p)^{1/3}, the distribution of x(1) approaches the Tracy–Widom law of order 1, which is defined in terms of the Painlevé II differential equation and can be numerically evaluated and tabulated in software. Simulations show the approximation to be informative for n and p as small as 5. The limit is derived via a corresponding result for complex Wishart matrices using methods from random matrix theory. The result suggests that some aspects of large-p multivariate distribution theory may be easier to apply in practice than their fixed-p counterparts.
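The centering and scaling constants above are easy to check empirically. The sketch below, with illustrative sample sizes and replication count, compares the scaled largest eigenvalue against the Tracy–Widom(1) mean, which is roughly −1.21:

```python
import numpy as np

# Empirical check of the Tracy-Widom centering/scaling from the abstract.
# Sample sizes and replication count are illustrative, not from the paper.
rng = np.random.default_rng(0)

def scaled_largest_eigenvalue(n, p, rng):
    X = rng.standard_normal((n, p))
    x1 = np.linalg.svd(X, compute_uv=False)[0] ** 2   # largest eigenvalue of X'X
    mu = (np.sqrt(n - 1) + np.sqrt(p)) ** 2
    sigma = (np.sqrt(n - 1) + np.sqrt(p)) * (1 / np.sqrt(n - 1) + 1 / np.sqrt(p)) ** (1 / 3)
    return (x1 - mu) / sigma

samples = [scaled_largest_eigenvalue(100, 50, rng) for _ in range(500)]
print(np.mean(samples))   # should sit near the Tracy-Widom(1) mean, about -1.21
```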
Sure independence screening for ultrahigh dimensional feature space
, 2006
Cited by 283 (26 self)
Abstract:
Variable selection plays an important role in high-dimensional statistical modeling, which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, estimation accuracy and computational cost are two top concerns. In a recent paper, Candes and Tao (2007) propose the Dantzig selector using L1 regularization and show that it achieves the ideal risk up to a logarithmic factor log p. Their innovative procedure and remarkable result are challenged when the dimensionality is ultra-high, as the factor log p can be large and their uniform uncertainty principle can fail. Motivated by these concerns, we introduce the concept of sure screening and propose a sure screening method based on correlation learning, called Sure Independence Screening (SIS), to reduce dimensionality from high to a moderate scale that is below the sample size. In a fairly general asymptotic framework, SIS is shown to have the sure screening property even for exponentially growing dimensionality. As a methodological extension, an iterative SIS (ISIS) is also proposed to enhance its finite-sample performance. With dimension reduced accurately from high to below the sample size, variable selection can be improved in both speed and accuracy, and can then be ac…
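The correlation-learning step the abstract describes reduces to ranking predictors by absolute marginal correlation with the response and keeping the top d. The following sketch uses illustrative sizes and the common (but here assumed) screening size d = n / log(n):

```python
import numpy as np

# Sketch of Sure Independence Screening (SIS): rank predictors by absolute
# marginal correlation with y and keep the top d. Sizes are illustrative.
rng = np.random.default_rng(1)
n, p = 200, 5000
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 3.0                           # five truly active predictors
y = X @ beta + rng.standard_normal(n)

# Componentwise correlations between each column of X and y.
Xc = (X - X.mean(0)) / X.std(0)
yc = (y - y.mean()) / y.std()
corr = np.abs(Xc.T @ yc) / n

d = int(n / np.log(n))                   # a common screening size (assumed here)
selected = np.argsort(corr)[::-1][:d]
print(all(j in selected for j in range(5)))   # sure screening: actives survive
```

Dimensionality drops from p = 5000 to d ≈ 37 before any refined selection method is run, which is the point of the sure screening property.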
The sparsity and bias of the Lasso selection in high-dimensional linear regression
Ann. Statist. 36(4), 1567–1594, 2008
Cited by 191 (27 self)
Abstract:
showed that, for neighborhood selection in Gaussian graphical models, under a neighborhood stability condition, the LASSO is consistent, even when the number of variables is of greater order than the sample size. Zhao and Yu [(2006) J. Machine Learning Research 7, 2541–2567] formalized the neighborhood stability condition in the context of linear regression as a strong irrepresentable condition. That paper showed that under this condition, the LASSO selects exactly the set of nonzero regression coefficients, provided that these coefficients are bounded away from zero at a certain rate. In this paper, the regression coefficients outside an ideal model are assumed to be small, but not necessarily zero. Under a sparse Riesz condition on the correlation of design variables, we prove that the LASSO selects a model of the correct order of dimensionality, controls the bias of the selected model at a level determined by the contributions of small regression coefficients and threshold bias, and selects all coefficients of greater order than the bias of the selected model. Moreover, as a consequence of this rate consistency of the LASSO in model selection, it is proved that the sum of error squares for the mean response and the ℓ_α-loss for the regression coefficients converge at the best possible rates under the given conditions. An interesting aspect of our results is that the logarithm of the number of variables can be of the same order as the sample size for certain random dependent designs.
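The strong irrepresentable condition mentioned above can be checked numerically for a given design: with C = X′X/n partitioned over active set A and its complement, it requires |C_{I,A} C_{A,A}^{-1} sign(β_A)| ≤ 1 − η elementwise. The sketch below assumes the active set and signs are known; the design is synthetic:

```python
import numpy as np

# Check Zhao and Yu's strong irrepresentable condition for a design matrix,
# assuming the active set and coefficient signs are given. Synthetic example.
def irrepresentable_gap(X, active, signs):
    n = X.shape[0]
    C = X.T @ X / n
    A = np.asarray(active)
    I = np.setdiff1d(np.arange(X.shape[1]), A)
    v = C[np.ix_(I, A)] @ np.linalg.solve(C[np.ix_(A, A)], signs)
    return 1.0 - np.max(np.abs(v))        # positive gap => condition holds

rng = np.random.default_rng(2)
X = rng.standard_normal((500, 10))        # nearly orthogonal design
gap = irrepresentable_gap(X, active=[0, 1, 2], signs=np.ones(3))
print(gap > 0)
```

For nearly orthogonal designs the gap is close to 1; strongly correlated designs can drive it negative, which is exactly when LASSO selection consistency fails.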
No eigenvalues outside the support of the limiting spectral distribution of large-dimensional sample covariance matrices
Annals of Probability 26, 1998
Cited by 186 (35 self)
Abstract:
We consider a class of matrices of the form Cn = (1/N)(Rn + σXn)(Rn + σXn)*, where Xn is an n × N matrix consisting of independent standardized complex entries, Rn is an n × N nonrandom matrix, and σ > 0. Among several applications, Cn can be viewed as a sample correlation matrix, where information is contained in (1/N)RnRn*, but each column of Rn is contaminated by noise. As n → ∞, if n/N → c > 0 and the empirical distribution of the eigenvalues of (1/N)RnRn* converges to a proper probability distribution, then the empirical distribution of the eigenvalues of Cn converges a.s. to a nonrandom limit. In this paper we show that, under certain conditions on Rn, for any closed interval in R+ outside the support of the limiting distribution, almost surely no eigenvalues of Cn appear in that interval for all large n.
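In the special case Rn = 0 and σ = 1 (and using real rather than complex entries, which leaves the limit unchanged), Cn is an ordinary sample covariance matrix whose limiting spectrum is Marchenko–Pastur with support [(1 − √c)², (1 + √c)²]. The no-eigenvalues-outside-the-support phenomenon is then easy to see numerically; sizes below are illustrative:

```python
import numpy as np

# With Rn = 0, sigma = 1, Cn is a sample covariance matrix and the limiting
# spectral distribution is Marchenko-Pastur. For large n, all eigenvalues
# should fall near the support interval. Sizes are illustrative.
rng = np.random.default_rng(3)
n, N = 400, 1600                          # c = n/N = 0.25
X = rng.standard_normal((n, N))
eig = np.linalg.eigvalsh(X @ X.T / N)
c = n / N
lo, hi = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
print(eig.min() >= lo - 0.1, eig.max() <= hi + 0.1)
```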
Random matrices: Universality of local eigenvalue statistics up to the edge
, 2009
Cited by 156 (18 self)
Abstract:
This is a continuation of our earlier paper [25] on the universality of the eigenvalues of Wigner random matrices. The main new results of this paper are an extension of the results in [25] from the bulk of the spectrum up to the edge. In particular, we prove a variant of the universality results of Soshnikov [23] for the largest eigenvalues, assuming moment conditions rather than symmetry conditions. The main new technical observation is that there is a significant bias in the Cauchy interlacing law near the edge of the spectrum which allows one to continue ensuring the delocalization of eigenvectors.
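The Cauchy interlacing law invoked above is a deterministic fact about Hermitian matrices: the eigenvalues of an (N−1) × (N−1) principal minor interlace those of the full matrix. A quick numerical check (using a real symmetric Wigner-type matrix for simplicity):

```python
import numpy as np

# Cauchy interlacing: eigenvalues of a principal minor interlace those of
# the full symmetric matrix. Real symmetric case shown for simplicity.
rng = np.random.default_rng(6)
N = 50
A = rng.standard_normal((N, N))
H = (A + A.T) / 2
mu = np.linalg.eigvalsh(H)                # eigenvalues of H, ascending
nu = np.linalg.eigvalsh(H[:-1, :-1])      # eigenvalues of the minor
ok = all(mu[i] <= nu[i] <= mu[i + 1] for i in range(N - 1))
print(ok)                                 # prints True: interlacing holds
```

The paper's observation is about the *bias* in this interlacing near the spectral edge, which the identity itself does not reveal; the check only confirms the law the argument starts from.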
Sparsistency and rates of convergence in large covariance matrices estimation
, 2009
Cited by 110 (12 self)
Abstract:
This paper studies the sparsistency and rates of convergence for estimating sparse covariance and precision matrices based on penalized likelihood with nonconvex penalty functions. Here, sparsistency refers to the property that all parameters that are zero are actually estimated as zero with probability tending to one. Depending on the application, sparsity may occur a priori on the covariance matrix, its inverse, or its Cholesky decomposition. We study these three sparsity exploration problems under a unified framework with a general penalty function. We show that the rates of convergence for these problems under the Frobenius norm are of order (s_n log p_n / n)^{1/2}, where s_n is the number of nonzero elements, p_n is the size of the covariance matrix, and n is the sample size. This explicitly spells out that the contribution of high dimensionality is merely a logarithmic factor. The conditions on the rate with which the tuning parameter λ_n goes to 0 have been made explicit and compared under different penalties. As a result, for the L1 penalty, to guarantee sparsistency and the optimal rate of convergence, the number of nonzero elements should be small: s′_n = O(p_n) at most, among O(p_n²) parameters, for estimating a sparse covariance or correlation matrix, sparse precision or inverse correlation matrix, or sparse Cholesky factor, where s′_n is the number of nonzero elements among the off-diagonal entries. On the other hand, using the SCAD or hard-thresholding penalty functions, there is no such restriction.
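A simplified stand-in for the hard-thresholding penalty discussed above is entrywise hard thresholding of the sample covariance (this is not the paper's penalized-likelihood estimator, just an illustration of sparsistency). The threshold level follows the √(log p / n) rate from the abstract; the constant and all sizes are illustrative assumptions:

```python
import numpy as np

# Hard-thresholding sketch for sparse covariance estimation. The threshold
# lam = C * sqrt(log(p) / n) follows the rate in the abstract; C = 3 and the
# problem sizes are illustrative choices, not from the paper.
rng = np.random.default_rng(4)
n, p = 500, 50
Sigma = np.eye(p)
Sigma[0, 1] = Sigma[1, 0] = 0.5           # one nonzero off-diagonal pair
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
S = np.cov(X, rowvar=False)

lam = 3.0 * np.sqrt(np.log(p) / n)
S_hat = np.where(np.abs(S) >= lam, S, 0.0)

off = S_hat - np.diag(np.diag(S_hat))
print(np.count_nonzero(off))              # ideally just the (0,1) and (1,0) entries
```

Sparsistency here means the O(p²) truly-zero entries are thresholded to exactly zero while the signal entry, well above the noise level, survives.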
Universality of the Local Spacing Distribution in Certain Ensembles of Hermitian Wigner Matrices
Commun. Math. Phys., 2001
Cited by 102 (5 self)
Abstract:
Consider an N × N Hermitian random matrix with independent entries, not necessarily Gaussian, a so-called Wigner matrix. It has been conjectured that the local spacing distribution, i.e. the distribution of the distance between nearest-neighbour eigenvalues in some part of the spectrum, is, in the limit as N → ∞, the same as that of Hermitian random matrices from the GUE. We prove this conjecture for a certain subclass of Hermitian Wigner matrices.
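The GUE spacing behavior the conjecture refers to can be glimpsed by simulation: level repulsion makes very small normalized spacings rare. The sketch below uses a crude mean-based unfolding of the bulk spacings, which is an illustrative simplification:

```python
import numpy as np

# Sample a GUE-type matrix and inspect bulk eigenvalue spacings. The
# unfolding (dividing by the mean spacing) is a crude, illustrative choice.
rng = np.random.default_rng(5)
N = 200
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
H = (A + A.conj().T) / 2                  # GUE up to normalization
eig = np.linalg.eigvalsh(H)
bulk = eig[N // 4: 3 * N // 4]            # middle of the spectrum
s = np.diff(bulk) / np.diff(bulk).mean()  # normalized spacings
print(float(np.mean(s < 0.05)))           # level repulsion: tiny spacings are rare
```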
Randomly Spread CDMA: Asymptotics via Statistical Physics
, 2005
Cited by 101 (12 self)
Abstract:
This paper studies randomly spread code-division multiple access (CDMA) and multiuser detection in the large-system limit using the replica method developed in statistical physics. Arbitrary input distributions and flat fading are considered. A generic multiuser detector in the form of the posterior mean estimator is applied before single-user decoding. The generic detector can be particularized to the matched filter, decorrelator, linear MMSE detector, the jointly or the individually optimal detector, and others. It is found that the detection output for each user, although in general asymptotically non-Gaussian conditioned on the transmitted symbol, converges as the number of users goes to infinity to a deterministic function of a “hidden” Gaussian statistic independent of the interferers. Thus the multiuser channel can be decoupled: each user experiences an equivalent single-user Gaussian channel, whose signal-to-noise ratio suffers a degradation due to the multiple-access interference. The uncoded error performance (e.g., symbol error rate) and the mutual information can then be fully characterized using the degradation factor, also known as the multiuser efficiency, which can be obtained by solving a pair of coupled fixed-point equations identified in this paper. Based on a general linear vector channel model, the results are also applicable to MIMO channels such as in multi-antenna systems.
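A classical special case of such a fixed-point characterization is the large-system SIR of the equal-power linear MMSE detector (the Tse–Hanly equation), not the paper's replica-derived pair; the load and SNR below are illustrative assumptions:

```python
# Fixed-point iteration for the large-system SIR of the equal-power linear
# MMSE detector: sir = snr / (1 + load * snr / (1 + sir)). This classical
# single equation stands in for the paper's coupled fixed-point pair; the
# load and SNR values are illustrative.
def mmse_sir(snr, load, iters=200):
    sir = snr                             # initial guess: no interference
    for _ in range(iters):
        sir = snr / (1.0 + load * snr / (1.0 + sir))
    return sir

snr, load = 10.0, 0.5                     # per-user SNR (linear) and load K/N
sir = mmse_sir(snr, load)
eta = sir / snr                           # multiuser efficiency, in (0, 1)
print(0.0 < eta < 1.0)
```

The multiuser efficiency eta is exactly the degradation factor mentioned in the abstract: the fraction of the single-user SNR that survives the multiple-access interference.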