Results 11–20 of 683
Sparsistency and rates of convergence in large covariance matrix estimation
, 2009
"... This paper studies the sparsistency and rates of convergence for estimating sparse covariance and precision matrices based on penalized likelihood with nonconvex penalty functions. Here, sparsistency refers to the property that all parameters that are zero are actually estimated as zero with probabi ..."
Abstract

Cited by 110 (12 self)
This paper studies the sparsistency and rates of convergence for estimating sparse covariance and precision matrices based on penalized likelihood with nonconvex penalty functions. Here, sparsistency refers to the property that all parameters that are zero are actually estimated as zero with probability tending to one. Depending on the application, sparsity may occur a priori in the covariance matrix, its inverse or its Cholesky decomposition. We study these three sparsity exploration problems under a unified framework with a general penalty function. We show that the rates of convergence for these problems under the Frobenius norm are of order (s_n log p_n / n)^{1/2}, where s_n is the number of nonzero elements, p_n is the size of the covariance matrix and n is the sample size. This explicitly spells out that the contribution of high-dimensionality is merely a logarithmic factor. The conditions on the rate with which the tuning parameter λ_n goes to 0 have been made explicit and compared under different penalties. As a result, for the L1-penalty, to guarantee the sparsistency and optimal rate of convergence, the number of nonzero elements should be small: s′_n = O(p_n) at most, among O(p_n^2) parameters, for estimating a sparse covariance or correlation matrix, a sparse precision or inverse correlation matrix or a sparse Cholesky factor, where s′_n is the number of nonzero off-diagonal entries. On the other hand, using the SCAD or hard-thresholding penalty functions, there is no such restriction.
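The L1-penalized precision-matrix case analyzed in the paper corresponds to the graphical lasso. As a minimal, hedged sketch (not the paper's own code), scikit-learn's GraphicalLasso reproduces that one instance; the SCAD and hard-thresholding penalties studied in the paper have no off-the-shelf equivalent here, and the data and alpha below are placeholders.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
n, p = 200, 20                      # sample size n, dimension p (illustrative)
X = rng.standard_normal((n, p))     # placeholder data

model = GraphicalLasso(alpha=0.1)   # alpha plays the role of the tuning parameter lambda_n
model.fit(X)
precision = model.precision_        # sparse estimate of the inverse covariance

off_diag = ~np.eye(p, dtype=bool)
print("nonzero off-diagonal entries:", (np.abs(precision[off_diag]) > 1e-8).sum())
```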
On the conditions used to prove oracle results for the Lasso
 Electron. J. Stat
"... Abstract: Oracle inequalities and variable selection properties for the Lasso in linear models have been established under a variety of different assumptions on the design matrix. We show in this paper how the different conditions and concepts relate to each other. The restricted eigenvalue conditio ..."
Abstract

Cited by 103 (5 self)
Abstract: Oracle inequalities and variable selection properties for the Lasso in linear models have been established under a variety of different assumptions on the design matrix. We show in this paper how the different conditions and concepts relate to each other. The restricted eigenvalue condition [2] or the slightly weaker compatibility condition [18] are sufficient for oracle results. We argue that both these conditions allow for a fairly general class of design matrices. Hence, optimality of the Lasso for prediction and estimation holds in more general situations than it appears from coherence [5, 4] or restricted isometry [10] assumptions.
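Of the conditions this paper compares, mutual coherence is the easiest to evaluate numerically; the restricted eigenvalue and compatibility conditions involve minima over restricted cones and are not directly computable in general. A small sketch of the coherence computation (my own illustration, not from the paper):

```python
import numpy as np

def mutual_coherence(X: np.ndarray) -> float:
    """Largest absolute inner product between distinct normalized columns of X."""
    Xn = X / np.linalg.norm(X, axis=0)   # normalize each column
    G = Xn.T @ Xn                        # Gram matrix of normalized columns
    np.fill_diagonal(G, 0.0)             # ignore the trivial diagonal
    return float(np.abs(G).max())

rng = np.random.default_rng(0)
print(mutual_coherence(rng.standard_normal((100, 50))))
```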
Boosting algorithms: Regularization, prediction and model fitting
 Statistical Science
, 2007
"... Abstract. We present a statistical perspective on boosting. Special emphasis is given to estimating potentially complex parametric or nonparametric models, including generalized linear and additive models as well as regression models for survival analysis. Concepts of degrees of freedom and correspo ..."
Abstract

Cited by 99 (12 self)
Abstract. We present a statistical perspective on boosting. Special emphasis is given to estimating potentially complex parametric or nonparametric models, including generalized linear and additive models as well as regression models for survival analysis. Concepts of degrees of freedom and corresponding Akaike or Bayesian information criteria, particularly useful for regularization and variable selection in high-dimensional covariate spaces, are discussed as well. The practical aspects of boosting procedures for fitting statistical models are illustrated by means of the dedicated open-source software package mboost. This package implements functions which can be used for model fitting, prediction and variable selection. It is flexible, allowing for the implementation of new boosting algorithms optimizing user-specified loss functions. Key words and phrases: Generalized linear models, generalized additive models, gradient boosting, survival analysis, variable selection, software.
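The paper's reference implementation is the R package mboost; as a hedged illustration of one algorithm the paper covers, here is componentwise L2-boosting in plain numpy: at each step every centered covariate is fit to the current residuals by simple least squares, and only the best-fitting one is updated by a small step length nu. Step count and step length are illustrative, not the paper's defaults.

```python
import numpy as np

def l2_boost(X, y, n_steps=100, nu=0.1):
    X = X - X.mean(axis=0)              # center covariates; intercept = mean of y
    beta = np.zeros(X.shape[1])
    resid = y - y.mean()
    col_ss = (X ** 2).sum(axis=0)       # precompute column sums of squares
    for _ in range(n_steps):
        coefs = X.T @ resid / col_ss    # simple least-squares fit per column
        sse_drop = coefs ** 2 * col_ss  # residual-sum-of-squares reduction per column
        j = int(np.argmax(sse_drop))    # best-fitting covariate this round
        beta[j] += nu * coefs[j]        # shrunken update of the winner only
        resid -= nu * coefs[j] * X[:, j]
    return beta
```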
Adaptive lasso for sparse high-dimensional regression models
 Statistica Sinica
, 2008
"... Abstract: We study the asymptotic properties of the adaptive Lasso estimators in sparse, highdimensional, linear regression models when the number of covariates may increase with the sample size. We consider variable selection using the adaptive Lasso, where the L1 norms in the penalty are reweig ..."
Abstract

Cited by 98 (11 self)
Abstract: We study the asymptotic properties of the adaptive Lasso estimators in sparse, high-dimensional, linear regression models when the number of covariates may increase with the sample size. We consider variable selection using the adaptive Lasso, where the L1 norms in the penalty are reweighted by data-dependent weights. We show that, if a reasonable initial estimator is available, under appropriate conditions, the adaptive Lasso correctly selects covariates with nonzero coefficients with probability converging to one, and that the estimators of nonzero coefficients have the same asymptotic distribution they would have if the zero coefficients were known in advance. Thus, the adaptive Lasso has an oracle property in the sense of Fan and Li (2001).
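A hedged sketch of the two-step procedure described above: an initial estimator supplies data-dependent weights w_j = 1/|beta_init_j|^gamma, and the weighted L1 problem is solved by rescaling the columns of X before a plain Lasso fit. Ridge is used here as the initial estimator purely for illustration (the paper allows any reasonable initial estimator), and the tuning values are placeholders.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

def adaptive_lasso(X, y, lam=0.1, gamma=1.0, eps=1e-6):
    beta_init = Ridge(alpha=1.0).fit(X, y).coef_   # step 1: initial estimate
    w = 1.0 / (np.abs(beta_init) ** gamma + eps)   # data-dependent weights
    Xw = X / w                                     # column rescaling absorbs the weights
    fit = Lasso(alpha=lam).fit(Xw, y)              # step 2: plain Lasso on rescaled data
    return fit.coef_ / w                           # undo the rescaling
```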
Network-constrained regularization and variable selection for analysis of genomic data
 UPenn Biostatistics Working Paper
, 2007
"... Motivation: Graphs or networks are common ways of depicting information. In biology in particular, many different biological processes are represented by graphs, such as regulatory networks or metabolic pathways. This kind of a priori information gathered over many years of biomedical research is a ..."
Abstract

Cited by 91 (5 self)
Motivation: Graphs or networks are common ways of depicting information. In biology in particular, many different biological processes are represented by graphs, such as regulatory networks or metabolic pathways. This kind of a priori information, gathered over many years of biomedical research, is a useful supplement to the standard numerical genomic data such as microarray gene expression data. How to incorporate information encoded by the known biological networks or graphs into analysis of numerical data raises interesting statistical challenges. In this paper, we introduce a network-constrained regularization procedure for linear regression analysis in order to incorporate the information from these graphs into an analysis of the numerical data, where the network is represented as a graph and its corresponding Laplacian matrix. We define a network-constrained penalty function that penalizes the L1-norm of the coefficients but encourages smoothness of the coefficients on the network. Results: Simulation studies indicated that the method is quite effective in identifying genes and subnetworks that are related to disease and has higher sensitivity than the commonly used procedures that do not use the pathway structure information. Application to one glioblastoma microarray gene expression dataset identified several subnetworks on several of the KEGG transcriptional pathways that are related to survival from glioblastoma, many of which were supported by the published literature. Conclusions: The proposed network-constrained regularization procedure efficiently utilizes the known pathway structures in identifying the relevant genes and the subnetworks that might be related to phenotype in a general regression framework. As more biological networks are identified and documented in databases, the proposed method should find more applications in identifying the subnetworks that are related to diseases and other biological processes.
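A hedged sketch of the penalty described above, lambda1 * ||beta||_1 plus lambda2 * beta' L beta with L a graph Laplacian: because L is positive semi-definite it factors as L = S'S, so the smoothness term can be folded into an augmented least-squares problem and solved with a plain Lasso (an elastic-net-style device). The unnormalized Laplacian and the constants are illustrative simplifications, not the paper's exact setup.

```python
import numpy as np
from sklearn.linear_model import Lasso

def network_lasso(X, y, A, lam1=0.1, lam2=0.1):
    # A: symmetric adjacency matrix of the known gene network
    deg = A.sum(axis=1)
    L = np.diag(deg) - A                           # (unnormalized) graph Laplacian
    evals, U = np.linalg.eigh(L)
    S = np.sqrt(np.clip(evals, 0.0, None))[:, None] * U.T   # L = S'S
    X_aug = np.vstack([X, np.sqrt(lam2) * S])      # smoothness term as pseudo-observations
    y_aug = np.concatenate([y, np.zeros(S.shape[0])])
    return Lasso(alpha=lam1).fit(X_aug, y_aug).coef_
```

The augmentation turns the quadratic network-smoothness term into extra zero-response rows, so any off-the-shelf Lasso solver applies.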
Bolasso: model consistent lasso estimation through the bootstrap
 In Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML)
, 2008
"... We consider the leastsquare linear regression problem with regularization by the ℓ1norm, a problem usually referred to as the Lasso. In this paper, we present a detailed asymptotic analysis of model consistency of the Lasso. For various decays of the regularization parameter, we compute asymptotic ..."
Abstract

Cited by 85 (15 self)
We consider the least-squares linear regression problem with regularization by the ℓ1-norm, a problem usually referred to as the Lasso. In this paper, we present a detailed asymptotic analysis of model consistency of the Lasso. For various decays of the regularization parameter, we compute asymptotic equivalents of the probability of correct model selection (i.e., variable selection). For a specific rate of decay, we show that the Lasso selects all the variables that should enter the model with probability tending to one exponentially fast, while it selects all other variables with strictly positive probability. We show that this property implies that if we run the Lasso for several bootstrapped replications of a given sample, then intersecting the supports of the Lasso bootstrap estimates leads to consistent model selection. This novel variable selection algorithm, referred to as the Bolasso, compares favorably to other linear regression methods on synthetic data and datasets from the UCI machine learning repository.
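A hedged sketch of the Bolasso procedure as described above: fit the Lasso on bootstrap replications of the sample and keep only the variables selected in every replication (the intersection of the supports). The regularization level, replication count, and threshold are placeholders.

```python
import numpy as np
from sklearn.linear_model import Lasso

def bolasso_support(X, y, lam=0.1, n_boot=32, tol=1e-8, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    support = np.ones(X.shape[1], dtype=bool)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)               # bootstrap resample with replacement
        coef = Lasso(alpha=lam).fit(X[idx], y[idx]).coef_
        support &= np.abs(coef) > tol                  # intersect supports across replications
    return np.flatnonzero(support)                     # indices of consistently selected variables
```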
Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators
 Electron. J. Stat
, 2008
"... ..."
Distributed Spectrum Sensing for Cognitive Radio Networks by Exploiting Sparsity
"... Abstract—A cooperative approach to the sensing task of wireless cognitive radio (CR) networks is introduced based on a basis expansion model of the power spectral density (PSD) map in space and frequency. Joint estimation of the model parameters enables identification of the (un)used frequency bands ..."
Abstract

Cited by 80 (7 self)
Abstract—A cooperative approach to the sensing task of wireless cognitive radio (CR) networks is introduced based on a basis expansion model of the power spectral density (PSD) map in space and frequency. Joint estimation of the model parameters enables identification of the (un)used frequency bands at arbitrary locations, and thus facilitates spatial frequency reuse. The novel scheme capitalizes on two forms of sparsity: the first one introduced by the narrowband nature of transmit-PSDs relative to the broad swaths of usable spectrum; and the second one emerging from sparsely located active radios in the operational space. An estimator of the model coefficients is developed based on the Lasso algorithm to exploit these forms of sparsity and reveal the unknown positions of transmitting CRs. The resultant scheme can be implemented via distributed online iterations, which solve quadratic programs locally (one per radio), and are adaptive to changes in the system. Simulations corroborate that exploiting sparsity in CR sensing reduces spatial and frequency spectrum leakage by 15 dB relative to least-squares (LS) alternatives. Index Terms—Cognitive radios, compressive sampling, cooperative systems, distributed estimation, parallel network processing, sensing, sparse models, spectral analysis.
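A heavily simplified, hedged sketch of the model above: the per-band PSD observed at each sensor is a superposition of candidate (location, band) contributions scaled by known path-loss gains, and the sparse nonnegative coefficients (most candidate transmitters are absent) are recovered with a Lasso. All dimensions, the gain model, and the centralized solve are illustrative stand-ins for the paper's distributed algorithm.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_sensors, n_locations, n_bands = 30, 50, 8
gains = rng.uniform(0.1, 1.0, (n_sensors, n_locations))   # stand-in path-loss gains

# One column per (candidate location, band) pair; rows are stacked per-band
# PSD measurements at every sensor.
Phi = np.kron(gains, np.eye(n_bands))
theta_true = np.zeros(n_locations * n_bands)
theta_true[rng.choice(theta_true.size, 3, replace=False)] = 1.0   # 3 active pairs
y = Phi @ theta_true + 0.01 * rng.standard_normal(Phi.shape[0])

theta_hat = Lasso(alpha=0.01, positive=True).fit(Phi, y).coef_    # nonnegative Lasso
print("recovered active (location, band) pairs:", np.flatnonzero(theta_hat > 0.1))
```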
High-dimensional additive modeling
 Annals of Statistics
"... We propose a new sparsitysmoothness penalty for highdimensional generalized additive models. The combination of sparsity and smoothness is crucial for mathematical theory as well as performance for finitesample data. We present a computationally efficient algorithm, with provable numerical conver ..."
Abstract

Cited by 78 (3 self)
We propose a new sparsity-smoothness penalty for high-dimensional generalized additive models. The combination of sparsity and smoothness is crucial for mathematical theory as well as performance on finite-sample data. We present a computationally efficient algorithm, with provable numerical convergence properties, for optimizing the penalized likelihood. Furthermore, we provide oracle results which yield asymptotic optimality of our estimator for high-dimensional but sparse additive models. Finally, an adaptive version of our sparsity-smoothness penalized approach yields large additional performance gains.
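A hedged, simplified sketch of the additive-model setup: expand each covariate in a small polynomial basis and apply a group Lasso (one group per covariate) by proximal gradient. The paper's actual penalty adds a smoothness term to each group norm; only the sparsity part is sketched here, with illustrative tuning values.

```python
import numpy as np

def group_lasso_additive(X, y, lam=0.1, degree=3, n_iter=500):
    n, p = X.shape
    # Basis expansion: columns x_j, x_j^2, ..., x_j^degree for each covariate j.
    B = np.concatenate([X ** d for d in range(1, degree + 1)], axis=1)
    B = (B - B.mean(0)) / B.std(0)
    y = y - y.mean()
    groups = [np.arange(j, p * degree, p) for j in range(p)]   # columns of covariate j
    beta = np.zeros(B.shape[1])
    step = 1.0 / (np.linalg.norm(B, 2) ** 2 / n)               # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = -B.T @ (y - B @ beta) / n                       # gradient of (1/2n)||y - B beta||^2
        z = beta - step * grad
        for g in groups:                                       # groupwise soft-thresholding
            norm = np.linalg.norm(z[g])
            z[g] *= max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
        beta = z
    return beta, groups
```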
An interior-point method for large-scale ℓ1-regularized logistic regression
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2007
"... Recently, a lot of attention has been paid to ℓ1regularization based methods for sparse signal reconstruction (e.g., basis pursuit denoising and compressed sensing) and feature selection (e.g., the Lasso algorithm) in signal processing, statistics, and related fields. These problems can be cast as ..."
Abstract

Cited by 74 (6 self)
Recently, a lot of attention has been paid to ℓ1-regularization based methods for sparse signal reconstruction (e.g., basis pursuit denoising and compressed sensing) and feature selection (e.g., the Lasso algorithm) in signal processing, statistics, and related fields. These problems can be cast as ℓ1-regularized least-squares programs (LSPs), which can be reformulated as convex quadratic programs and then solved by several standard methods such as interior-point methods, at least for small and medium-sized problems. In this paper, we describe a specialized interior-point method for solving large-scale ℓ1-regularized LSPs that uses the preconditioned conjugate gradients algorithm to compute the search direction. The interior-point method can solve large sparse problems, with a million variables and observations, in a few tens of minutes on a PC. It can efficiently solve large dense problems that arise in sparse signal recovery with orthogonal transforms by exploiting fast algorithms for these transforms. The method is illustrated on a magnetic resonance imaging data set.
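For concreteness, here is the ℓ1-regularized LSP the paper solves, stated with the generic modeling tool cvxpy; this is a hedged sketch that only fixes the problem being solved, since the paper's contribution is a specialized interior-point solver with preconditioned conjugate gradients that scales far beyond generic QP solvers. The dimensions and lambda are illustrative.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 500))   # wide design: more features than observations
y = rng.standard_normal(100)
lam = 0.1

x = cp.Variable(500)
objective = cp.Minimize(cp.sum_squares(A @ x - y) + lam * cp.norm1(x))
cp.Problem(objective).solve()         # generic convex solver; the paper replaces this step
print("nonzeros:", int((np.abs(x.value) > 1e-6).sum()))
```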