Non-concave penalized likelihood with a diverging number of parameters (2004)

by J Fan, H Peng
Venue: Annals of Statistics

Results 1 - 10 of 169, sorted by number of citations

The adaptive LASSO and its oracle properties

by Hui Zou - Journal of the American Statistical Association, 2006
"... The lasso is a popular technique for simultaneous estimation and variable selection. Lasso variable selection has been shown to be consistent under certain conditions. In this work we derive a necessary condition for the lasso variable selection to be consistent. Consequently, there exist certain sc ..."
Abstract - Cited by 683 (10 self) - Add to MetaCart
The lasso is a popular technique for simultaneous estimation and variable selection. Lasso variable selection has been shown to be consistent under certain conditions. In this work we derive a necessary condition for the lasso variable selection to be consistent. Consequently, there exist certain scenarios where the lasso is inconsistent for variable selection. We then propose a new version of the lasso, called the adaptive lasso, where adaptive weights are used for penalizing different coefficients in the ℓ1 penalty. We show that the adaptive lasso enjoys the oracle properties; namely, it performs as well as if the true underlying model were given in advance. Similar to the lasso, the adaptive lasso is shown to be near-minimax optimal. Furthermore, the adaptive lasso can be solved by the same efficient algorithm for solving the lasso. We also discuss the extension of the adaptive lasso in generalized linear models and show that the oracle properties still hold under mild regularity conditions. As a byproduct of our theory, the nonnegative garotte is shown to be consistent for variable selection.

Citation Context

... subset model, {j : β̂_j ≠ 0} = A. • Has the optimal estimation rate, √n(β̂(δ)_A − β*_A) →_d N(0, Σ*), where Σ* is the covariance matrix knowing the true subset model. It has been argued (Fan and Li 2001; Fan and Peng 2004) that a good procedure should have these oracle properties. However, some extra conditions besides the oracle properties, such as continuous shrinkage, are also required in an optimal procedure. Ordi...
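As a rough illustration of the adaptive weighting described in the abstract above, the sketch below builds weights from an ordinary least-squares initial fit and solves the weighted ℓ1 problem by rescaling columns. The OLS initial estimator, the gamma exponent, and the use of scikit-learn's Lasso are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

def adaptive_lasso(X, y, lam=0.1, gamma=1.0):
    """Adaptive lasso sketch: weighted l1 penalty via column rescaling.

    Weights w_j = 1 / |beta_init_j|^gamma come from an OLS initial fit
    (an assumption; any consistent initial estimator would serve).
    """
    beta_init, *_ = np.linalg.lstsq(X, y, rcond=None)   # initial estimator
    w = 1.0 / (np.abs(beta_init) ** gamma + 1e-8)        # adaptive weights
    X_scaled = X / w                                      # column j divided by w_j
    fit = Lasso(alpha=lam, fit_intercept=False).fit(X_scaled, y)
    return fit.coef_ / w                                  # undo the rescaling
```

Because the weighted problem is solved by rescaling, the same lasso solver (and its path algorithm) can be reused, which is the computational point made in the abstract.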

Sure independence screening for ultra-high dimensional feature space

by Jianqing Fan, Jinchi Lv, 2006
"... Variable selection plays an important role in high dimensional statistical modeling which nowa-days appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, estimation accuracy and computational cost are two top concerns. In a recent paper, ..."
Abstract - Cited by 283 (26 self) - Add to MetaCart
Variable selection plays an important role in high dimensional statistical modeling which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, estimation accuracy and computational cost are two top concerns. In a recent paper, Candes and Tao (2007) propose the Dantzig selector using L1 regularization and show that it achieves the ideal risk up to a logarithmic factor log p. Their innovative procedure and remarkable result are challenged when the dimensionality is ultra high as the factor log p can be large and their uniform uncertainty principle can fail. Motivated by these concerns, we introduce the concept of sure screening and propose a sure screening method based on a correlation learning, called the Sure Independence Screening (SIS), to reduce dimensionality from high to a moderate scale that is below sample size. In a fairly general asymptotic framework, the SIS is shown to have the sure screening property for even exponentially growing dimensionality. As a methodological extension, an iterative SIS (ISIS) is also proposed to enhance its finite sample performance. With dimension reduced accurately from high to below sample size, variable selection can be improved on both speed and accuracy, and can then be accomplished ...
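The correlation-learning step behind SIS, as described above, can be sketched as ranking predictors by the magnitude of their marginal correlation with the response and retaining the top d, with d below the sample size. The default d = n / log n below is a common illustrative choice, not necessarily the one used in the paper.

```python
import numpy as np

def sis_screen(X, y, d=None):
    """Sure Independence Screening sketch: keep the d features whose
    marginal correlation with y is largest in absolute value."""
    n, p = X.shape
    if d is None:
        d = max(1, int(n / np.log(n)))     # illustrative default below sample size
    Xc = (X - X.mean(0)) / X.std(0)        # standardize columns
    yc = (y - y.mean()) / y.std()
    omega = np.abs(Xc.T @ yc) / n          # componentwise (marginal) correlations
    return np.argsort(omega)[::-1][:d]     # indices of the d top-ranked features
```

A full variable-selection method (SCAD, lasso, adaptive lasso, etc.) would then be applied only to the retained columns.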

Large Sample Sieve Estimation of Semi-Nonparametric Models

by Xiaohong Chen - Handbook of Econometrics, 2007
"... Often researchers find parametric models restrictive and sensitive to deviations from the parametric specifications; semi-nonparametric models are more flexible and robust, but lead to other complications such as introducing infinite dimensional parameter spaces that may not be compact. The method o ..."
Abstract - Cited by 185 (19 self) - Add to MetaCart
Often researchers find parametric models restrictive and sensitive to deviations from the parametric specifications; semi-nonparametric models are more flexible and robust, but lead to other complications such as introducing infinite dimensional parameter spaces that may not be compact. The method of sieves provides one way to tackle such complexities by optimizing an empirical criterion function over a sequence of approximating parameter spaces, called sieves, which are significantly less complex than the original parameter space. With different choices of criteria and sieves, the method of sieves is very flexible in estimating complicated econometric models. For example, it can simultaneously estimate the parametric and nonparametric components in semi-nonparametric models with or without constraints. It can easily incorporate prior information, often derived from economic theory, such as monotonicity, convexity, additivity, multiplicity, exclusion and non-negativity. This chapter describes estimation of semi-nonparametric econometric models via the method of sieves. We present some general results on the large sample properties of the sieve estimates, including consistency of the sieve extremum estimates, convergence rates of the sieve M-estimates, pointwise normality of series estimates of regression functions, root-n asymptotic normality and efficiency of sieve estimates of smooth functionals of infinite dimensional parameters. Examples are used to illustrate the general results.
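As a toy illustration of the sieve idea described above, an unknown regression function can be estimated by least squares over a finite-dimensional approximating space whose dimension grows slowly with the sample size. The polynomial basis and the n^0.2 growth rule below are illustrative assumptions, not choices made in the chapter.

```python
import numpy as np

def sieve_ls(x, y, degree=None):
    """Series least-squares sketch of a sieve estimator: regress y on a
    polynomial basis of x whose degree grows with the sample size."""
    n = len(x)
    if degree is None:
        degree = max(1, int(n ** 0.2))               # slowly growing sieve dimension
    B = np.vander(x, N=degree + 1, increasing=True)  # sieve basis: 1, x, ..., x^degree
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)     # optimize the empirical criterion over the sieve
    return lambda x_new: np.vander(x_new, N=degree + 1, increasing=True) @ coef
```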

Nearly unbiased variable selection under minimax concave penalty

by Cun-hui Zhang - Annals of Statistics, 2010
"... ar ..."
Abstract - Cited by 184 (14 self) - Add to MetaCart
Abstract not found

Piecewise linear regularized solution paths

by Saharon Rosset, Ji Zhu - Annals of Statistics, 2007
"... Abstract We consider the generic regularized optimization problemβ(λ) = arg min β L(y, Xβ) + λJ(β). Recently, ..."
Abstract - Cited by 140 (9 self) - Add to MetaCart
Abstract: We consider the generic regularized optimization problem β̂(λ) = arg min_β L(y, Xβ) + λJ(β). Recently, ...

Citation Context

...ch efficient algorithms can be designed, leaves out some other statistically well motivated fitting approaches. The use of a nonconvex penalty was advocated by Fan and collaborators in several papers [4, 5]. They expose the favorable variable selection property of the penalty function they offer, which can be viewed as an improvement over the use of ℓ1 penalty. [16] advocates the use of nonconvex ψ-loss...
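The piecewise-linear structure considered above is what path algorithms such as LARS exploit in the special case L = squared error and J = ℓ1 norm. The snippet below only visualizes that special case via scikit-learn's path routine; it is not the paper's general characterization.

```python
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))
beta = np.array([3.0, -2.0, 1.5, 0, 0, 0, 0, 0])
y = X @ beta + rng.standard_normal(100)

# coefs has shape (n_features, n_alphas); each coordinate is piecewise linear in alpha
alphas, coefs, _ = lasso_path(X, y)
print(alphas.shape, coefs.shape)
```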

One-step sparse estimates in nonconcave penalized likelihood models

by Hui Zou, Runze Li - Annals of Statistics, 2008
"... Fan and Li propose a family of variable selection methods via penalized likelihood using concave penalty functions. The nonconcave penalized likelihood estimators enjoy the oracle properties, but maximizing the penalized likelihood function is computationally challenging, because the objective funct ..."
Abstract - Cited by 133 (6 self) - Add to MetaCart
Fan and Li propose a family of variable selection methods via penalized likelihood using concave penalty functions. The nonconcave penalized likelihood estimators enjoy the oracle properties, but maximizing the penalized likelihood function is computationally challenging, because the objective function is nondifferentiable and nonconcave. In this article, we propose a new unified algorithm based on the local linear approximation (LLA) for maximizing the penalized likelihood for a broad class of concave penalty functions. Convergence and other theoretical properties of the LLA algorithm are established. A distinguished feature of the LLA algorithm is that at each LLA step, the LLA estimator can naturally adopt a sparse representation. Thus, we suggest using the one-step LLA estimator from the LLA algorithm as the final estimates. Statistically, we show that if the regularization parameter is appropriately chosen, the one-step LLA estimates enjoy the oracle properties with good initial estimators. Computationally, the one-step LLA estimation methods dramatically reduce the computational cost in maximizing the nonconcave penalized likelihood. We conduct some Monte Carlo simulation to assess the finite sample performance of the one-step sparse estimation methods. The results are very encouraging.

Citation Context

...ns has been proposed to select significant variables for various parametric models, including generalized linear regression models and robust linear regression model (Fan and Li [10] and Fan and Peng [15]), and some semiparametric models, such as the Cox model and partially linear models (Fan and Li [11, 12] and Cai, Fan, Li and Zhou [5]). Fan and Li [10] provide deep insights into how to select a pen...
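A condensed sketch of the one-step recipe described in the abstract above: linearize the SCAD penalty at an initial estimate and solve a single weighted-lasso problem. The OLS initial estimator, the a = 3.7 SCAD constant, and the column-rescaling trick for the weighted lasso are illustrative choices, not necessarily the paper's implementation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def scad_deriv(t, lam, a=3.7):
    """Derivative of the SCAD penalty, used as the local linear approximation weight."""
    t = np.abs(t)
    return np.where(t <= lam, lam, np.maximum(a * lam - t, 0.0) / (a - 1.0))

def one_step_lla(X, y, lam):
    """One-step LLA sketch: weighted lasso with weights p'_lambda(|beta_init|)."""
    beta_init, *_ = np.linalg.lstsq(X, y, rcond=None)   # initial (here OLS) estimator
    w = scad_deriv(beta_init, lam) / lam                 # relative penalty weights
    w = np.maximum(w, 1e-8)                              # floor so unpenalized coefficients stay finite
    X_scaled = X / w                                      # weighted l1 via column rescaling
    fit = Lasso(alpha=lam, fit_intercept=False).fit(X_scaled, y)
    return fit.coef_ / w
```

Because only one convex weighted-lasso problem is solved, the cost is essentially that of the lasso itself, which is the computational gain the abstract highlights.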

Sparsistency and rates of convergence in large covariance matrices estimation

by Clifford Lam, Jianqing Fan, 2009
"... This paper studies the sparsistency and rates of convergence for estimating sparse covariance and precision matrices based on penalized likelihood with nonconvex penalty functions. Here, sparsistency refers to the property that all parameters that are zero are actually estimated as zero with probabi ..."
Abstract - Cited by 110 (12 self) - Add to MetaCart
This paper studies the sparsistency and rates of convergence for estimating sparse covariance and precision matrices based on penalized likelihood with nonconvex penalty functions. Here, sparsistency refers to the property that all parameters that are zero are actually estimated as zero with probability tending to one. Depending on the case of applications, sparsity a priori may occur on the covariance matrix, its inverse or its Cholesky decomposition. We study these three sparsity exploration problems under a unified framework with a general penalty function. We show that the rates of convergence for these problems under the Frobenius norm are of order (s_n log p_n / n)^{1/2}, where s_n is the number of nonzero elements, p_n is the size of the covariance matrix and n is the sample size. This explicitly spells out that the contribution of high-dimensionality is merely a logarithmic factor. The conditions on the rate with which the tuning parameter λ_n goes to 0 have been made explicit and compared under different penalties. As a result, for the L1-penalty, to guarantee the sparsistency and optimal rate of convergence, the number of nonzero elements should be small: s′_n = O(p_n) at most, among O(p_n^2) parameters, for estimating a sparse covariance or correlation matrix, sparse precision or inverse correlation matrix, or sparse Cholesky factor, where s′_n is the number of nonzero elements on the off-diagonal entries. On the other hand, using the SCAD or hard-thresholding penalty functions, there is no such restriction.

Citation Context

.... [2] developed consistency theories on banding methods for longitudinal data, both for Σ and Ω. Penalized likelihood methods are used by various authors to achieve parsimony on covariance selection. [10] has laid down a general framework for penalized likelihood with diverging dimensionality, with general conditions for oracle property stated and proved. However, it is not clear whether it is applica...
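For the precision-matrix case discussed above, the L1-penalized Gaussian likelihood (the convex special case of the penalties studied in the paper) is available off the shelf. This sketch uses scikit-learn's graphical lasso purely as a stand-in, since the SCAD and hard-thresholding penalties analyzed in the paper are not implemented there.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))      # n = 200 observations, p_n = 10 variables

# Penalized Gaussian likelihood with an L1 penalty on the off-diagonal entries of Omega
model = GraphicalLasso(alpha=0.1).fit(X)
Omega_hat = model.precision_            # sparse estimate of the inverse covariance
print((np.abs(Omega_hat) > 1e-8).sum(), "nonzero entries")
```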

On the “degrees of freedom” of the lasso

by Hui Zou, Trevor Hastie, Robert Tibshirani, 2007
"... ..."
Abstract - Cited by 100 (3 self) - Add to MetaCart
Abstract not found

Asymptotic properties of bridge estimators in sparse high-dimensional regression models

by Jian Huang, Joel L. Horowitz, Shuangge Ma - Annals of Statistics, 2007
"... We study the asymptotic properties of bridge estimators in sparse, high-dimensional, linear regression models when the number of covariates may increase to infinity with the sample size. We are particularly interested in the use of bridge estimators to distinguish between covariates whose coefficien ..."
Abstract - Cited by 99 (10 self) - Add to MetaCart
We study the asymptotic properties of bridge estimators in sparse, high-dimensional, linear regression models when the number of covariates may increase to infinity with the sample size. We are particularly interested in the use of bridge estimators to distinguish between covariates whose coefficients are zero and covariates whose coefficients are nonzero. We show that under appropriate conditions, bridge estimators correctly select covariates with nonzero coefficients with probability converging to one and that the estimators of nonzero coefficients have the same asymptotic distribution that they would have if the zero coefficients were known in advance. Thus, bridge estimators have an oracle property in the sense of Fan and Li [J. Amer. Statist. Assoc. 96 (2001) 1348–1360] and Fan and Peng [Ann. Statist. 32 (2004) 928–961]. In general, the oracle property holds only if the number of covariates is smaller than the sample size. However, under a partial orthogonality condition in which the covariates of the zero coefficients are uncorrelated or weakly correlated with the covariates of nonzero coefficients, we show that marginal bridge estimators can correctly distinguish between covariates with nonzero and zero coefficients with probability converging to one even when the number of covariates is greater than the sample size.

Citation Context

...β′_20)′. Let K_n = {1, . . . , k_n} and J_n = {k_n + 1, . . . , p_n} be the set of indexes of nonzero and zero coefficients, respectively. Let ξ_nj = n^{-1} E(∑_{i=1}^{n} Y_i x_ij) = n^{-1} ∑_{i=1}^{n} (w_i′ β_10) x_ij, (5) which is the “covariance” between the jth covariate and the response variable. With the centering and standardization given in (2), ξ_nj/σ is the correlation coefficient. (B1) (a) ε_1, ε_2, . . . are in...
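The bridge penalty λ ∑_j |β_j|^γ with 0 < γ < 1 discussed above is nonconvex. One common way to approximate its minimizer, sketched below under the assumption that an iteratively reweighted lasso is acceptable (this is an illustrative device, not the paper's computational procedure), is to linearize |β|^γ at the current iterate and solve a weighted lasso at each step.

```python
import numpy as np
from sklearn.linear_model import Lasso

def bridge_irl(X, y, lam=0.1, gamma=0.5, n_iter=10, eps=1e-6):
    """Iteratively reweighted-lasso sketch for the bridge penalty lam * sum |b_j|^gamma."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]             # start from OLS
    for _ in range(n_iter):
        w = gamma * (np.abs(beta) + eps) ** (gamma - 1.0)    # local linearization weights
        X_scaled = X / w                                      # weighted l1 via column rescaling
        beta = Lasso(alpha=lam, fit_intercept=False).fit(X_scaled, y).coef_ / w
    return beta
```

Small coefficients receive very large weights and are driven exactly to zero, mirroring the variable-selection behavior the abstract describes.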

Adaptive lasso for sparse high-dimensional regression models

by Jian Huang, Shuangge Ma, Cun-Hui Zhang - Statistica Sinica, 2008
"... Abstract: We study the asymptotic properties of the adaptive Lasso estimators in sparse, high-dimensional, linear regression models when the number of covariates may increase with the sample size. We consider variable selection using the adaptive Lasso, where the L1 norms in the penalty are re-weig ..."
Abstract - Cited by 98 (11 self) - Add to MetaCart
Abstract: We study the asymptotic properties of the adaptive Lasso estimators in sparse, high-dimensional, linear regression models when the number of covariates may increase with the sample size. We consider variable selection using the adaptive Lasso, where the L1 norms in the penalty are re-weighted by data-dependent weights. We show that, if a reasonable initial estimator is available, under appropriate conditions, the adaptive Lasso correctly selects covariates with nonzero coefficients with probability converging to one, and that the estimators of nonzero coefficients have the same asymptotic distribution they would have if the zero coefficients were known in advance. Thus, the adaptive Lasso has an oracle property in the sense of Fan and Li

Citation Context

... for 1 < d ≤ 2, (b) for d = 1, (log kn) 1/d √ n bn1 √ n(log mn) 1/d λnrn (log n)(log kn) √ n bn1 → 0, and λnkn nb 2 n1 → 0, and k 2 n rnbn1 → 0, and λnkn nb 2 n1 √ n(log n)(log mn) → 0, and λnrn → 0, =-=(8)-=- k 2 n rnbn1 → 0. (9) → 0, (10) → 0. (11) (A4) There exist constants 0 < τ1 < τ2 < ∞ such that τ1 ≤ τ1n ≤ τ2n ≤ τ2 for all n; Condition (A1a) is standard in linear regression. Condition (A1b) allows a...
