Results 1 - 10 of 98
Boosting algorithms: Regularization, prediction and model fitting
- Statistical Science, 2007
"... Abstract. We present a statistical perspective on boosting. Special emphasis is given to estimating potentially complex parametric or nonparametric models, including generalized linear and additive models as well as regression models for survival analysis. Concepts of degrees of freedom and correspo ..."
Cited by 99 (12 self)
Abstract. We present a statistical perspective on boosting. Special emphasis is given to estimating potentially complex parametric or nonparametric models, including generalized linear and additive models as well as regression models for survival analysis. Concepts of degrees of freedom and corresponding Akaike or Bayesian information criteria, particularly useful for regularization and variable selection in high-dimensional covariate spaces, are discussed as well. The practical aspects of boosting procedures for fitting statistical models are illustrated by means of the dedicated open-source software package mboost. This package implements functions which can be used for model fitting, prediction and variable selection. It is flexible, allowing for the implementation of new boosting algorithms optimizing user-specified loss functions. Key words and phrases: Generalized linear models, generalized additive models, gradient boosting, survival analysis, variable selection, software.
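The mboost package itself is written in R; as a rough illustration of the core idea (componentwise gradient boosting of a linear model with a small step size), here is a minimal Python sketch. The function name, defaults and data are ours, not part of the package's API.

```python
import numpy as np

def componentwise_l2_boost(X, y, n_steps=100, nu=0.1):
    """Minimal sketch of componentwise L2 boosting for a linear model.

    At each step, fit every (centered) covariate to the current residuals
    by simple least squares, keep only the best-fitting one, and move its
    coefficient by a small step nu.  This mimics the spirit of what
    glmboost() in the R package mboost does, but is only an illustration.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)              # work with centered covariates
    beta = np.zeros(p)
    intercept = y.mean()
    resid = y - intercept
    for _ in range(n_steps):
        # least-squares coefficient of each single covariate on the residuals
        coefs = Xc.T @ resid / (Xc ** 2).sum(axis=0)
        sse = ((resid[:, None] - Xc * coefs) ** 2).sum(axis=0)
        j = np.argmin(sse)               # best-fitting component this round
        beta[j] += nu * coefs[j]         # weak update of that coefficient only
        resid = y - intercept - Xc @ beta
    return intercept, beta

# toy usage on simulated data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = X[:, 0] - 2 * X[:, 3] + rng.normal(size=200)
b0, b = componentwise_l2_boost(X, y)
```

Because only one coefficient moves per step, early stopping of the loop acts as implicit regularization and variable selection, which is the connection the abstract draws to degrees of freedom and information criteria.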
Near-ideal model selection by ℓ1 minimization
2008
"... We consider the fundamental problem of estimating the mean of a vector y = Xβ + z, where X is an n × p design matrix in which one can have far more variables than observations and z is a stochastic error term—the so-called ‘p> n ’ setup. When β is sparse, or more generally, when there is a sparse ..."
Cited by 84 (4 self)
We consider the fundamental problem of estimating the mean of a vector y = Xβ + z, where X is an n × p design matrix in which one can have far more variables than observations and z is a stochastic error term—the so-called ‘p > n’ setup. When β is sparse, or more generally, when there is a sparse subset of covariates providing a close approximation to the unknown mean vector, we ask whether or not it is possible to accurately estimate Xβ using a computationally tractable algorithm. We show that in a surprisingly wide range of situations, the lasso happens to nearly select the best subset of variables. Quantitatively speaking, we prove that solving a simple quadratic program achieves a squared error within a logarithmic factor of the ideal mean squared error one would achieve with an oracle supplying perfect information about which variables should be included in the model and which variables should not. Interestingly, our results describe the average performance of the lasso; that is, the performance one can expect in a vast majority of cases where Xβ is a sparse or nearly sparse superposition of variables, but not in all cases. Our results are nonasymptotic and widely applicable since they simply require that pairs of predictor variables are not too collinear.
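A small simulation along these lines is easy to set up. The sketch below is our own toy experiment, not the paper's code: it compares the lasso's squared prediction error with that of an oracle told the true support, and the regularization level is only a rough stand-in for the theoretical choice of order σ·sqrt(2 log p), rescaled for scikit-learn's 1/(2n) loss convention.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(1)
n, p, s, sigma = 100, 400, 5, 1.0            # p >> n, sparse beta
X = rng.normal(size=(n, p)) / np.sqrt(n)     # roughly unit-norm columns
beta = np.zeros(p)
support = rng.choice(p, size=s, replace=False)
beta[support] = 10 * rng.choice([-1.0, 1.0], size=s)
y = X @ beta + sigma * rng.normal(size=n)

# lasso at a lambda on the scale sigma * sqrt(2 log p) (sketchy choice)
lam = sigma * np.sqrt(2 * np.log(p)) / n
lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=10000).fit(X, y)
lasso_err = np.sum((X @ (lasso.coef_ - beta)) ** 2)

# oracle: least squares restricted to the true support
oracle = LinearRegression(fit_intercept=False).fit(X[:, support], y)
oracle_err = np.sum((X[:, support] @ oracle.coef_ - X @ beta) ** 2)

print(f"lasso squared error {lasso_err:.2f} vs oracle {oracle_err:.2f}")
```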
SparseNet: Coordinate Descent with Non-Convex Penalties
2009
"... We address the problem of sparse selection in linear models. A number of non-convex penalties have been proposed for this purpose, along with a variety of convex-relaxation algorithms for finding good solutions. In this paper we pursue the coordinate-descent approach for optimization, and study its ..."
Cited by 71 (0 self)
We address the problem of sparse selection in linear models. A number of non-convex penalties have been proposed for this purpose, along with a variety of convex-relaxation algorithms for finding good solutions. In this paper we pursue the coordinate-descent approach for optimization, and study its convergence properties. We characterize the properties of penalties suitable for this approach, study their corresponding threshold functions, and describe a df-standardizing reparametrization that assists our pathwise algorithm. The MC+ penalty (Zhang 2010) is ideally suited to this task, and we use it to demonstrate the performance of our algorithm.
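For concreteness, here is a minimal Python sketch of the MC+ threshold function and a cyclic coordinate-descent loop built on it. It assumes predictors standardized to unit squared norm and is only an illustration of the style of update the abstract describes, not the authors' SparseNet implementation.

```python
import numpy as np

def mcplus_threshold(z, lam, gamma):
    """Univariate threshold function for the MC+ penalty (Zhang 2010),
    assuming a predictor with unit squared norm.  As gamma -> infinity this
    reduces to soft thresholding (lasso); as gamma -> 1+ it approaches hard
    thresholding (best subset)."""
    az = abs(z)
    if az <= lam:
        return 0.0
    if az <= gamma * lam:
        return np.sign(z) * (az - lam) / (1.0 - 1.0 / gamma)
    return z

def mcplus_coordinate_descent(X, y, lam, gamma=3.0, n_sweeps=200):
    """Sketch of cyclic coordinate descent for MC+-penalized least squares.
    Columns of X are assumed standardized to unit squared norm."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.copy()
    for _ in range(n_sweeps):
        for j in range(p):
            zj = X[:, j] @ resid + beta[j]      # partial-residual correlation
            bj = mcplus_threshold(zj, lam, gamma)
            if bj != beta[j]:
                resid -= X[:, j] * (bj - beta[j])
                beta[j] = bj
    return beta
```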
VARIABLE SELECTION IN NONPARAMETRIC ADDITIVE MODELS
2008
"... Summary. We consider a nonparametric additive model of a conditional mean function in which the number of variables and additive components may be larger than the sample size but the number of non-zero additive components is “small” relative to the sample size. The statistical problem is to determin ..."
Cited by 65 (1 self)
Summary. We consider a nonparametric additive model of a conditional mean function in which the number of variables and additive components may be larger than the sample size but the number of non-zero additive components is “small” relative to the sample size. The statistical problem is to determine which additive components are non-zero. The additive components are approximated by truncated series expansions with B-spline bases. With this approximation, the problem of component selection becomes that of selecting the groups of coefficients in the expansion. We apply the adaptive group Lasso to select non-zero components, using the group Lasso to obtain an initial estimator and reduce the dimension of the problem. We give conditions under which the group Lasso selects a model whose number of components is comparable with that of the underlying model, and the adaptive group Lasso selects the non-zero components correctly with probability approaching one as the sample size increases and achieves the optimal rate of convergence. Following model selection, oracle-efficient, asymptotically normal estimators of the non-zero components can be obtained by using existing methods. The results of Monte Carlo experiments show that the adaptive group Lasso procedure works well with samples of moderate size. A data example is used to illustrate the application of the proposed method. Key words and phrases. Adaptive group Lasso; component selection; high-dimensional data; nonparametric regression; selection consistency. Short title. Nonparametric component selection. AMS 2000 subject classification. Primary 62G08, 62G20; secondary 62G99.
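In symbols, the two-stage procedure the abstract describes can be sketched as follows (the notation is ours): each component f_j is expanded in a B-spline basis with coefficient block β_j and design block Φ_j, an initial group Lasso fit supplies adaptive weights, and a second group Lasso reweights the group penalties.

```latex
% Sketch of the two-stage criterion (notation ours).
\begin{align*}
% Stage 1: group Lasso over the B-spline coefficient blocks \beta_j
\hat\beta &= \arg\min_{\beta}\;
  \Big\| y - \sum_{j=1}^{p} \Phi_j \beta_j \Big\|_2^2
  + \lambda_1 \sum_{j=1}^{p} \|\beta_j\|_2, \\
% Stage 2: adaptive group Lasso with weights w_j = \|\hat\beta_j\|_2^{-1}
% (a block with \hat\beta_j = 0 receives infinite weight and is excluded)
\tilde\beta &= \arg\min_{\beta}\;
  \Big\| y - \sum_{j=1}^{p} \Phi_j \beta_j \Big\|_2^2
  + \lambda_2 \sum_{j=1}^{p} w_j \,\|\beta_j\|_2 .
\end{align*}
```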
P-values for high-dimensional regression
2009
"... Assigning significance in high-dimensional regression is challenging. Most computationally efficient selection algorithms cannot guard against inclusion of noise variables. Asymptotically valid p-values are not available. An exception is a recent proposal by Wasserman and Roeder (2008) which splits ..."
Cited by 38 (2 self)
Assigning significance in high-dimensional regression is challenging. Most computationally efficient selection algorithms cannot guard against inclusion of noise variables. Asymptotically valid p-values are not available. An exception is a recent proposal by Wasserman and Roeder (2008) which splits the data into two parts. The number of variables is then reduced to a manageable size using the first split, while classical variable selection techniques can be applied to the remaining variables, using the data from the second split. This yields asymptotic error control under minimal conditions. It involves, however, a one-time random split of the data. Results are sensitive to this arbitrary choice: it amounts to a “p-value lottery” and makes it difficult to reproduce results. Here, we show that inference across multiple random splits can be aggregated, while keeping asymptotic control over the inclusion of noise variables. In addition, the proposed aggregation is shown to improve power, while reducing the number of falsely selected variables substantially. Keywords: high-dimensional variable selection, data splitting, multiple comparisons.
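A bare-bones Python sketch of this scheme might look as follows: screen with the lasso on one half of the data, compute classical OLS p-values on the other half, Bonferroni-adjust by the selected-set size, and aggregate over splits with twice the median adjusted p-value. The aggregation here is a fixed-quantile simplification of the rule the paper proposes, and all function names and defaults are ours.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LassoCV

def multi_split_pvalues(X, y, n_splits=50, seed=0):
    """Rough sketch of multi-split p-values (not the authors' code)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    pvals = np.ones((n_splits, p))
    for b in range(n_splits):
        idx = rng.permutation(n)
        i1, i2 = idx[: n // 2], idx[n // 2:]
        # screening step: variables kept by the lasso on the first half
        sel = np.flatnonzero(LassoCV(cv=5).fit(X[i1], y[i1]).coef_)
        if sel.size == 0 or sel.size > len(i2) - 2:
            continue                     # p-values stay at 1 for this split
        # classical OLS inference on the second half (centered, no intercept)
        Xc = X[i2][:, sel]
        Xc = Xc - Xc.mean(axis=0)
        yc = y[i2] - y[i2].mean()
        coef, _, _, _ = np.linalg.lstsq(Xc, yc, rcond=None)
        df = len(i2) - sel.size - 1
        resid = yc - Xc @ coef
        sigma2 = resid @ resid / df
        cov = sigma2 * np.linalg.inv(Xc.T @ Xc)
        tstat = coef / np.sqrt(np.diag(cov))
        raw = 2 * stats.t.sf(np.abs(tstat), df)
        pvals[b, sel] = np.minimum(1.0, raw * sel.size)   # Bonferroni by |S|
    # aggregate across splits: twice the median adjusted p-value
    return np.minimum(1.0, 2 * np.median(pvals, axis=0))
```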
Shrinkage tuning parameter selection with a diverging number of parameters
2008
"... Contemporary statistical research frequently deals with problems involving a diverging number of parameters. For those problems, various shrinkage methods (e.g., LASSO, SCAD, etc) are found particularly useful for the purpose of variable selection (Fan and Peng, 2004; Huang, Ma and Zhang, 2007). Nev ..."
Cited by 30 (4 self)
Contemporary statistical research frequently deals with problems involving a diverging number of parameters. For those problems, various shrinkage methods (e.g., LASSO and SCAD) have been found particularly useful for the purpose of variable selection (Fan and Peng, 2004; Huang, Ma and Zhang, 2007). Nevertheless, the desirable performance of those shrinkage methods hinges heavily on an appropriate selection of the tuning parameters. With a fixed predictor dimension, Wang, Li, and Tsai (2007b) and Wang and Leng (2007) demonstrated that the tuning parameters selected by a BIC-type criterion can identify the true model consistently. In this work, similar results are further extended to the situation with a diverging number of parameters for both unpenalized and penalized estimators. Consequently, our theoretical results further enlarge not only the applicable scope of the traditional ...
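As a concrete illustration of a BIC-type rule of this kind, the Python sketch below scores each tuning parameter on a lasso path with log(SSE/n) + df·log(n)/n and returns the minimizer. It follows the fixed-dimension criterion of Wang, Li and Tsai (2007) rather than this paper's exact diverging-dimension version, and the function itself is ours.

```python
import numpy as np
from sklearn.linear_model import lasso_path

def bic_select_lambda(X, y, n_lambdas=100):
    """Sketch of BIC-type tuning-parameter selection for the lasso.

    Assumes X and y are centered (no intercept).  df is taken as the
    number of nonzero coefficients at each lambda on the path.
    """
    n, p = X.shape
    alphas, coefs, _ = lasso_path(X, y, n_alphas=n_lambdas)
    bic = np.empty(len(alphas))
    for k in range(len(alphas)):
        fitted = X @ coefs[:, k]
        sse = np.sum((y - fitted) ** 2)
        df = np.count_nonzero(coefs[:, k])
        bic[k] = np.log(sse / n) + df * np.log(n) / n
    best = np.argmin(bic)
    return alphas[best], coefs[:, best]
```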
Confidence intervals for low-dimensional parameters with high-dimensional data
- ArXiv.org
"... Abstract. The purpose of this paper is to propose methodologies for statistical inference of low-dimensional parameters with high-dimensional data. We focus on constructing con-fidence intervals for individual coefficients and linear combinations of several of them in a linear regression model, alth ..."
Cited by 28 (1 self)
Abstract. The purpose of this paper is to propose methodologies for statistical inference of low-dimensional parameters with high-dimensional data. We focus on constructing confidence intervals for individual coefficients and linear combinations of several of them in a linear regression model, although our ideas are applicable in a much broader context. The theoretical results presented here provide sufficient conditions for the asymptotic normality of the proposed estimators along with a consistent estimator for their finite-dimensional covariance matrices. These sufficient conditions allow the number of variables to far exceed the sample size. The simulation results presented here demonstrate the accuracy of the coverage probability of the proposed confidence intervals, strongly supporting the theoretical results.
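One common form of such a bias-corrected ("de-biased") construction, written here in our own notation as a sketch rather than as the paper's exact estimator, corrects an initial lasso fit coordinate by coordinate using a score vector z_j obtained from a regularized regression of x_j on the remaining columns of X:

```latex
% Bias-corrected coordinate estimate, with \hat\beta^{\mathrm{init}} an
% initial lasso fit and z_j a score vector for the j-th column of X:
\[
  \hat\beta_j = \hat\beta_j^{\mathrm{init}}
    + \frac{z_j^{\top}\bigl(y - X\hat\beta^{\mathrm{init}}\bigr)}{z_j^{\top} x_j} .
\]
% Resulting (1-\alpha) confidence interval, with \hat\sigma a consistent
% estimate of the noise level:
\[
  \hat\beta_j \;\pm\; z_{1-\alpha/2}\,
    \frac{\hat\sigma\,\|z_j\|_2}{\lvert z_j^{\top} x_j \rvert} .
\]
```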
Regularized estimation in the accelerated failure time model with high dimensional covariates.
- Biometrics, 2006
"... Summary. In the presence of high-dimensional predictors, it is challenging to develop reliable regression models that can be used to accurately predict future outcomes. Further complications arise when the outcome of interest is an event time, which is often not fully observed due to censoring. In ..."
Cited by 27 (6 self)
Summary. In the presence of high-dimensional predictors, it is challenging to develop reliable regression models that can be used to accurately predict future outcomes. Further complications arise when the outcome of interest is an event time, which is often not fully observed due to censoring. In this article, we develop robust prediction models for event time outcomes by regularizing the Gehan estimator for the accelerated failure time (AFT) model.
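A regularized Gehan-type objective of the kind referred to here can be written down in a few lines. The Python sketch below uses our own notation and an L1 penalty chosen for illustration, not necessarily the authors' exact formulation; because the objective is convex and piecewise linear, it can be minimized by linear programming or subgradient methods.

```python
import numpy as np

def gehan_lasso_objective(beta, X, log_time, delta, lam):
    """Evaluate an L1-regularized Gehan loss for the AFT model (sketch).

    e_i = log(T_i) - x_i' beta are the residuals; each uncensored subject i
    (delta_i = 1) contributes max(e_j - e_i, 0) for every other subject j.
    """
    e = log_time - X @ beta
    diff = e[None, :] - e[:, None]                 # diff[i, j] = e_j - e_i
    gehan = np.sum(delta[:, None] * np.maximum(diff, 0.0)) / len(e) ** 2
    return gehan + lam * np.sum(np.abs(beta))
```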
Lasso and probabilistic inequalities for multivariate point processes
- Submitted to Bernoulli
"... ..."
(Show Context)