Results 1–10 of 65
(2010): “Sparse Models and Methods for Optimal Instruments with an Application to Eminent Domain,” arXiv working paper
Abstract

Cited by 55 (19 self)
Abstract. We develop results for the use of Lasso and Post-Lasso methods to form first-stage predictions and estimate optimal instruments in linear instrumental variables (IV) models with many instruments, p. Our results apply even when p is much larger than the sample size, n. We show that the IV estimator based on using Lasso or Post-Lasso in the first stage is root-n consistent and asymptotically normal when the first stage is approximately sparse; i.e., when the conditional expectation of the endogenous variables given the instruments can be well-approximated by a relatively small set of variables whose identities may be unknown. We also show the estimator is semiparametrically efficient when the structural error is homoscedastic. Notably, our results allow for imperfect model selection and do not rely upon the unrealistic “beta-min” conditions that are widely used to establish validity of inference following model selection. In simulation experiments, the Lasso-based IV estimator with a data-driven penalty performs well compared to recently advocated many-instrument-robust procedures. In an empirical example dealing with the effect of judicial eminent domain decisions on economic outcomes, the Lasso-based IV estimator outperforms an intuitive benchmark. Optimal instruments are conditional expectations. In developing the IV results, we estab…
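The two-stage idea in this abstract can be sketched in a few lines: fit the first stage by Lasso over many instruments, then use the fitted values as the instrument in the second stage. The data-generating process, dimensions, and penalty level below are illustrative assumptions, not the paper's simulation design or its data-driven penalty rule.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 200, 50
Z = rng.normal(size=(n, p))
pi = np.zeros(p)
pi[:3] = 1.0                            # only 3 of 50 instruments matter (approximate sparsity)
v = rng.normal(size=n)
x = Z @ pi + v                          # endogenous regressor
eps = 0.5 * v + rng.normal(size=n)      # structural error correlated with v
beta0 = 2.0
y = beta0 * x + eps

# first stage: Lasso prediction of x from the many instruments
xhat = Lasso(alpha=0.1).fit(Z, x).predict(Z)
# second stage: use the Lasso fit as the (estimated optimal) instrument
beta_iv = (xhat @ y) / (xhat @ x)
```

Because the first stage is approximately sparse, the Lasso fit concentrates on the few relevant instruments and the resulting IV estimate lands close to the true coefficient even with p = 50 instruments and n = 200 observations.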
INFERENCE ON TREATMENT EFFECTS AFTER SELECTION AMONGST HIGH-DIMENSIONAL CONTROLS
Abstract

Cited by 30 (6 self)
We propose robust methods for inference on the effect of a treatment variable on a scalar outcome in the presence of very many controls. Our setting is a partially linear model with possibly non-Gaussian and heteroscedastic disturbances. Our analysis allows the number of controls to be much larger than the sample size. To make informative inference feasible, we require the model to be approximately sparse; that is, we require that the effect of confounding factors can be controlled for up to a small approximation error by conditioning on a relatively small number of controls whose identities are unknown. The latter condition makes it possible to estimate the treatment effect by selecting approximately the right set of controls. We develop a novel estimation and uniformly valid inference method for the treatment effect in this setting, called the “post-double-selection” method. Our results apply to Lasso-type methods used for covariate selection as well as to any other model selection method that is able to find a sparse model with good approximation properties. The main attractive feature of our method is that it allows for imperfect selection of the controls and provides confidence intervals that are valid uniformly across a large class of models. In contrast, standard post-model-selection estimators fail to provide uniform inference even in simple cases with a small, fixed number of controls. Thus our method resolves the long-standing problem of uniform inference after model selection for a large, interesting class of models. We illustrate the use of the developed methods with numerical simulations and an application to the effect of abortion on crime rates.
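The post-double-selection recipe reduces to three steps: Lasso the outcome on the controls, Lasso the treatment on the controls, then run OLS of the outcome on the treatment plus the union of the two selected control sets. A minimal sketch under an assumed sparse design (the coefficients, penalty level, and dimensions are illustrative, not the paper's recommended tuning):

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(1)
n, p = 300, 100
X = rng.normal(size=(n, p))                                  # many controls, sparse confounding
d = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)             # treatment
y = 1.5 * d + X[:, 0] - X[:, 1] + rng.normal(size=n)         # outcome; true effect 1.5

# step 1: select controls that predict the outcome
sel_y = np.flatnonzero(Lasso(alpha=0.1).fit(X, y).coef_)
# step 2: select controls that predict the treatment
sel_d = np.flatnonzero(Lasso(alpha=0.1).fit(X, d).coef_)
union = np.union1d(sel_y, sel_d)

# step 3: OLS of the outcome on the treatment plus the union of selected controls
W = np.column_stack([d, X[:, union]])
alpha_hat = LinearRegression().fit(W, y).coef_[0]
```

Taking the union of the two selections is what guards against the failure mode of naive post-selection: a confounder only weakly related to the outcome (but strongly related to the treatment) is still retained via the treatment regression.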
Taking Advantage of Sparsity in Multi-Task Learning
Abstract

Cited by 27 (0 self)
We study the problem of estimating multiple linear regression equations for the purpose of both prediction and variable selection. Following recent work on multi-task learning [1], we assume that the sparsity patterns of the regression vectors are included in the same set of small cardinality. This assumption leads us to consider the Group Lasso as a candidate estimation method. We show that this estimator enjoys nice sparsity oracle inequalities and variable selection properties. The results hold under a certain restricted eigenvalue condition and a coherence condition on the design matrix, which naturally extend recent work in [3, 19]. In particular, in the multi-task learning scenario, in which the number of tasks can grow, we are able to remove completely the effect of the number of predictor variables in the bounds. Finally, we show how our results can be extended to more general noise distributions, of which we only require the variance to be finite.
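The shared-sparsity assumption is exactly what scikit-learn's `MultiTaskLasso` (an ℓ1/ℓ2 Group Lasso with one group per predictor across tasks) implements, so the setting can be sketched directly; the dimensions, support, and penalty below are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(2)
n, p, T = 100, 30, 4                 # samples, predictors, tasks
X = rng.normal(size=(n, p))
B = np.zeros((p, T))
B[:3, :] = 1.0                       # all tasks share the same 3-variable support
Y = X @ B + 0.1 * rng.normal(size=(n, T))

# l1/l2 block penalty: each predictor is kept or dropped jointly across tasks,
# so the estimated supports coincide by construction
fit = MultiTaskLasso(alpha=0.05).fit(X, Y)
support = np.flatnonzero(np.linalg.norm(fit.coef_, axis=0) > 1e-8)   # coef_ is (T, p)
```

Because a predictor enters or leaves all tasks at once, pooling the T tasks effectively multiplies the evidence for each relevant predictor, which is the mechanism behind the abstract's claim that the number-of-predictors term can be removed from the bounds as the number of tasks grows.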
(2013): “Program Evaluation with High-Dimensional Data,” working paper
Abstract

Cited by 9 (2 self)
Abstract. In the first part of the paper, we consider estimation and inference on policy-relevant treatment effects, such as local average and quantile treatment effects, in a data-rich environment where there may be many more control variables available than there are observations. In addition to allowing many control variables, the setting we consider allows endogenous receipt of treatment, heterogeneous treatment effects, and function-valued outcomes. To make informative inference possible, we assume that some reduced-form predictive relationships are approximately sparse. That is, we require that the relationship between the control variables and the outcome, treatment status, and instrument status can be captured up to a small approximation error using a small number of the control variables whose identities are unknown to the researcher. This condition allows estimation and inference for a wide variety of treatment parameters to proceed after data-driven selection of control variables. We provide conditions under which post-selection inference is uniformly valid across a wide range of models, and show that a key condition underlying the uniform validity of post-selection inference allowing for imperfect model selection is the use of approximately unbiased estimating equations. We illustrate the use of the proposed methods with an application to estimating the effect of 401(k) participation on accumulated assets.
High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso
, 2013
Abstract

Cited by 9 (2 self)
The goal of supervised feature selection is to find a subset of input features that are responsible for predicting output values. The least absolute shrinkage and selection operator (Lasso) allows computationally efficient feature selection based on linear dependency between input features and output values. In this paper, we consider a feature-wise kernelized Lasso for capturing nonlinear input-output dependency. We first show that, with particular choices of kernel functions, non-redundant features with strong statistical dependence on output values can be found in terms of kernel-based independence measures such as the Hilbert-Schmidt independence criterion (HSIC). We then show that the globally optimal solution can be efficiently computed; this makes the approach scalable to high-dimensional problems. The effectiveness of the proposed method is demonstrated through feature selection experiments for classification and regression with thousands of features.
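One way to read the feature-wise kernelized Lasso is as a non-negative Lasso of the vectorized, centered response Gram matrix on the vectorized, centered Gram matrices of the individual features; each regression weight then measures an HSIC-style contribution of one feature. A toy sketch of that reading (the kernel width, penalty, and data are illustrative assumptions, not the paper's algorithm, normalization, or tuning):

```python
import numpy as np
from sklearn.linear_model import Lasso

def centered_gram(v, sigma=1.0):
    """Centered Gaussian-kernel Gram matrix H K H for a 1-D sample."""
    K = np.exp(-(v[:, None] - v[None, :]) ** 2 / (2 * sigma ** 2))
    n = len(v)
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

rng = np.random.default_rng(3)
n, p = 120, 10
X = rng.normal(size=(n, p))
# y depends nonlinearly on features 0 and 1 only (illustrative data)
y = np.sin(2 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=n)

L = centered_gram(y).ravel()                                         # response Gram, vectorized
F = np.column_stack([centered_gram(X[:, k]).ravel() for k in range(p)])

# non-negative Lasso of the response Gram on the feature Grams:
# each weight measures a feature's HSIC-style contribution
fit = Lasso(alpha=1e-5, positive=True, fit_intercept=False, max_iter=10000).fit(F, L)
selected = np.flatnonzero(fit.coef_ > 1e-8)
```

Working with Gram matrices of single features (rather than of the full feature vector) is what makes the problem feature-wise, so a standard Lasso-type solver applies even though the dependency being captured is nonlinear.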
Group lasso for high dimensional sparse quantile regression models
, 2011
Spike-and-slab priors for function selection in structured additive regression models
 Journal of the American Statistical Association
Abstract

Cited by 5 (1 self)
Structured additive regression provides a general framework for complex Gaussian and non-Gaussian regression models, with predictors comprising arbitrary combinations of nonlinear functions and surfaces, spatial effects, varying coefficients, random effects and further regression terms. The large flexibility of structured additive regression makes function selection a challenging and important task, aiming at (1) selecting the relevant covariates, (2) choosing an appropriate and parsimonious representation of the impact of covariates on the predictor and (3) determining the required interactions. We propose a spike-and-slab prior structure for function selection that allows us to include or exclude single coefficients as well as blocks of coefficients representing specific model terms. A novel multiplicative parameter expansion is required to obtain good mixing and convergence properties in a Markov chain Monte Carlo simulation approach and is shown to induce desirable shrinkage properties. In simulation studies and with (real) benchmark classification data, we investigate sensitivity to hyperparameter settings and compare performance to competitors. The flexibility and applicability of our approach are demonstrated in an additive piecewise exponential model with time-varying effects for right-censored survival times of intensive care patients with sepsis. Geoadditive and additive mixed logit model applications are discussed in an extensive appendix. Keywords: parameter expansion, penalized splines, stochastic search variable selection, generalized additive mixed models, spatial regression.
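The include/exclude mechanics behind a spike-and-slab prior are easiest to see in the plain linear-model special case: single coefficients rather than blocks of basis coefficients, and without the paper's multiplicative parameter expansion. A minimal Gibbs-sampler sketch with a point-mass spike and illustrative hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 100, 8
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.5, 0, 0, 0, 0, 0, 0])
y = X @ beta_true + rng.normal(size=n)

sigma2, tau2, prior_inc = 1.0, 10.0, 0.5   # noise var, slab var, prior P(include)
beta = np.zeros(p)
inc_count = np.zeros(p)
iters, burn = 2000, 500

for it in range(iters):
    for j in range(p):
        r = y - X @ beta + X[:, j] * beta[j]               # residual excluding term j
        s2 = sigma2 / (X[:, j] @ X[:, j] + sigma2 / tau2)  # slab posterior variance
        mu = s2 * (X[:, j] @ r) / sigma2                   # slab posterior mean
        # Bayes factor of the slab N(0, tau2) against the point mass at zero
        log_bf = 0.5 * np.log(s2 / tau2) + 0.5 * mu ** 2 / s2
        p_inc = 1.0 / (1.0 + (1 - prior_inc) / prior_inc * np.exp(-log_bf))
        beta[j] = mu + np.sqrt(s2) * rng.normal() if rng.random() < p_inc else 0.0
        if it >= burn:
            inc_count[j] += beta[j] != 0.0

pip = inc_count / (iters - burn)   # posterior inclusion probabilities
```

The posterior inclusion probabilities separate the two true signals from the six null coefficients; the paper's contribution is making this machinery mix well when the "coefficient" is a whole block representing a spline, surface, or spatial term.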
Fast Bayesian Model Assessment for Nonparametric Additive Regression
, 2013
Abstract

Cited by 5 (4 self)
Variable selection techniques for the classical linear regression model have been widely investigated. Variable selection in fully nonparametric and additive regression models has been studied more recently. A Bayesian approach for nonparametric additive regression models is considered, where the functions in the additive model are expanded in a B-spline basis and a multivariate Laplace prior is put on the coefficients. Posterior probabilities of models defined by selection of predictors in the working model are computed using a Laplace approximation method. The prior times the likelihood is expanded around the posterior mode, which can be identified with the group LASSO, for which a fast computing algorithm exists. Thus Markov chain Monte Carlo or any other time-consuming sampling-based methods are completely avoided, leading to quick assessment of various posterior model probabilities. This technique is applied to the high-dimensional situation where the number of parameters exceeds the number of observations.
Variable Selection in High-dimensional Varying-coefficient Models with Global Optimality
, 2012
Abstract

Cited by 4 (1 self)
The varying-coefficient model is flexible and powerful for modeling the dynamic changes of regression coefficients. It is important to identify significant covariates associated with response variables, especially for high-dimensional settings where the number of covariates can be larger than the sample size. We consider model selection in the high-dimensional setting and adopt difference convex programming to approximate the L0 penalty, and we investigate the global optimality properties of the varying-coefficient estimator. The challenge of the variable selection problem here is that the dimension of the nonparametric form for the varying-coefficient modeling could be infinite, in addition to dealing with the high-dimensional linear covariates. We show that the proposed varying-coefficient estimator is consistent, enjoys the oracle property and achieves an optimal convergence rate for the nonzero nonparametric components for high-dimensional data. Our simulations and numerical examples indicate that the difference convex algorithm is efficient using the coordinate descent algorithm, and is able to select the true model at a higher frequency than the least absolute shrinkage and selection operator (LASSO), the adaptive LASSO and the smoothly clipped absolute deviation (SCAD) approaches.
Ultrahigh Dimensional Feature Screening via RKHS Embeddings
Abstract

Cited by 2 (0 self)
Feature screening is a key step in handling ultrahigh dimensional data sets that are ubiquitous in modern statistical problems. Over the last decade, convex relaxation based approaches (e.g., the Lasso and sparse additive models) have been extensively developed and analyzed for feature selection in the high dimensional regime. But in the ultrahigh dimensional regime, these approaches suffer from several problems, both computationally and statistically. To overcome these issues, in this paper, we propose a novel Hilbert space embedding based approach to independence screening for ultrahigh dimensional data sets. The proposed approach is model-free (i.e., no model assumption is made between response and predictors) and can handle non-standard (e.g., graphs) and multivariate outputs directly. We establish the sure screening property of the proposed approach in the ultrahigh dimensional regime, and experimentally demonstrate its advantages and superiority over other approaches on several synthetic and real data sets.
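Marginal independence screening with a kernel dependence measure can be sketched as ranking every feature by its empirical HSIC with the response and keeping the top d. This uses a plain biased HSIC estimator and an illustrative cut-off, not the paper's specific embedding construction:

```python
import numpy as np

def hsic(u, v, sigma=1.0):
    """Biased empirical HSIC between two 1-D samples, Gaussian kernels."""
    n = len(u)
    K = np.exp(-(u[:, None] - u[None, :]) ** 2 / (2 * sigma ** 2))
    L = np.exp(-(v[:, None] - v[None, :]) ** 2 / (2 * sigma ** 2))
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(4)
n, p = 100, 500                          # far more features than observations
X = rng.normal(size=(n, p))
y = X[:, 0] ** 2 + 2 * np.sin(X[:, 1]) + 0.1 * rng.normal(size=n)

# screen: rank every feature by marginal HSIC with the response, keep the top d
scores = np.array([hsic(X[:, j], y) for j in range(p)])
d = int(n / np.log(n))                   # a common screening cut-off (about 21 here)
keep = np.argsort(scores)[::-1][:d]
```

Because the measure is kernel-based, both the quadratic and the sinusoidal dependencies survive the screen even though neither feature is linearly correlated with the response, which is the model-free property the abstract emphasizes.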