A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers
Minimax rates of estimation for high-dimensional linear regression over ℓq-balls
, 2009
Abstract

Cited by 97 (19 self)
Abstract—Consider the high-dimensional linear regression model y = Xβ* + w, where y ∈ R^n is an observation vector, X ∈ R^{n×d} is a design matrix with d > n, β* ∈ R^d is an unknown regression vector, and w is additive Gaussian noise. This paper studies the minimax rates of convergence for estimating β* in either ℓ2-loss or ℓ2-prediction loss, assuming that β* belongs to an ℓq-ball B_q(R_q) for some q ∈ [0, 1]. It is shown that under suitable regularity conditions on the design matrix X, the minimax optimal rate in ℓ2-loss and ℓ2-prediction loss scales as Θ(R_q ((log d)/n)^{1−q/2}). The analysis in this paper reveals that conditions on the design matrix enter into the rates for ℓ2-error and ℓ2-prediction error in complementary ways in the upper and lower bounds. Our proofs of the lower bounds are information-theoretic in nature, based on Fano's inequality and results on the metric entropy of the balls B_q(R_q), whereas our proofs of the upper bounds are constructive, involving direct analysis of least squares over ℓq-balls. For the special case q = 0, corresponding to models with an exact sparsity constraint, our results show that although computationally efficient ℓ1-based methods can achieve the minimax rates up to constant factors, they require slightly stronger assumptions on the design matrix than optimal algorithms involving least squares over the ℓ0-ball. Index Terms—Compressed sensing, minimax techniques, regression analysis.
High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity
, 2011
Abstract

Cited by 75 (10 self)
Although the standard formulations of prediction problems involve fully-observed and noiseless data drawn in an i.i.d. manner, many applications involve noisy and/or missing data, possibly with dependencies. We study these issues in the context of high-dimensional sparse linear regression, and propose novel estimators for the cases of noisy, missing, and/or dependent data. Many standard approaches to noisy or missing data, such as those using the EM algorithm, lead to optimization problems that are inherently non-convex, and it is difficult to establish theoretical guarantees on practical algorithms. While our approach also involves optimizing non-convex programs, we are able both to analyze the statistical error associated with any global optimum and to prove that a simple projected gradient descent algorithm will converge in polynomial time to a small neighborhood of the set of global minimizers. On the statistical side, we provide non-asymptotic bounds that hold with high probability for the cases of noisy, missing, and/or dependent data. On the computational side, we prove that under the same types of conditions required for statistical consistency, the projected gradient descent algorithm converges at a geometric rate to a near-global minimizer. We illustrate these theoretical predictions with simulations, showing agreement with the predicted scalings.
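The projected gradient scheme described in this abstract can be sketched as follows for the additive-covariate-noise case: the corrected quadratic loss uses unbiased surrogates for X'X/n and X'y/n built from the noisy design, and gradient steps are projected onto an ℓ1-ball. The function names, the assumed-known noise covariance `sigma_w`, and all parameter choices are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of v onto the l1-ball {x : ||x||_1 <= radius}."""
    if np.abs(v).sum() <= radius:
        return v
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    # largest index rho with u[rho] * (rho + 1) > css[rho] - radius
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def corrected_lasso_pgd(Z, y, sigma_w, radius, n_iter=500):
    """Projected gradient descent on the (possibly non-convex) corrected
    quadratic loss 0.5 * b' Gamma b - gamma' b over an l1-ball, where
    Gamma and gamma are unbiased surrogates for X'X/n and X'y/n built
    from the noisy design Z; sigma_w is the (assumed known) covariance
    of the additive covariate noise."""
    n, p = Z.shape
    Gamma = Z.T @ Z / n - sigma_w
    gamma = Z.T @ y / n
    step = 1.0 / np.linalg.norm(Gamma, 2)   # conservative 1/L step size
    b = np.zeros(p)
    for _ in range(n_iter):
        b = project_l1_ball(b - step * (Gamma @ b - gamma), radius)
    return b
```

On a synthetic problem with mild covariate noise, the iterates settle near the true sparse vector even though the surrogate loss need not be convex.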
A Power Efficient Sensing/Communication Scheme: Joint Source-Channel-Network Coding by Using Compressive Sensing
Abstract

Cited by 16 (1 self)
Abstract—We propose a joint source-channel-network coding scheme, based on compressive sensing principles, for wireless networks with AWGN channels (which may include multiple access and broadcast), with sources exhibiting temporal and spatial dependencies. Our goal is to provide a reconstruction of the sources within an allowed distortion level at each receiver. We perform joint source-channel coding at each source by randomly projecting source values to a lower-dimensional space. We consider sources that satisfy the restricted eigenvalue (RE) condition as well as more general sources for which the randomness of the network allows a mapping to lower-dimensional spaces. Our approach relies on analog random linear network coding. The receiver uses compressive sensing decoders to reconstruct the sources. Our key insight is that compressive sensing and analog network coding both preserve the source characteristics required for compressive sensing decoding.
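A minimal single-source sketch of the pipeline this abstract describes: an analog random projection at the source, and an ℓ1-based compressive sensing decoder at the receiver. The function names and the ISTA decoder are illustrative stand-ins, not the paper's network-coded construction:

```python
import numpy as np

def cs_encode(x, A):
    """Analog joint source-channel encoding: project the source vector x
    to a lower dimension with a random matrix A (no quantization)."""
    return A @ x

def ista_decode(y, A, lam, n_iter=1000):
    """Compressive sensing decoder: recover a sparse source from
    y = A x + noise via l1-regularized least squares, solved with
    iterative soft-thresholding (ISTA)."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - A.T @ (A @ x - y) / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return x
```

With a sparse source and a random Gaussian projection to well under half the ambient dimension, the decoder recovers the source to within the noise level, which is the property the network-coded construction builds on.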
Hypothesis testing in high-dimensional regression under the Gaussian random design model: Asymptotic theory. arXiv
, 2013
Abstract

Cited by 13 (3 self)
We consider linear regression in the high-dimensional regime in which the number of observations n is smaller than the number of parameters p. A very successful approach in this setting uses ℓ1-penalized least squares (a.k.a. the Lasso) to search for a subset of s0 < n parameters that best explain the data, while setting the other parameters to zero. A considerable amount of work has been devoted to characterizing the estimation and model selection problems within this approach. In this paper we consider instead the fundamental, but far less understood, question of statistical significance. We study this problem under the random design model in which the rows of the design matrix are i.i.d. and drawn from a high-dimensional Gaussian distribution. This situation arises, for instance, in learning high-dimensional Gaussian graphical models. Leveraging an asymptotic distributional characterization of regularized least squares estimators, we develop a procedure for computing p-values and hence assessing statistical significance for hypothesis testing. We characterize the statistical power of this procedure, and evaluate it on synthetic and real data, comparing it with earlier proposals. Finally, we provide an upper bound on the minimax power of tests with a given significance level and show that our proposed procedure achieves this bound in the case of design matrices with i.i.d. Gaussian entries.
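The debiasing step behind such p-values can be sketched in the simplest setting the abstract mentions, a design with i.i.d. standard Gaussian entries, where the identity matrix can stand in for the decorrelating matrix. All function names and tuning choices here are illustrative assumptions, not the paper's procedure:

```python
import numpy as np
from math import erfc

def lasso_ista(X, y, lam, n_iter=500):
    """l1-penalized least squares via iterative soft-thresholding."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n    # Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        z = b - X.T @ (X @ b - y) / (n * L)
        b = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return b

def debiased_pvalues(X, y, beta_hat, sigma):
    """One-step debiasing of a Lasso fit, with the identity standing in
    for the decorrelating matrix (a sketch valid when the design has
    i.i.d. standard Gaussian entries), followed by two-sided Gaussian
    p-values for each coordinate."""
    n = X.shape[0]
    beta_u = beta_hat + X.T @ (y - X @ beta_hat) / n
    z = beta_u / (sigma / np.sqrt(n))    # approximately N(0, 1) under the null
    pvals = np.array([erfc(abs(zi) / np.sqrt(2.0)) for zi in z])
    return beta_u, pvals
```

On simulated data the p-value of a genuinely active coordinate is essentially zero, while the null coordinates produce roughly uniform p-values.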
Optimal computational and statistical rates of convergence for sparse nonconvex learning problems. arXiv preprint
, 2013
Abstract

Cited by 11 (5 self)
We provide a theoretical analysis of the statistical and computational properties of penalized M-estimators that can be formulated as the solution to a possibly non-convex optimization problem. Many important estimators fall in this category, including least squares regression with non-convex regularization, generalized linear models with non-convex regularization, and sparse elliptical random design regression. For these problems, it is intractable to calculate the global solution due to the non-convex formulation. In this paper, we propose an approximate regularization path following method for solving a variety of learning problems with non-convex objective functions. Under a unified analytic framework, we simultaneously provide explicit statistical and computational rates of convergence for any local solution obtained by the algorithm. Computationally, our algorithm attains a global geometric rate of convergence for calculating the full regularization path, which is optimal among all first-order algorithms. Unlike most existing methods, which only attain geometric rates of convergence for a single regularization parameter, our algorithm calculates the full regularization path with the same iteration complexity. In particular, we provide a refined iteration complexity bound to sharply characterize the performance of each stage along the regularization path. Statistically, we provide sharp sample complexity analysis for all the approximate local solutions along the regularization path. In particular, our analysis improves upon existing results by providing a more refined sample complexity bound as well as an exact support recovery result for the final estimator. These results show that the final estimator attains an oracle statistical property due to the use of the non-convex penalty.
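The path-following idea, solving a sequence of problems for decreasing regularization levels and warm-starting each stage at the previous solution, can be sketched with a convex ℓ1 penalty standing in for the non-convex penalties the paper analyzes. Names, the ISTA inner solver, and all parameters are illustrative:

```python
import numpy as np

def path_following(X, y, lams, n_iter=200):
    """Approximate regularization path following: solve a sequence of
    penalized least squares problems for decreasing lambda, warm-starting
    each stage at the previous stage's solution. An l1 penalty (solved by
    iterative soft-thresholding) stands in here for the non-convex
    penalties analyzed in the paper."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n    # Lipschitz constant of the gradient
    b = np.zeros(p)
    path = []
    for lam in lams:                     # expected sorted in decreasing order
        for _ in range(n_iter):
            z = b - X.T @ (X @ b - y) / (n * L)
            b = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
        path.append(b.copy())
    return path
```

Because each stage starts close to its solution, a modest fixed iteration budget per stage suffices, which is the computational point of the multi-stage scheme.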
Robust Inference on Average Treatment Effects with Possibly More Covariates than Observations
, 2013
Abstract

Cited by 8 (1 self)
This paper concerns robust inference on average treatment effects following model selection. In the selection-on-observables framework, we show how to construct confidence intervals based on a doubly-robust estimator that are robust to model selection errors, and we prove that they are valid uniformly over a large class of treatment effect models. The class allows for multi-valued treatments with heterogeneous effects (in observables), general heteroskedasticity, and selection amongst (possibly) more covariates than observations. Our estimator attains the semiparametric efficiency bound under appropriate conditions. Precise conditions are given for any model selector to yield these results, and we show how to combine data-driven selection with economic theory. For implementation, we give a specific proposal for selection based on the group lasso and derive new technical results for high-dimensional, sparse multinomial logistic regression. A simulation study shows that our estimator performs very well in finite samples over a wide range of models. Revisiting the National Supported Work demonstration data, our method yields accurate estimates and tight confidence intervals.
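The doubly-robust (AIPW) estimator underlying such confidence intervals can be sketched as follows for a binary treatment. The fitted outcome regressions and propensity score are taken as inputs, standing in for the group-lasso-based first-stage selection the paper actually proposes; the function name and interface are illustrative:

```python
import numpy as np

def aipw_ate(y, t, m1, m0, e):
    """Augmented inverse-propensity-weighted (doubly-robust) estimate of
    the average treatment effect, with a plug-in standard error.
    m1 and m0 are fitted outcome regressions under treatment and control,
    and e is the fitted propensity score; any first-stage fits (e.g.
    obtained after model selection) can be plugged in."""
    psi = m1 - m0 + t * (y - m1) / e - (1 - t) * (y - m0) / (1 - e)
    return psi.mean(), psi.std(ddof=1) / np.sqrt(len(y))
```

The estimator is "doubly robust" in that psi has mean equal to the true effect if either the outcome regressions or the propensity score is correctly specified, which is what makes the resulting intervals tolerant of model selection errors in one of the two first stages.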