Results 1-10 of 36
High-dimensional regression with noisy and missing data: Provable guarantees with nonconvexity
, 2011
Cited by 75 (10 self)
Although the standard formulations of prediction problems involve fully observed and noiseless data drawn in an i.i.d. manner, many applications involve noisy and/or missing data, possibly involving dependencies. We study these issues in the context of high-dimensional sparse linear regression, and propose novel estimators for the cases of noisy, missing, and/or dependent data. Many standard approaches to noisy or missing data, such as those using the EM algorithm, lead to optimization problems that are inherently nonconvex, and it is difficult to establish theoretical guarantees on practical algorithms. While our approach also involves optimizing nonconvex programs, we are able to both analyze the statistical error associated with any global optimum, and prove that a simple projected gradient descent algorithm will converge in polynomial time to a small neighborhood of the set of global minimizers. On the statistical side, we provide non-asymptotic bounds that hold with high probability for the cases of noisy, missing, and/or dependent data. On the computational side, we prove that under the same types of conditions required for statistical consistency, the projected gradient descent algorithm will converge at geometric rates to a near-global minimizer. We illustrate these theoretical predictions with simulations, showing agreement with the predicted scalings.
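As an illustration of the projected gradient descent idea mentioned in this abstract, the following is a minimal sketch for the additive-noise case. It is not taken from the abstract: the corrected surrogates `Gamma_hat` and `gamma_hat`, the assumption that the noise level `sigma_w` is known, the oracle choice of the l1 radius, and all function names are assumptions made here for illustration.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of v onto the l1 ball {b : ||b||_1 <= radius}."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]               # magnitudes in descending order
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    rho = np.nonzero(u * k > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)  # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def objective(beta, Gamma, gamma):
    """Quadratic loss 0.5 * b' Gamma b - gamma' b (nonconvex if Gamma is indefinite)."""
    return 0.5 * beta @ Gamma @ beta - gamma @ beta

def projected_gradient(Gamma, gamma, radius, n_iter=500):
    """Projected gradient descent over the l1 ball, started at zero."""
    step = 1.0 / np.linalg.norm(Gamma, 2)      # 1 / largest |eigenvalue| of Gamma
    beta = np.zeros(len(gamma))
    for _ in range(n_iter):
        grad = Gamma @ beta - gamma
        beta = project_l1_ball(beta - step * grad, radius)
    return beta

# Hypothetical demo: covariates observed with additive noise of known level sigma_w.
rng = np.random.default_rng(0)
n, p, s = 100, 200, 5
sigma_w, sigma_eps = 0.2, 0.1
beta_star = np.zeros(p)
beta_star[:s] = 1.0
X = rng.standard_normal((n, p))
y = X @ beta_star + sigma_eps * rng.standard_normal(n)
Z = X + sigma_w * rng.standard_normal((n, p))  # only Z and y are observed

# Assumed surrogates: subtract the noise covariance from the sample covariance of Z.
Gamma_hat = Z.T @ Z / n - sigma_w ** 2 * np.eye(p)
gamma_hat = Z.T @ y / n
radius = np.abs(beta_star).sum()               # oracle radius, for illustration only
beta_hat = projected_gradient(Gamma_hat, gamma_hat, radius)
print("estimation error:", np.linalg.norm(beta_hat - beta_star))
```

Because p > n here, `Gamma_hat` has negative eigenvalues and the program is nonconvex; the l1 constraint is what keeps the iterates bounded. In practice the radius would have to be chosen without knowledge of `beta_star`.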
Phase Retrieval via Wirtinger Flow: Theory and Algorithms
, 2014
Cited by 24 (4 self)
We study the problem of recovering the phase from magnitude measurements; specifically, we wish to reconstruct a complex-valued signal $x \in \mathbb{C}^n$ about which we have phaseless samples of the form $y_r = |\langle a_r, x \rangle|^2$, $r = 1, \ldots, m$ (knowledge of the phase of these samples would yield a linear system). This paper develops a nonconvex formulation of the phase retrieval problem as well as a concrete solution algorithm. In a nutshell, this algorithm starts with a careful initialization obtained by means of a spectral method, and then refines this initial estimate by iteratively applying novel update rules, which have low computational complexity, much like in a gradient descent scheme. The main contribution is that this algorithm is shown to rigorously allow the exact retrieval of phase information from a nearly minimal number of random measurements. Indeed, the sequence of successive iterates provably converges to the solution at a geometric rate so that the proposed scheme is efficient both in terms of computational and data resources. In theory, a variation on this scheme leads to a near-linear time algorithm for a physically realizable model based on coded diffraction patterns. We illustrate the effectiveness of our methods with various experiments on image data. Underlying our analysis are insights for the analysis of nonconvex optimization schemes that may have implications for computational problems beyond phase retrieval.
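The two-stage scheme described in this abstract (spectral initialization followed by gradient-like refinement) might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the Gaussian measurement model, the constant step size `mu`, the iteration count, the norm scaling of the initial guess, and all function names are choices made here.

```python
import numpy as np

def spectral_init(A, y):
    """Leading eigenvector of (1/m) * sum_r y_r a_r a_r^*, rescaled to match the data.
    Row r of A stores a_r^*, so that A @ x gives the inner products <a_r, x>."""
    m, n = A.shape
    Y = (A.conj().T * y) @ A / m               # Hermitian n x n matrix
    _, eigvecs = np.linalg.eigh(Y)             # eigenvalues in ascending order
    z0 = eigvecs[:, -1]
    scale = np.sqrt(n * y.sum() / np.sum(np.abs(A) ** 2))
    return scale * z0

def wirtinger_flow(A, y, n_iter=1000, mu=0.1):
    """Gradient-like refinement of the spectral estimate for the loss
    (1/(2m)) * sum_r (|<a_r, z>|^2 - y_r)^2, with an assumed constant step."""
    m = A.shape[0]
    z = spectral_init(A, y)
    norm0_sq = np.linalg.norm(z) ** 2
    for _ in range(n_iter):
        Az = A @ z
        grad = A.conj().T @ ((np.abs(Az) ** 2 - y) * Az) / m
        z = z - (mu / norm0_sq) * grad
    return z

def dist_up_to_phase(z, x):
    """Distance between z and x after removing the best global phase."""
    phase = np.vdot(x, z)                      # x^* z
    phase = phase / abs(phase)
    return np.linalg.norm(z - phase * x)

# Hypothetical demo with complex Gaussian measurement vectors.
rng = np.random.default_rng(0)
n, m = 10, 200
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
y = np.abs(A @ x) ** 2                         # phaseless measurements

z0 = spectral_init(A, y)
z_hat = wirtinger_flow(A, y)
print("relative error after init:  ", dist_up_to_phase(z0, x) / np.linalg.norm(x))
print("relative error after descent:", dist_up_to_phase(z_hat, x) / np.linalg.norm(x))
```

The signal can only be recovered up to a global phase factor, which is why the error is measured with `dist_up_to_phase` rather than a plain norm difference.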
Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses
, 2012
Cited by 17 (3 self)
We investigate the relationship between the structure of a discrete graphical model and the support of the inverse of a generalized covariance matrix. We show that for certain graph structures, the support of the inverse covariance matrix of indicator variables on the vertices of a graph reflects the conditional independence structure of the graph. Our work extends results that have previously been established only in the context of multivariate Gaussian graphical models, thereby addressing an open question about the significance of the inverse covariance matrix of a non-Gaussian distribution. The proof exploits a combination of ideas from the geometry of exponential families, junction tree theory, and convex analysis. These population-level results have various consequences for graph selection methods, both known and novel, including a novel method for structure estimation for missing or corrupted observations. We provide non-asymptotic guarantees for such methods, and illustrate the sharpness of these predictions via simulations.
A proximal-gradient homotopy method for the sparse least-squares problem
 SIAM Journal on Optimization
, 2013
Cited by 12 (2 self)
We consider solving the $\ell_1$-regularized least-squares ($\ell_1$-LS) problem in the context of sparse recovery, for applications such as compressed sensing. The standard proximal gradient method, also known as iterative soft-thresholding when applied to this problem, has low computational cost per iteration but a rather slow convergence rate. Nevertheless, when the solution is sparse, it often exhibits fast linear convergence in the final stage. We exploit the local linear convergence using a homotopy continuation strategy, i.e., we solve the $\ell_1$-LS problem for a sequence of decreasing values of the regularization parameter, and use an approximate solution at the end of each stage to warm start the next stage. Although similar strategies have been studied in the literature, there has been no theoretical analysis of their global iteration complexity. This paper shows that under suitable assumptions for sparse recovery, the proposed homotopy strategy ensures that all iterates along the homotopy solution path are sparse. Therefore the objective function is effectively strongly convex along the solution path, and geometric convergence at each stage can be established. As a result, the overall iteration complexity of our method is $O(\log(1/\epsilon))$ for finding an $\epsilon$-optimal solution, which can be interpreted as a global geometric rate of convergence. We also present empirical results to support our theoretical analysis.
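The homotopy continuation strategy described in this abstract can be sketched roughly as below: iterative soft-thresholding is run for a decreasing sequence of regularization parameters, and each stage is warm started from the previous one. This is a simplified illustration, not the paper's method as published: the fixed inner iteration count, the decrease factor `eta`, the constant step size, and the function names are assumptions (a fixed iteration budget stands in for a stage-wise stopping test).

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_objective(A, b, x, lam):
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.abs(x).sum()

def ista(A, b, lam, x0, step, n_iter):
    """Proximal gradient (iterative soft-thresholding) started from x0."""
    x = x0.copy()
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - step * grad, step * lam)
    return x

def homotopy(A, b, lam_target, eta=0.7, inner_iters=50):
    """Solve a sequence of problems with decreasing lam, warm starting each stage."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2     # 1/L with L the largest eigenvalue of A^T A
    lam = np.max(np.abs(A.T @ b))              # for lam at or above this value, x = 0 is optimal
    x = np.zeros(A.shape[1])
    while lam > lam_target:
        lam = max(eta * lam, lam_target)       # shrink the regularization parameter
        x = ista(A, b, lam, x, step, inner_iters)
    return x

# Hypothetical demo: sparse recovery from a small Gaussian design.
rng = np.random.default_rng(0)
n_rows, p, s = 40, 100, 5
A = rng.standard_normal((n_rows, p)) / np.sqrt(n_rows)
x_true = np.zeros(p)
x_true[:s] = 1.0
b = A @ x_true + 0.01 * rng.standard_normal(n_rows)

lam_target = 0.02
x_hat = homotopy(A, b, lam_target)
print("objective at zero: ", lasso_objective(A, b, np.zeros(p), lam_target))
print("objective at x_hat:", lasso_objective(A, b, x_hat, lam_target))
```

Starting from the largest useful value of the regularization parameter keeps the early iterates sparse, which is the property the abstract relies on for geometric convergence at each stage.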
Optimal computational and statistical rates of convergence for sparse nonconvex learning problems. arXiv preprint, arXiv
, 2013
Cited by 11 (5 self)
We provide theoretical analysis of the statistical and computational properties of penalized M-estimators that can be formulated as the solution to a possibly nonconvex optimization problem. Many important estimators fall in this category, including least squares regression with nonconvex regularization, generalized linear models with nonconvex regularization, and sparse elliptical random design regression. For these problems, it is intractable to calculate the global solution due to the nonconvex formulation. In this paper, we propose an approximate regularization path following method for solving a variety of learning problems with nonconvex objective functions. Under a unified analytic framework, we simultaneously provide explicit statistical and computational rates of convergence of any local solution obtained by the algorithm. Computationally, our algorithm attains a global geometric rate of convergence for calculating the full regularization path, which is optimal among all first-order algorithms. Unlike most existing methods that only attain geometric rates of convergence for one single regularization parameter, our algorithm calculates the full regularization path with the same iteration complexity. In particular, we provide a refined iteration complexity bound to sharply characterize the performance of each stage along the regularization path. Statistically, we provide sharp sample complexity analysis for all the approximate local solutions along the regularization path. In particular, our analysis improves upon existing results by providing a more refined sample complexity bound as well as an exact support recovery result for the final estimator. These results show that the final estimator attains an oracle statistical property due to the use of a nonconvex penalty.
A novel M-estimator for robust PCA
Cited by 8 (4 self)
We study the basic problem of robust subspace recovery. That is, we assume a data set in which some points are sampled around a fixed subspace and the rest are spread throughout the ambient space, and we aim to recover the fixed underlying subspace. We first estimate a “robust inverse sample covariance” by solving a convex minimization problem; we then recover the subspace from the bottom eigenvectors of this matrix (their number corresponds to the number of eigenvalues close to 0). We guarantee exact subspace recovery under some conditions on the underlying data. Furthermore, we propose a fast iterative algorithm, which converges linearly to the matrix minimizing the convex problem. We also quantify the effect of noise and regularization and discuss many other practical and theoretical issues for improving the subspace recovery in various settings. When the sum of terms in the convex energy function (that we minimize) is replaced with the sum of squares of terms, the new minimizer is a scaled version of the inverse sample covariance (when it exists). We thus interpret our minimizer and its subspace (spanned by its bottom eigenvectors) as robust versions of the empirical inverse covariance and the PCA subspace, respectively. We compare our method with many other algorithms for robust PCA on synthetic and real data sets and demonstrate state-of-the-art speed and accuracy.
Augmented ℓ1 and nuclear-norm models with a globally linearly convergent algorithm. Rice University CAAM
, 2012
Cited by 8 (5 self)
This paper studies the long-existing idea of adding a nice smooth function to “smooth” a nondifferentiable objective function in the context of sparse optimization, in particular, the minimization of $\|x\|_1 + \frac{1}{2\alpha}\|x\|_2^2$, where $x$ is a vector, as well as the minimization of $\|X\|_* + \frac{1}{2\alpha}\|X\|_F^2$, where $X$ is a matrix and $\|X\|_*$ and $\|X\|_F$ are the nuclear and Frobenius norms of $X$, respectively. We show that they let sparse vectors and low-rank matrices be efficiently recovered. In particular, they enjoy exact and stable recovery guarantees similar to those known for the minimization of $\|x\|_1$ and $\|X\|_*$ under the conditions on the sensing operator such as its null-space property, restricted isometry property, spherical section property, or “RIPless” property. To recover a (nearly) sparse vector $x^0$, minimizing $\|x\|_1 + \frac{1}{2\alpha}\|x\|_2^2$ returns (nearly) the same solution as minimizing $\|x\|_1$ whenever $\alpha \ge 10\|x^0\|_\infty$. The same relation also holds between minimizing $\|X\|_* + \frac{1}{2\alpha}\|X\|_F^2$ and minimizing $\|X\|_*$ for recovering a (nearly) low-rank matrix $X^0$ if $\alpha \ge 10\|X^0\|_2$. Furthermore, we show that the linearized Bregman algorithm, as well as its two fast variants, for minimizing $\|x\|_1 + \frac{1}{2\alpha}\|x\|_2^2$ subject to $Ax = b$ enjoys global linear convergence as long as a nonzero solution exists, and we give an explicit rate of convergence. The convergence property does not require a sparse solution or any properties on $A$. To our knowledge, this is the best known global convergence result for first-order sparse optimization algorithms.
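For readers unfamiliar with the linearized Bregman algorithm named in this abstract, one common way to write it is as gradient ascent on the Lagrange dual of the augmented $\ell_1$ problem. The derivation below is a sketch; the dual variable $y$, the step size $h$, and the operator name $\mathrm{shrink}$ are notation assumed here rather than taken from the abstract.

```latex
\begin{aligned}
&\min_x \ \|x\|_1 + \tfrac{1}{2\alpha}\|x\|_2^2 \quad \text{subject to } Ax = b, \\
&L(x, y) = \|x\|_1 + \tfrac{1}{2\alpha}\|x\|_2^2 - y^\top (Ax - b), \\
&x(y) = \arg\min_x L(x, y) = \alpha\, \mathrm{shrink}(A^\top y),
  \qquad \mathrm{shrink}(v)_i = \mathrm{sign}(v_i)\,\max(|v_i| - 1,\, 0), \\
&D(y) = \min_x L(x, y) = b^\top y - \tfrac{\alpha}{2}\,\|\mathrm{shrink}(A^\top y)\|_2^2,
  \qquad \nabla D(y) = b - A\,x(y), \\
&x^{k+1} = \alpha\, \mathrm{shrink}(A^\top y^{k}),
  \qquad y^{k+1} = y^{k} + h\,\bigl(b - A x^{k+1}\bigr).
\end{aligned}
```

The quadratic term $\frac{1}{2\alpha}\|x\|_2^2$ is what makes the inner minimizer $x(y)$ unique and the dual function $D$ differentiable, so that a plain gradient step on $y$ is well defined.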
Challenges of big data analysis
 National Science Review
, 2014
Cited by 8 (0 self)
Big Data bring new opportunities to modern society and challenges to data scientists. On one hand, Big Data hold great promises for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require new computational and statistical paradigms. This article gives an overview of the salient features of Big Data and of how these features drive paradigm change in statistical and computational methods as well as computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in a high-confidence set and point out that the exogeneity assumptions in most statistical methods for Big Data cannot be validated due to incidental endogeneity. They can lead to wrong statistical inferences and consequently wrong scientific conclusions.
Proximal Markov chain Monte Carlo algorithms, arXiv:1306.0187
, 2013
Cited by 5 (0 self)
This paper presents two new Langevin Markov chain Monte Carlo methods that use convex analysis to simulate efficiently from high-dimensional densities that are log-concave, a class of probability distributions that is widely used in modern high-dimensional statistics and data analysis. The methods are based on a new first-order approximation for Langevin diffusions that exploits log-concavity to construct Markov chains with favourable convergence properties. This approximation is closely related to Moreau-Yoshida regularisations for convex functions and uses proximity mappings instead of gradient mappings to approximate the continuous-time process. The first method presented in this paper is an unadjusted Langevin algorithm for log-concave distributions. The method is shown to be stable and geometrically ergodic in many cases for which the conventional unadjusted Langevin algorithm is transient or explosive. The second method is a new Metropolis adjusted Langevin algorithm that complements the unadjusted algorithm with a Metropolis-Hastings step guaranteeing convergence to the desired target density. It is shown that this method inherits the convergence properties of the unadjusted algorithm and is geometrically ergodic in many cases for which the conventional Metropolis adjusted Langevin algorithm is not. The proposed methodology is demonstrated on two challenging high-dimensional signal processing applications related to audio compressive sensing and image resolution enhancement.
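To make the idea of replacing gradient mappings with proximity mappings concrete, here is a minimal sketch of an unadjusted proximal Langevin chain. The update rule written in the code, the Laplace-type target with density proportional to exp(-||x||_1), the step size `delta`, and the function names are assumptions chosen for illustration; they are not taken from the abstract.

```python
import numpy as np

def prox_l1(x, lam):
    """Proximity mapping of g(x) = ||x||_1 with parameter lam (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def proximal_ula(prox, x0, delta, n_samples, rng):
    """Unadjusted chain using the assumed update
    x <- prox(x, delta / 2) + sqrt(delta) * N(0, I)."""
    x = np.asarray(x0, dtype=float).copy()
    samples = np.empty((n_samples, x.size))
    for k in range(n_samples):
        noise = rng.standard_normal(x.size)
        x = prox(x, delta / 2.0) + np.sqrt(delta) * noise
        samples[k] = x
    return samples

# Hypothetical demo: two-dimensional target with density proportional to exp(-||x||_1),
# started far from the mode to show that the chain drifts back and stays bounded.
rng = np.random.default_rng(0)
samples = proximal_ula(prox_l1, x0=np.full(2, 10.0), delta=0.5,
                       n_samples=50000, rng=rng)
kept = samples[1000:]                          # discard a short burn-in
print("sample mean:", kept.mean(axis=0))
print("sample std: ", kept.std(axis=0))
```

Since no Metropolis-Hastings correction is applied, the chain targets only an approximation of the density; the adjusted variant described in the abstract would add an accept/reject step on top of this proposal.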