Results 1–10 of 12
Optimal computational and statistical rates of convergence for sparse nonconvex learning problems
arXiv preprint, 2013
Abstract

Cited by 11 (5 self)
We provide a theoretical analysis of the statistical and computational properties of penalized M-estimators that can be formulated as the solution to a possibly nonconvex optimization problem. Many important estimators fall into this category, including least squares regression with nonconvex regularization, generalized linear models with nonconvex regularization, and sparse elliptical random design regression. For these problems, it is intractable to compute the global solution due to the nonconvex formulation. In this paper, we propose an approximate regularization path following method for solving a variety of learning problems with nonconvex objective functions. Under a unified analytic framework, we simultaneously provide explicit statistical and computational rates of convergence for any local solution obtained by the algorithm. Computationally, our algorithm attains a global geometric rate of convergence for calculating the full regularization path, which is optimal among all first-order algorithms. Unlike most existing methods, which only attain geometric rates of convergence for a single regularization parameter, our algorithm calculates the full regularization path with the same iteration complexity. In particular, we provide a refined iteration complexity bound to sharply characterize the performance of each stage along the regularization path. Statistically, we provide a sharp sample complexity analysis for all the approximate local solutions along the regularization path. In particular, our analysis improves upon existing results by providing a more refined sample complexity bound as well as an exact support recovery result for the final estimator. These results show that the final estimator attains an oracle statistical property due to the use of a nonconvex penalty.
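The staged path-following idea described in this abstract can be sketched in a few lines. The sketch below is a minimal illustration, substituting the convex ℓ1 penalty (whose proximal operator is soft-thresholding) for the paper's nonconvex penalties; the stage count and per-stage iteration budget are untuned assumptions:

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t*||.||_1 (the convex special case of the penalty).
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def path_following_lasso(X, y, lam_target, n_stages=10, iters=50):
    """Approximate regularization-path following: start from a lambda large
    enough that the solution is 0, decrease it geometrically, and warm-start
    each stage's proximal-gradient iterations from the previous solution."""
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n        # Lipschitz constant of the gradient
    lam0 = np.max(np.abs(X.T @ y)) / n       # smallest lambda whose solution is 0
    beta = np.zeros(d)
    path = []
    for lam in np.geomspace(lam0, lam_target, n_stages):
        for _ in range(iters):               # proximal gradient steps at this stage
            grad = X.T @ (X @ beta - y) / n
            beta = soft_threshold(beta - grad / L, lam / L)
        path.append((lam, beta.copy()))
    return path
```

Warm-starting each stage from the previous solution is what lets the whole path be computed with roughly the iteration complexity of a single well-conditioned solve.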
Sharp time-data tradeoffs for linear inverse problems
In preparation, 2015
Abstract

Cited by 3 (2 self)
In this paper we characterize sharp time-data tradeoffs for optimization problems used for solving linear inverse problems. We focus on the minimization of a least-squares objective subject to a constraint defined as the sublevel set of a penalty function. We present a unified convergence analysis of the gradient projection algorithm applied to such problems. We sharply characterize the convergence rate associated with a wide variety of random measurement ensembles in terms of the number of measurements and structural complexity of the signal with respect to the chosen penalty function. The results apply to both convex and nonconvex constraints, demonstrating that a linear convergence rate is attainable even though the least squares objective is not strongly convex in these settings. When specialized to Gaussian measurements our results show that such linear convergence occurs when the number of measurements is merely 4 times the minimal number required to recover the desired signal at all (a.k.a. the phase transition). We also achieve a slower but geometric rate of convergence precisely above the phase transition point. Extensive numerical results suggest that the derived rates exactly match the empirical performance.
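The gradient projection scheme analyzed in this abstract can be illustrated with the simplest nonconvex constraint it covers: an s-sparsity constraint, whose Euclidean projection is hard-thresholding. A minimal sketch, where the fixed step size 1/‖A‖² is an assumption (the paper characterizes rates for general penalties and measurement ensembles):

```python
import numpy as np

def project_sparse(x, s):
    # Euclidean projection onto the (nonconvex) set of s-sparse vectors:
    # keep the s largest-magnitude entries, zero out the rest.
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-s:]
    out[idx] = x[idx]
    return out

def projected_gradient(A, y, s, step=None, iters=500):
    """Gradient projection for min ||Ax - y||^2 subject to a constraint set;
    here the set is the s-sparse vectors, one nonconvex example from the
    paper's setting. The conservative step 1/||A||^2 is an assumption."""
    m, n = A.shape
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(n)
    for _ in range(iters):
        x = project_sparse(x - step * A.T @ (A @ x - y), s)
    return x
```

With Gaussian measurements and enough rows, the iterates contract linearly toward the true sparse signal even though the objective is not strongly convex, which is the phenomenon the abstract quantifies.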
Forward-backward truncated Newton methods for convex composite optimization
2014
An Adaptive Accelerated Proximal Gradient Method and its Homotopy Continuation for Sparse Optimization
Abstract

Cited by 2 (1 self)
We first propose an adaptive accelerated proximal gradient (APG) method for minimizing strongly convex composite functions with unknown convexity parameters. This method incorporates a restarting scheme to automatically estimate the strong convexity parameter and achieves a nearly optimal iteration complexity. Then we consider the ℓ1-regularized least-squares (ℓ1-LS) problem in the high-dimensional setting. Although such an objective function is not strongly convex, it has restricted strong convexity over sparse vectors. We exploit this property by combining the adaptive APG method with a homotopy continuation scheme, which generates a sparse solution path towards optimality. This method obtains a global linear rate of convergence and its overall iteration complexity has a weaker dependency on the restricted condition number than previous work.
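A generic sketch of the restart ingredient: FISTA on the ℓ1-LS problem with the O'Donoghue–Candès gradient restart test, which resets the momentum whenever it points against the descent direction. This illustrates the adaptive-restart idea only; the paper's method additionally estimates the strong convexity parameter and couples the solver with homotopy continuation:

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def apg_restart(X, y, lam, iters=500):
    """Accelerated proximal gradient (FISTA) for l1-regularized least squares
    with a gradient-based restart: when the momentum direction disagrees with
    the last proximal step, the momentum weight is reset to 1."""
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n       # Lipschitz constant of the gradient
    x = x_prev = np.zeros(d)
    theta = 1.0
    for _ in range(iters):
        theta_next = (1 + np.sqrt(1 + 4 * theta ** 2)) / 2
        z = x + (theta - 1) / theta_next * (x - x_prev)   # momentum step
        grad = X.T @ (X @ z - y) / n
        x_new = soft_threshold(z - grad / L, lam / L)
        if np.dot(z - x_new, x_new - x) > 0:              # restart test
            theta_next = 1.0
        x_prev, x, theta = x, x_new, theta_next
    return x
```

The restart test costs one inner product per iteration and recovers near-linear convergence on restricted-strongly-convex instances without knowing the convexity parameter.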
ℓ1,p-Norm Regularization: Error Bounds and Convergence Rate Analysis of First-Order Methods
In Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), 2015
Abstract

Cited by 1 (0 self)
In recent years, the ℓ1,p regularizer has been widely used to induce structured sparsity in the solutions to various optimization problems. Currently, such ℓ1,p-regularized problems are typically solved by first-order methods. Motivated by the desire to analyze the convergence rates of these methods, we show that for a large class of ℓ1,p-regularized problems, an error bound condition is satisfied when p ∈ [1, 2] or p = ∞ but fails to hold for any p ∈ (2, ∞). Based on this result, we show that many first-order methods enjoy an asymptotic linear rate of convergence when applied to ℓ1,p-regularized linear or logistic regression with p ∈ [1, 2] or p = ∞. By contrast, numerical experiments suggest that for the same class of problems with p ∈ (2, ∞), the aforementioned methods may not converge linearly.
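First-order methods for ℓ1,p-regularized problems rely on the proximal operator of the regularizer. For p = 2 (one of the regimes where the error bound holds) this is block soft-thresholding, sketched below; the group partition used in the example is hypothetical:

```python
import numpy as np

def prox_l1_2(x, groups, t):
    """Proximal operator of the l_{1,2} (group) norm t * sum_g ||x_g||_2:
    each block is shrunk toward 0 by t in Euclidean norm, and zeroed if its
    norm is at most t. This is the p = 2 case of the l_{1,p} regularizers.
    `groups` is a list of index arrays partitioning the coordinates."""
    out = x.copy()
    for g in groups:
        nrm = np.linalg.norm(x[g])
        out[g] = 0.0 if nrm <= t else (1 - t / nrm) * x[g]
    return out
```

Because each block is handled independently in closed form, one proximal-gradient iteration costs no more than a gradient evaluation plus a pass over the groups.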
Theory of Dual-sparse Regularized Randomized Reduction
Abstract
In this paper, we study randomized reduction methods, which reduce high-dimensional features into a low-dimensional space by randomized methods (e.g., random projection, random hashing), for large-scale high-dimensional classification. Previous theoretical results on randomized reduction methods hinge on strong assumptions about the data, e.g., low rank of the data matrix or a large separable margin of classification, which hinder their applications in broad domains. To address these limitations, we propose dual-sparse regularized randomized reduction methods that introduce a sparse regularizer into the reduced dual problem. Under a mild condition that the original dual solution is a (nearly) sparse vector, we show that the resulting dual solution is close to the original dual solution and concentrates on its support set. In numerical experiments, we present an empirical study to support the analysis and we also present a novel application of the dual-sparse regularized randomized reduction methods to reducing the communication cost of distributed learning from large-scale high-dimensional data.
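The simplest instance of the randomized reduction studied here is a Gaussian random projection of the features; random hashing and other sketches fit the same template. A minimal sketch, where the reduced dimension m and the 1/√m scaling are the standard generic choices rather than the paper's specific construction:

```python
import numpy as np

def random_project(X, m, seed=0):
    """Gaussian random projection from R^d to R^m, applied row-wise to a
    data matrix X of shape (n, d). With the 1/sqrt(m) scaling, inner
    products between projected rows are unbiased estimates of the
    original inner products, with variance shrinking as 1/m."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, X.shape[1])) / np.sqrt(m)
    return X @ A.T
```

Preserving inner products approximately is what lets the reduced dual problem stay close to the original one; the paper's contribution is showing that a sparse regularizer on the reduced dual tightens this under (near-)sparsity of the original dual solution.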
Generalized Conjugate Gradient Methods for ℓ1-Regularized Convex Quadratic Programming with Finite Convergence
2015
Abstract
The conjugate gradient (CG) method is an efficient iterative method for solving large-scale strongly convex quadratic programming (QP) problems. In this paper we propose generalized CG (GCG) methods for solving ℓ1-regularized (possibly not strongly) convex QP that terminate at an optimal solution in a finite number of iterations. At each iteration, our methods first identify a face of an orthant and then either perform an exact line search along the direction of the negative projected minimum-norm subgradient of the objective function, or execute a CG subroutine that conducts a sequence of CG iterations until a CG iterate crosses the boundary of this face or an approximate minimizer of the objective over this face or a subface is found. We determine which type of step to take by comparing the magnitude of some components of the minimum-norm subgradient of the objective function to that of the remaining components. Our analysis of the finite convergence of these methods makes use of an error bound result and some key properties of the aforementioned exact line search and CG subroutine. We also show that the proposed methods are capable of finding an approximate solution of the problem by allowing some inexactness in the execution of the CG subroutine. The overall arithmetic operation cost of our GCG methods for finding an ε-optimal solution depends on ε as O(log(1/ε)), which is superior to the accelerated proximal gradient method [2, 23], whose dependence is O(1/√ε). In addition, our GCG methods extend straightforwardly to solving box-constrained convex QP with finite convergence. Numerical results demonstrate that our methods are very favorable for solving ill-conditioned problems.
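One concrete ingredient of the method above is the minimum-norm subgradient of F(x) = ½xᵀQx + bᵀx + λ‖x‖₁, whose componentwise magnitudes drive the choice between the line-search step and the CG subroutine. A sketch of that computation alone, not of the full GCG method:

```python
import numpy as np

def min_norm_subgradient(Q, b, lam, x):
    """Minimum-norm element of the subdifferential of
    F(x) = 0.5 x'Qx + b'x + lam*||x||_1 at x.
    Where x_i != 0, the subdifferential is a single point; where x_i == 0,
    it is the interval g_i + lam*[-1, 1], whose closest point to 0 is
    g_i shrunk toward 0 by lam."""
    g = Q @ x + b                     # gradient of the smooth part
    sub = np.empty_like(g)
    for i in range(len(g)):
        if x[i] != 0:
            sub[i] = g[i] + lam * np.sign(x[i])
        else:
            sub[i] = np.sign(g[i]) * max(abs(g[i]) - lam, 0.0)
    return sub
```

The vector is zero exactly at an optimal solution, so its components also serve as a natural optimality measure and face-selection signal.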
Subset Selection by Pareto Optimization
Abstract
Selecting the optimal subset from a large set of variables is a fundamental problem in various learning tasks such as feature selection, sparse regression, dictionary learning, etc. In this paper, we propose the POSS approach, which employs evolutionary Pareto optimization to find a small-sized subset with good performance. We prove that for sparse regression, POSS is able to achieve the best-so-far theoretically guaranteed approximation performance efficiently. Particularly, for the Exponential Decay subclass, POSS is proven to achieve an optimal solution. Empirical study verifies the theoretical results, and exhibits the superior performance of POSS to greedy and convex relaxation methods.
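The POSS loop can be sketched compactly for sparse regression: maintain the set of non-dominated subsets under the bi-objective (squared error, subset size), mutate a random member each round, and finally return the best subset within the size budget. The sketch below simplifies the paper's algorithm (plain bit-wise mutation, a 2k size cap, no further refinements):

```python
import numpy as np

def poss(X, y, k, iters=2000, seed=0):
    """Minimal sketch of Pareto Optimization Subset Selection for sparse
    regression: subsets are bit masks; fitness is (least-squares error,
    subset size); the population keeps only non-dominated subsets."""
    rng = np.random.default_rng(seed)
    n, d = X.shape

    def mse(mask):
        if not mask.any():
            return np.mean(y ** 2)
        coef, *_ = np.linalg.lstsq(X[:, mask], y, rcond=None)
        return np.mean((X[:, mask] @ coef - y) ** 2)

    empty = np.zeros(d, dtype=bool)
    pop = [(empty, mse(empty))]
    for _ in range(iters):
        mask, _ = pop[rng.integers(len(pop))]
        child = mask ^ (rng.random(d) < 1.0 / d)   # flip each bit w.p. 1/d
        if child.sum() > 2 * k:                    # discard oversized subsets
            continue
        f = mse(child)
        # discard the child if some population member weakly dominates it
        if any(fm <= f and m.sum() <= child.sum() for m, fm in pop):
            continue
        # otherwise remove members the child weakly dominates, then add it
        pop = [(m, fm) for m, fm in pop
               if not (f <= fm and child.sum() <= m.sum())] + [(child, f)]
    best = min((p for p in pop if p[0].sum() <= k), key=lambda p: p[1])
    return best[0]
```

Keeping intermediate subset sizes alive in the population is what lets POSS escape the greedy trap of committing to early variable choices.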