Results 1 - 10 of 12
Optimal computational and statistical rates of convergence for sparse nonconvex learning problems. arXiv preprint, 2013
"... We provide theoretical analysis of the statistical and computational properties of penalized M-estimators that can be formulated as the solution to a possibly nonconvex optimization prob-lem. Many important estimators fall in this category, including least squares regression with nonconvex regulariz ..."
Abstract - Cited by 11 (5 self)
We provide theoretical analysis of the statistical and computational properties of penalized M-estimators that can be formulated as the solution to a possibly nonconvex optimization problem. Many important estimators fall in this category, including least squares regression with nonconvex regularization, generalized linear models with nonconvex regularization, and sparse elliptical random design regression. For these problems, it is intractable to calculate the global solution due to the nonconvex formulation. In this paper, we propose an approximate regularization path following method for solving a variety of learning problems with nonconvex objective functions. Under a unified analytic framework, we simultaneously provide explicit statistical and computational rates of convergence of any local solution obtained by the algorithm. Computationally, our algorithm attains a global geometric rate of convergence for calculating the full regularization path, which is optimal among all first-order algorithms. Unlike most existing methods that only attain geometric rates of convergence for one single regularization parameter, our algorithm calculates the full regularization path with the same iteration complexity. In particular, we provide a refined iteration complexity bound to sharply characterize the performance of each stage along the regularization path. Statistically, we provide sharp sample complexity analysis for all the approximate local solutions along the regularization path. In particular, our analysis improves upon existing results by providing a more refined sample complexity bound as well as an exact support recovery result for the final estimator. These results show that the final estimator attains an oracle statistical property due to the usage of the nonconvex penalty.
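The staged path-following idea described above can be sketched in a few lines. The following is an illustrative pure-Python toy, not the paper's algorithm: it uses the MCP (minimax concave penalty) firm-thresholding operator as the nonconvex prox, a crude Frobenius-norm step size, and a geometrically decreasing sequence of regularization parameters with warm starts; all names are our own.

```python
def firm_threshold(z, lam, gamma):
    # Proximal operator of the MCP penalty (unit step, gamma > 1):
    # rescaled soft-thresholding inside [0, gamma*lam], identity outside.
    if abs(z) > gamma * lam:
        return z
    mag = max(abs(z) - lam, 0.0) * gamma / (gamma - 1.0)
    return mag if z >= 0 else -mag

def prox_grad_path(A, b, lam_max, lam_min, n_stages=5, gamma=3.0, iters=200):
    """Approximate path following: solve a sequence of MCP-penalized
    least-squares problems with geometrically decreasing lambda,
    warm-starting each stage at the previous stage's solution."""
    n, p = len(A), len(A[0])
    # Frobenius norm squared upper-bounds the gradient Lipschitz constant.
    step = 1.0 / sum(sum(a * a for a in row) for row in A)
    x = [0.0] * p
    path = []
    for s in range(n_stages):
        lam = lam_max * (lam_min / lam_max) ** (s / (n_stages - 1))
        for _ in range(iters):
            r = [sum(A[i][j] * x[j] for j in range(p)) - b[i] for i in range(n)]
            grad = [sum(A[i][j] * r[i] for i in range(n)) for j in range(p)]
            x = [firm_threshold(x[j] - step * grad[j], step * lam, gamma)
                 for j in range(p)]
        path.append((lam, list(x)))
    return path
```

On a trivial identity-design problem the final stage recovers the sparse signal with the thresholding bias removed, which is the "oracle property" the abstract refers to.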
Sharp time-data tradeoffs for linear inverse problems. In preparation, 2015
"... In this paper we characterize sharp time-data tradeoffs for optimization problems used for solving linear inverse problems. We focus on the minimization of a least-squares objective subject to a constraint defined as the sub-level set of a penalty function. We present a unified convergence analysis ..."
Abstract - Cited by 3 (2 self)
In this paper we characterize sharp time-data tradeoffs for optimization problems used for solving linear inverse problems. We focus on the minimization of a least-squares objective subject to a constraint defined as the sub-level set of a penalty function. We present a unified convergence analysis of the gradient projection algorithm applied to such problems. We sharply characterize the convergence rate associated with a wide variety of random measurement ensembles in terms of the number of measurements and structural complexity of the signal with respect to the chosen penalty function. The results apply to both convex and nonconvex constraints, demonstrating that a linear convergence rate is attainable even though the least squares objective is not strongly convex in these settings. When specialized to Gaussian measurements our results show that such linear convergence occurs when the number of measurements is merely 4 times the minimal number required to recover the desired signal at all (a.k.a. the phase transition). We also achieve a slower but geometric rate of convergence precisely above the phase transition point. Extensive numerical results suggest that the derived rates exactly match the empirical performance.
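As a toy illustration of gradient projection with a nonconvex constraint set, the sketch below runs projected gradient descent on a least-squares objective under a sparsity (ℓ0) constraint, where the projection keeps the k largest-magnitude entries. This is the classical iterative hard thresholding scheme, shown under our own naming; it is not the paper's code.

```python
def project_sparse(x, k):
    # Euclidean projection onto {x : ||x||_0 <= k}:
    # keep the k entries of largest magnitude, zero the rest.
    idx = sorted(range(len(x)), key=lambda j: abs(x[j]), reverse=True)[:k]
    keep = set(idx)
    return [x[j] if j in keep else 0.0 for j in range(len(x))]

def projected_gradient(A, b, k, step, iters=300):
    """Gradient projection for min 0.5*||Ax - b||^2 s.t. ||x||_0 <= k,
    a nonconvex constraint set as in the paper's setting."""
    n, p = len(A), len(A[0])
    x = [0.0] * p
    for _ in range(iters):
        r = [sum(A[i][j] * x[j] for j in range(p)) - b[i] for i in range(n)]
        grad = [sum(A[i][j] * r[i] for i in range(n)) for j in range(p)]
        x = project_sparse([x[j] - step * grad[j] for j in range(p)], k)
    return x
```

Even though the objective is not strongly convex over the whole space, on a well-conditioned instance the iterates contract geometrically toward the k-sparse solution, which is the flavor of linear convergence the abstract describes.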
Forward-backward truncated Newton methods for convex composite optimization, 2014
"... ..."
(Show Context)
An Adaptive Accelerated Proximal Gradient Method and its Homotopy Continuation for Sparse Optimization
"... We first propose an adaptive accelerated prox-imal gradient (APG) method for minimizing strongly convex composite functions with un-known convexity parameters. This method in-corporates a restarting scheme to automatical-ly estimate the strong convexity parameter and achieves a nearly optimal iterat ..."
Abstract - Cited by 2 (1 self)
We first propose an adaptive accelerated proximal gradient (APG) method for minimizing strongly convex composite functions with unknown convexity parameters. This method incorporates a restarting scheme to automatically estimate the strong convexity parameter and achieves a nearly optimal iteration complexity. Then we consider the ℓ1-regularized least-squares (ℓ1-LS) problem in the high-dimensional setting. Although such an objective function is not strongly convex, it has restricted strong convexity over sparse vectors. We exploit this property by combining the adaptive APG method with a homotopy continuation scheme, which generates a sparse solution path towards optimality. This method obtains a global linear rate of convergence and its overall iteration complexity has a weaker dependency on the restricted condition number than previous work.
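A minimal sketch of the first ingredient, an accelerated proximal gradient method with restarting, applied to a tiny ℓ1-LS instance. This is illustrative only: it uses a fixed step size and a standard gradient-based restart test rather than the paper's adaptive estimation of the convexity parameter, and the names are ours.

```python
def soft_threshold(z, t):
    # Prox of t*|.|: shrink toward zero by t.
    return max(abs(z) - t, 0.0) * (1 if z > 0 else -1)

def apg_restart(A, b, lam, step, iters=500):
    """FISTA-style accelerated proximal gradient for
    min 0.5*||Ax - b||^2 + lam*||x||_1, with an adaptive restart:
    reset the momentum whenever it points against the last step."""
    n, p = len(A), len(A[0])
    x = [0.0] * p
    y = list(x)
    t = 1.0
    for _ in range(iters):
        r = [sum(A[i][j] * y[j] for j in range(p)) - b[i] for i in range(n)]
        grad = [sum(A[i][j] * r[i] for i in range(n)) for j in range(p)]
        x_new = [soft_threshold(y[j] - step * grad[j], step * lam)
                 for j in range(p)]
        # Restart test: (y - x_new) . (x_new - x) > 0 means momentum hurts.
        if sum((y[j] - x_new[j]) * (x_new[j] - x[j]) for j in range(p)) > 0:
            t = 1.0
            y = list(x_new)
        else:
            t_new = (1 + (1 + 4 * t * t) ** 0.5) / 2
            y = [x_new[j] + (t - 1) / t_new * (x_new[j] - x[j])
                 for j in range(p)]
            t = t_new
        x = x_new
    return x
```

The homotopy continuation of the paper would wrap this solver in an outer loop that decreases lam geometrically with warm starts, as in the path-following sketch for the first entry above.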
ℓ1,p-Norm Regularization: Error Bounds and Convergence Rate Analysis of First-Order Methods. In Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), 2015
"... Abstract In recent years, the 1,p -regularizer has been widely used to induce structured sparsity in the solutions to various optimization problems. Currently, such 1,p -regularized problems are typically solved by first-order methods. Motivated by the desire to analyze the convergence rates of the ..."
Abstract - Cited by 1 (0 self)
In recent years, the ℓ1,p-regularizer has been widely used to induce structured sparsity in the solutions to various optimization problems. Currently, such ℓ1,p-regularized problems are typically solved by first-order methods. Motivated by the desire to analyze the convergence rates of these methods, we show that for a large class of ℓ1,p-regularized problems, an error bound condition is satisfied when p ∈ [1, 2] or p = ∞ but fails to hold for any p ∈ (2, ∞). Based on this result, we show that many first-order methods enjoy an asymptotic linear rate of convergence when applied to ℓ1,p-regularized linear or logistic regression with p ∈ [1, 2] or p = ∞. By contrast, numerical experiments suggest that for the same class of problems with p ∈ (2, ∞), the aforementioned methods may not converge linearly.
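For p = 2, the prox of the ℓ1,2 regularizer has a closed form, block soft-thresholding, so a first-order method for this class is easy to sketch. The toy below (our own naming, not the paper's code) runs proximal gradient on a group-regularized least-squares problem.

```python
def block_soft_threshold(xg, t):
    # Prox of t*||.||_2 on one group: shrink the group norm by t, or zero it.
    norm = sum(v * v for v in xg) ** 0.5
    if norm <= t:
        return [0.0] * len(xg)
    scale = 1.0 - t / norm
    return [scale * v for v in xg]

def prox_grad_group(A, b, groups, lam, step, iters=400):
    """Proximal gradient for min 0.5*||Ax - b||^2 + lam * sum_g ||x_g||_2,
    i.e. the l1,p regularizer with p = 2."""
    n, p = len(A), len(A[0])
    x = [0.0] * p
    for _ in range(iters):
        r = [sum(A[i][j] * x[j] for j in range(p)) - b[i] for i in range(n)]
        grad = [sum(A[i][j] * r[i] for i in range(n)) for j in range(p)]
        z = [x[j] - step * grad[j] for j in range(p)]
        for g in groups:
            shrunk = block_soft_threshold([z[j] for j in g], step * lam)
            for v, j in zip(shrunk, g):
                z[j] = v
        x = z
    return x
```

On a separable instance the weak group is zeroed out entirely while the strong group is shrunk, which is exactly the structured sparsity the regularizer is designed to induce; the abstract's error bound result is what upgrades this method's sublinear rate to an asymptotic linear one when p ∈ [1, 2].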
Systèmes, département SSE), and the Center for Learning and Visual Computing, 2013
"... He soon proves that he is self-sufficient. ..."
Theory of Dual-sparse Regularized Randomized Reduction
"... In this paper, we study randomized reduction methods, which reduce high-dimensional fea-tures into low-dimensional space by randomized methods (e.g., random projection, random hash-ing), for large-scale high-dimensional classifica-tion. Previous theoretical results on randomized reduction methods hi ..."
Abstract
In this paper, we study randomized reduction methods, which reduce high-dimensional features into a low-dimensional space by randomized methods (e.g., random projection, random hashing), for large-scale high-dimensional classification. Previous theoretical results on randomized reduction methods hinge on strong assumptions about the data, e.g., low rank of the data matrix or a large separable margin of classification, which hinder their applications in broad domains. To address these limitations, we propose dual-sparse regularized randomized reduction methods that introduce a sparse regularizer into the reduced dual problem. Under a mild condition that the original dual solution is a (nearly) sparse vector, we show that the resulting dual solution is close to the original dual solution and concentrates on its support set. In numerical experiments, we present an empirical study to support the analysis, and we also present a novel application of the dual-sparse regularized randomized reduction methods to reducing the communication cost of distributed learning from large-scale high-dimensional data.
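The reduction step itself can be illustrated with a plain Gaussian random projection, which approximately preserves pairwise distances (the Johnson-Lindenstrauss phenomenon). This sketch shows only the reduction, not the paper's dual-sparse regularization, and the naming is ours.

```python
import random
import math

def random_projection(X, m, seed=0):
    """Reduce d-dimensional rows of X to m dimensions with a Gaussian
    random matrix scaled by 1/sqrt(m), so that pairwise distances are
    approximately preserved."""
    rng = random.Random(seed)
    d = len(X[0])
    # d x m matrix of N(0, 1/m) entries.
    R = [[rng.gauss(0, 1) / math.sqrt(m) for _ in range(m)] for _ in range(d)]
    return [[sum(x[i] * R[i][j] for i in range(d)) for j in range(m)]
            for x in X]
```

A classifier would then be trained on the m-dimensional rows; the paper's contribution is showing that adding a sparse regularizer to the reduced dual recovers the original dual solution under much weaker assumptions on the data.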
Generalized Conjugate Gradient Methods for ℓ1-Regularized Convex Quadratic Programming with Finite Convergence, 2015
"... The conjugate gradient (CG) method is an efficient iterative method for solving large-scale strongly convex quadratic programming (QP). In this paper we propose some generalized CG (GCG) methods for solving the `1-regularized (possibly not strongly) convex QP that terminate at an optimal solution in ..."
Abstract
The conjugate gradient (CG) method is an efficient iterative method for solving large-scale strongly convex quadratic programming (QP). In this paper we propose some generalized CG (GCG) methods for solving the ℓ1-regularized (possibly not strongly) convex QP that terminate at an optimal solution in a finite number of iterations. At each iteration, our methods first identify a face of an orthant and then either perform an exact line search along the direction of the negative projected minimum-norm subgradient of the objective function or execute a CG subroutine that conducts a sequence of CG iterations until a CG iterate crosses the boundary of this face or an approximate minimizer of the objective over this face or a subface is found. We determine which type of step should be taken by comparing the magnitude of some components of the minimum-norm subgradient of the objective function to that of the rest of its components. Our analysis of the finite convergence of these methods makes use of an error bound result and some key properties of the aforementioned exact line search and the CG subroutine. We also show that the proposed methods are capable of finding an approximate solution of the problem by allowing some inexactness in the execution of the CG subroutine. The overall arithmetic operation cost of our GCG methods for finding an ε-optimal solution depends on ε as O(log(1/ε)), which is superior to the accelerated proximal gradient method [2, 23], whose cost depends on ε as O(1/√ε). In addition, our GCG methods can be extended straightforwardly to solve box-constrained convex QP with finite convergence. Numerical results demonstrate that our methods are very favorable for solving ill-conditioned problems.
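The minimum-norm subgradient that drives the step selection has a simple closed form for an ℓ1-regularized quadratic: on nonzero coordinates the penalty contributes lam*sign(x_i), while on zero coordinates the subdifferential is an interval and the minimum-norm choice soft-thresholds the smooth gradient. An illustrative sketch (our naming, not the paper's code):

```python
def min_norm_subgradient(Q, c, lam, x):
    """Minimum-norm element of the subdifferential of
    f(x) = 0.5*x'Qx + c'x + lam*||x||_1 at the point x."""
    n = len(x)
    # Gradient of the smooth part, Qx + c.
    g = [sum(Q[i][j] * x[j] for j in range(n)) + c[i] for i in range(n)]
    out = []
    for i in range(n):
        if x[i] > 0:
            out.append(g[i] + lam)
        elif x[i] < 0:
            out.append(g[i] - lam)
        else:
            # Subdifferential is [g_i - lam, g_i + lam]; pick the element
            # closest to zero (soft-thresholding of g_i).
            out.append(max(abs(g[i]) - lam, 0.0) * (1 if g[i] > 0 else -1))
    return out
```

A zero minimum-norm subgradient certifies optimality, and comparing the magnitudes of its components on and off the current face is exactly the test the abstract describes for choosing between a line-search step and a CG subroutine.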
Subset Selection by Pareto Optimization
"... Selecting the optimal subset from a large set of variables is a fundamental problem in various learning tasks such as feature selection, sparse regression, dictionary learning, etc. In this paper, we propose the POSS approach which employs evo-lutionary Pareto optimization to find a small-sized subs ..."
Abstract
Selecting the optimal subset from a large set of variables is a fundamental problem in various learning tasks such as feature selection, sparse regression, dictionary learning, etc. In this paper, we propose the POSS approach, which employs evolutionary Pareto optimization to find a small-sized subset with good performance. We prove that for sparse regression, POSS is able to achieve the best-so-far theoretically guaranteed approximation performance efficiently. Particularly, for the Exponential Decay subclass, POSS is proven to achieve an optimal solution. Empirical study verifies the theoretical results, and exhibits the superior performance of POSS over greedy and convex relaxation methods.
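The POSS loop itself is short: keep an archive of solutions that are Pareto-nondominated with respect to (objective value, subset size), and repeatedly mutate a random archive member by flipping each bit with probability 1/n. The sketch below uses a toy set-cover objective in place of the paper's sparse-regression criterion, with names of our choosing.

```python
import random

def poss(items, universe, budget, iters=3000, seed=1):
    """Pareto Optimization for Subset Selection, sketched on a toy
    coverage objective: maximize |covered elements| s.t. |S| <= budget."""
    rng = random.Random(seed)
    n = len(items)

    def coverage(mask):
        cov = set()
        for i in range(n):
            if mask[i]:
                cov |= items[i]
        return len(cov & universe)

    def size(mask):
        return sum(mask)

    archive = [tuple([0] * n)]  # start from the empty subset
    for _ in range(iters):
        parent = rng.choice(archive)
        child = tuple(b ^ 1 if rng.random() < 1.0 / n else b for b in parent)
        fc, sc = coverage(child), size(child)
        if sc >= 2 * budget:  # POSS discards overly large solutions
            continue
        # Keep the child unless some archived solution weakly dominates it,
        # then evict archived solutions the child weakly dominates.
        if any(coverage(m) >= fc and size(m) <= sc for m in archive):
            continue
        archive = [m for m in archive
                   if not (fc >= coverage(m) and sc <= size(m))] + [child]
    feasible = [m for m in archive if size(m) <= budget]
    best = max(feasible, key=coverage)
    return best, coverage(best)
```

Because the archive retains small intermediate subsets, POSS can reach good solutions along a path of single-bit mutations, which is the mechanism behind its approximation guarantee for sparse regression.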