Fast Stochastic Alternating Direction Method of Multipliers

by Leon Wenliang Zhong, James T. Kwok

Results 1 - 4 of 4

Incremental majorization-minimization optimization with application to large-scale machine learning

by Julien Mairal , 2015
"... Majorization-minimization algorithms consist of successively minimizing a sequence of upper bounds of the objective function. These upper bounds are tight at the current estimate, and each iteration monotonically drives the objective function downhill. Such a simple principle is widely applicable ..."
Abstract - Cited by 23 (1 self) - Add to MetaCart
Majorization-minimization algorithms consist of successively minimizing a sequence of upper bounds of the objective function. These upper bounds are tight at the current estimate, and each iteration monotonically drives the objective function downhill. Such a simple principle is widely applicable and has been very popular in various scientific fields, especially in signal processing and statistics. We propose an incremental majorization-minimization scheme for minimizing a large sum of continuous functions, a problem of utmost importance in machine learning. We present convergence guarantees for nonconvex and convex optimization when the upper bounds approximate the objective up to a smooth error; we call such upper bounds “first-order surrogate functions.” More precisely, we study asymptotic stationary point guarantees for nonconvex problems, and for convex ones, we provide convergence rates for the expected objective function value. We apply our scheme to composite optimization and obtain a new incremental proximal gradient algorithm with linear convergence rate for strongly convex functions. Our experiments show that our method is competitive with the state of the art for solving machine learning problems such as logistic regression when the number of training samples is large enough, and we demonstrate its usefulness for sparse estimation with nonconvex penalties.
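To make the incremental idea concrete, here is a minimal sketch of an incremental majorization-minimization loop in Python, assuming each f_i is smooth with an L-Lipschitz gradient and using quadratic first-order surrogates; the function names, surrogate choice, and sampling scheme are illustrative assumptions, not the exact algorithm of the cited paper.

```python
import numpy as np

def incremental_mm(grad_fns, x0, L, n_epochs=20, seed=0):
    """Sketch of an incremental MM (MISO-style) scheme for (1/n) * sum_i f_i(x),
    assuming each f_i is smooth with an L-Lipschitz gradient.  Each f_i gets a
    quadratic first-order surrogate
        g_i(x) = f_i(z_i) + <grad f_i(z_i), x - z_i> + (L/2) * ||x - z_i||^2,
    tight at its anchor z_i; one surrogate is refreshed per iteration and x is
    set to the minimizer of the average surrogate."""
    rng = np.random.default_rng(seed)
    n = len(grad_fns)
    anchors = np.tile(x0, (n, 1))                  # z_i: point where surrogate i is tight
    grads = np.array([g(x0) for g in grad_fns])    # grad f_i(z_i)
    x = x0.copy()
    for _ in range(n_epochs * n):
        i = rng.integers(n)
        anchors[i], grads[i] = x, grad_fns[i](x)   # refresh surrogate i at the current x
        # closed-form minimizer of the average quadratic surrogate
        x = anchors.mean(axis=0) - grads.mean(axis=0) / L
    return x
```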

Stochastic primal-dual coordinate method for regularized empirical risk minimization.

by Yuchen Zhang , Lin Xiao , 2014
"... Abstract We consider a generic convex optimization problem associated with regularized empirical risk minimization of linear predictors. The problem structure allows us to reformulate it as a convexconcave saddle point problem. We propose a stochastic primal-dual coordinate (SPDC) method, which alt ..."
Abstract - Cited by 12 (2 self) - Add to MetaCart
Abstract We consider a generic convex optimization problem associated with regularized empirical risk minimization of linear predictors. The problem structure allows us to reformulate it as a convex-concave saddle point problem. We propose a stochastic primal-dual coordinate (SPDC) method, which alternates between maximizing over a randomly chosen dual variable and minimizing over the primal variable. An extrapolation step on the primal variable is performed to obtain an accelerated convergence rate. We also develop a mini-batch version of the SPDC method which facilitates parallel computing, and an extension with weighted sampling probabilities on the dual variables, which has a better complexity than uniform sampling on unnormalized data. Both theoretically and empirically, we show that the SPDC method has comparable or better performance than several state-of-the-art optimization methods.
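A minimal sketch of an SPDC-style iteration, written for ridge regression so that both the dual coordinate step and the primal step have closed forms; the step sizes sigma and tau and the extrapolation weight theta are left as user inputs here rather than the specific values derived in the paper, and all names are illustrative assumptions.

```python
import numpy as np

def spdc_ridge(A, b, lam, sigma, tau, theta, n_iters=5000, seed=0):
    """Sketch of a stochastic primal-dual coordinate (SPDC) style method for
        min_x (1/n) * sum_i 0.5*(a_i^T x - b_i)^2 + (lam/2)*||x||^2,
    viewed as a convex-concave saddle point problem over (x, y).  Each step
    updates one randomly chosen dual coordinate y_i, then the primal x, then
    extrapolates the primal variable."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    x_bar = x.copy()
    y = np.zeros(n)
    u = A.T @ y / n                       # running average (1/n) * sum_j y_j * a_j
    for _ in range(n_iters):
        i = rng.integers(n)
        # dual ascent on coordinate i (closed form for the squared loss,
        # whose conjugate is phi_i*(y) = y^2/2 + b_i*y)
        y_new = (sigma * (A[i] @ x_bar - b[i]) + y[i]) / (sigma + 1.0)
        # primal descent step with the l2 regularizer (closed-form prox)
        w = u + (y_new - y[i]) * A[i]
        x_new = (x - tau * w) / (1.0 + tau * lam)
        # bookkeeping: running average, dual coordinate, primal extrapolation
        u += (y_new - y[i]) * A[i] / n
        y[i] = y_new
        x_bar = x_new + theta * (x_new - x)
        x = x_new
    return x
```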

Citation Context

...ap [18, Section 5.1]. The computational cost of this additional step is equivalent to one pass over the dataset, so it does not affect the overall complexity. 4.2 Other related work. Another way to approach problem (1) is to reformulate it as a constrained optimization problem

    minimize    (1/n) Σ_{i=1}^{n} φ_i(z_i) + g(x)        (26)
    subject to  a_i^T x = z_i,   i = 1, . . . , n,

and solve it by ADMM-type operator-splitting methods (e.g., [19]). In fact, as shown in [8], the batch primal-dual algorithm (21)-(23) is equivalent to a pre-conditioned ADMM (or inexact Uzawa method; see, e.g., [47]). Several authors [42, 28, 37, 49] have considered a more general formulation than (26), where each φ_i is a function of the whole vector z ∈ R^n. They proposed online or stochastic versions of ADMM which operate on only one φ_i in each iteration, and obtained sublinear convergence rates. However, their cost per iteration is O(nd) instead of O(d). Suzuki [38] considered a problem similar to (1), but with a more complex regularization function g, meaning that g does not have a simple proximal mapping. Thus primal updates such as step (5) or (9) in SPDC and similar steps in SDCA cannot be computed efficiently. He proposed an algorith...
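For readers who want the splitting spelled out, the following sketch instantiates the reformulation (26) with squared losses and an ℓ2 regularizer and runs plain batch ADMM on it; the specific closed-form subproblems are assumptions made here for concreteness, not part of the cited works, and the stochastic/online ADMM variants discussed above would touch only one φ_i per iteration instead.

```python
import numpy as np

def admm_reformulated(A, b, lam, rho=1.0, n_iters=200):
    """Sketch of the operator-splitting approach around Eq. (26):
    introduce z_i = a_i^T x and solve
        min_{x,z} (1/n) * sum_i phi_i(z_i) + g(x)   s.t.   A x = z,
    assuming phi_i(z) = 0.5*(z - b_i)^2 and g(x) = (lam/2)*||x||^2 so that
    every subproblem has a closed form."""
    n, d = A.shape
    x = np.zeros(d)
    z = np.zeros(n)
    u = np.zeros(n)                                      # scaled dual variable
    M = np.linalg.inv(lam * np.eye(d) + rho * A.T @ A)
    for _ in range(n_iters):
        x = rho * M @ (A.T @ (z - u))                    # x-update: ridge-type solve
        Ax = A @ x
        z = (b / n + rho * (Ax + u)) / (1.0 / n + rho)   # z-update: separable prox
        u = u + Ax - z                                   # dual ascent on the constraint
    return x
```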

Linearized Alternating Direction Method of Multipliers for Constrained Nonconvex Regularized Optimization

by Linbo Qiao , Bofeng Zhang , Jinshu Su , Xicheng Lu
"... Abstract In this paper, we consider a wide class of constrained nonconvex regularized minimization problems, where the constraints are linearly constraints. It was reported in the literature that nonconvex regularization usually yields a solution with more desirable sparse structural properties bey ..."
Abstract - Add to MetaCart
Abstract In this paper, we consider a wide class of constrained nonconvex regularized minimization problems, where the constraints are linear. It has been reported in the literature that nonconvex regularization usually yields a solution with more desirable sparse structural properties than convex regularization. However, it is not easy to obtain the proximal mapping associated with nonconvex regularization, due to the imposed linear constraints. In this paper, the optimization problem with linear constraints is solved by the Linearized Alternating Direction Method of Multipliers (LADMM). Moreover, we present a detailed convergence analysis of the LADMM algorithm for solving nonconvex compositely regularized optimization with a large class of nonconvex penalties. Experimental results on several real-world datasets validate the efficacy of the proposed algorithm.
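A minimal sketch of a linearized-ADMM-style loop, assuming a smooth loss supplied through grad_loss and a regularizer supplied through its proximal operator prox_r; these names, the gradient-step x-update, and the fixed step sizes are illustrative assumptions rather than the exact algorithm analyzed in the paper.

```python
import numpy as np

def ladmm(grad_loss, prox_r, A, x0, rho=1.0, eta=0.01, n_iters=500):
    """Sketch of a linearized ADMM (LADMM) iteration for
        min_x  loss(x) + r(A x),
    rewritten as  min_{x,z} loss(x) + r(z)  s.t.  A x = z.
    The x-update linearizes both the smooth loss and the quadratic penalty at
    the current iterate, so no matrix inversion or proximal mapping of r(A.)
    is needed; the z-update only requires the (possibly nonconvex) proximal
    operator of r itself, supplied as prox_r(v, t) for prox_{t*r}(v)."""
    x = x0.copy()
    z = A @ x
    u = np.zeros_like(z)                      # scaled dual variable
    for _ in range(n_iters):
        # x-update: one gradient step on the linearized augmented Lagrangian
        g = grad_loss(x) + rho * (A.T @ (A @ x - z + u))
        x = x - eta * g
        # z-update: proximal step on the regularizer (may be nonconvex)
        z = prox_r(A @ x + u, 1.0 / rho)
        # dual update
        u = u + A @ x - z
    return x
```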

Citation Context

... algorithm (HONOR) which combines the quasi-Newton method and the gradient descent method (Gong and Ye, 2015). However, the MS algorithm does not admit a closed-form solution for graph-guided regularized optimization problems and hence leads to an expensive per-iteration computational cost. When A or B is non-diagonal, neither the SCP algorithm nor the GIST algorithm is efficient for solving problem (1), since the proximal mapping of r(·) is typically not available. Another related stream of work is the ADMM-type algorithms, which are suitable for solving problem (1) when A or B is not diagonal (Zhong and Kwok, 2013; Zhang and Kwok, 2014; Wang et al., 2014; Zhao et al., 2015). Algorithms of this kind have recently been shown to be effective for some nonconvex optimization problems (Magnússon et al., 2014; Jiang et al., 2014; Hong et al., 2015; Yang et al., 2015; Wang et al., 2015a,b; Li and Pong, 2015). However, the results of (Magnússon et al., 2014; Jiang et al., 2014) require an assumption about the generated iterates that is not well justified, while some other works focus on certain specific problems such as the consensus and sharing problems (Hong et al., 2015) and the background/foreground extraction probl...

Convolutional Sparse Coding for Image Super-resolution

by Shuhang Gu, Wangmeng Zuo, Qi Xie, Deyu Meng, Xiangchu Feng, Lei Zhang
"... Most of the previous sparse coding (SC) based super res-olution (SR) methods partition the image into overlapped patches, and process each patch separately. These method-s, however, ignore the consistency of pixels in overlapped patches, which is a strong constraint for image reconstruc-tion. In thi ..."
Abstract - Add to MetaCart
Most of the previous sparse coding (SC) based super resolution (SR) methods partition the image into overlapped patches, and process each patch separately. These methods, however, ignore the consistency of pixels in overlapped patches, which is a strong constraint for image reconstruction. In this paper, we propose a convolutional sparse coding (CSC) based SR (CSC-SR) method to address the consistency issue. Our CSC-SR involves three groups of parameters to be learned: (i) a set of filters to decompose the low resolution (LR) image into LR sparse feature maps; (ii) a mapping function to predict the high resolution (HR) feature maps from the LR ones; and (iii) a set of filters to reconstruct the HR images from the predicted HR feature maps via simple convolution operations. By working directly on the whole image, the proposed CSC-SR algorithm does not need to divide the image into overlapped patches, and can exploit the image global correlation to produce more robust reconstruction of image local structures. Experimental results clearly validate the advantages of CSC over patch based SC in SR application. Compared with state-of-the-art SR methods, the proposed CSC-SR method achieves highly competitive PSNR results, while demonstrating better edge and texture preservation performance.
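The synthesis step (iii) can be illustrated with a short sketch: the HR image is approximated by a sum of whole-image convolutions between learned filters and the predicted feature maps, so no patch partitioning or overlap averaging is needed. Array shapes and the boundary handling below are assumptions made for illustration.

```python
import numpy as np
from scipy.signal import fftconvolve

def csc_reconstruct(feature_maps, filters):
    """Sketch of the convolutional sparse coding synthesis model:
        X  ≈  sum_k  d_k * z_k ,
    where * is 2-D convolution over the whole image.
    feature_maps: array of shape (K, H, W) with sparse maps z_k.
    filters:      array of shape (K, h, w) with small kernels d_k."""
    return sum(fftconvolve(z, d, mode="same") for z, d in zip(feature_maps, filters))
```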

Citation Context

...large, the ADMM algorithm suffers from the problem of high memory demand for solving (4). Fortunately, the marriage of the recently developed stochastic average (SA) algorithms and ADMM, i.e., SA-ADMM [34], can be utilized to optimize (4). Different from standard ADMM, SA-ADMM adopts a linearization technique that avoids the computation of a matrix inversion in our case, and utilizes...
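A rough sketch of the structure described here, combining a stored table of per-sample gradients (the stochastic-average part) with a linearized ADMM update so that no matrix inversion is needed; the step sizes, iterate averaging, and other details of SA-ADMM [34] are not reproduced, and all names are illustrative assumptions.

```python
import numpy as np

def sa_admm_sketch(grad_fns, prox_r, A, x0, rho=1.0, eta=0.01, n_epochs=10, seed=0):
    """Sketch of a stochastic-average, linearized ADMM update for
        min_x (1/n) * sum_i f_i(x) + r(A x).
    A table of the most recent gradient of each f_i is maintained; its average
    drives a linearized x-update, so only one f_i is re-evaluated per iteration
    and no matrix inversion is required."""
    rng = np.random.default_rng(seed)
    n = len(grad_fns)
    x = x0.copy()
    z = A @ x
    u = np.zeros_like(z)                               # scaled dual variable
    grad_table = np.array([g(x) for g in grad_fns])    # stored per-sample gradients
    for _ in range(n_epochs * n):
        i = rng.integers(n)
        grad_table[i] = grad_fns[i](x)                 # refresh one gradient
        g_avg = grad_table.mean(axis=0)
        # linearized x-update using the averaged gradient
        x = x - eta * (g_avg + rho * (A.T @ (A @ x - z + u)))
        z = prox_r(A @ x + u, 1.0 / rho)               # regularizer prox step
        u = u + A @ x - z                              # dual update
    return x
```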
