Results 1–10 of 32
A stochastic gradient method with an exponential convergence rate for finite training sets.
In NIPS, 2012
Abstract

Cited by 73 (10 self)
Abstract We propose a new stochastic gradient method for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex. While standard stochastic gradient methods converge at sublinear rates for this problem, the proposed method incorporates a memory of previous gradient values in order to achieve a linear convergence rate. Numerical experiments indicate that the new algorithm can dramatically outperform standard algorithms.
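The "memory of previous gradient values" can be sketched in a few lines. This is an illustrative toy (function names, step size, and the toy problem are my own choices, not the paper's), showing how keeping the last-seen gradient of each component and stepping along their average gives per-iteration cost independent of n:

```python
import numpy as np

def sag(grad_i, x0, n, steps, lr):
    """Keep a table of the most recent gradient seen for each component f_i
    and step along the average of the table (SAG-style memory)."""
    x = np.asarray(x0, float).copy()
    table = np.zeros((n,) + x.shape)   # memory of per-component gradients
    avg = np.zeros_like(x)             # running average of the table
    rng = np.random.default_rng(0)
    for _ in range(steps):
        i = rng.integers(n)
        g = grad_i(i, x)
        avg += (g - table[i]) / n      # refresh the average in O(dim) work
        table[i] = g
        x -= lr * avg
    return x

# toy problem: minimize the average of f_i(x) = (x - a_i)^2; minimizer is mean(a)
a = np.array([1.0, 2.0, 3.0, 4.0])
x = sag(lambda i, x: 2.0 * (x - a[i]), np.zeros(1), n=4, steps=3000, lr=0.05)
```

Unlike plain stochastic gradient, the averaged direction converges to the true gradient, so no decreasing step size is needed and the iterates converge linearly on this strongly convex toy.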
Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization
In NIPS'11, 25th Annual Conference on Neural Information Processing Systems, 2011
Abstract

Cited by 49 (6 self)
We consider the problem of optimizing the sum of a smooth convex function and a nonsmooth convex function using proximal-gradient methods, where an error is present in the calculation of the gradient of the smooth term or in the proximity operator with respect to the nonsmooth term. We show that both the basic proximal-gradient method and the accelerated proximal-gradient method achieve the same convergence rate as in the error-free case, provided that the errors decrease at appropriate rates. Using these rates, we perform as well as or better than a carefully chosen fixed error level on a set of structured sparsity problems.
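A minimal sketch of the setting, with assumptions of my own: a Gaussian gradient error whose magnitude follows a decreasing schedule, and a lasso-style toy whose prox is soft-thresholding. It illustrates the paper's message that suitably decreasing errors preserve convergence to the exact solution:

```python
import numpy as np

def inexact_prox_grad(grad, prox, x0, steps, lr, err_at):
    """Proximal-gradient iteration where the smooth gradient is computed with
    an error whose magnitude follows the schedule err_at(k) (an assumption
    made for illustration)."""
    x = np.asarray(x0, float).copy()
    rng = np.random.default_rng(1)
    for k in range(1, steps + 1):
        e = err_at(k) * rng.standard_normal(x.shape)   # gradient error
        x = prox(x - lr * (grad(x) + e), lr)
    return x

# toy problem: min 0.5*(x - 3)^2 + |x|, whose solution is x = 2
soft = lambda z, t: np.sign(z) * np.maximum(np.abs(z) - t, 0.0)  # prox of t*|.|
x = inexact_prox_grad(lambda x: x - 3.0, soft, np.zeros(1),
                      steps=500, lr=0.5, err_at=lambda k: 1.0 / k**2)
```

With the O(1/k²) error schedule the iterates reach the exact solution; a fixed error level would instead leave a residual error floor.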
Minimizing Finite Sums with the Stochastic Average Gradient
2013
Abstract

Cited by 42 (2 self)
We propose the stochastic average gradient (SAG) method for optimizing the sum of a finite number of smooth convex functions. Like stochastic gradient (SG) methods, the SAG method’s iteration cost is independent of the number of terms in the sum. However, by incorporating a memory of previous gradient values the SAG method achieves a faster convergence rate than black-box SG methods. The convergence rate is improved from O(1/√k) to O(1/k) in general, and when the sum is strongly convex the convergence rate is improved from the sublinear O(1/k) to a linear convergence rate of the form O(ρ^k) for ρ < 1. Further, in many cases the convergence rate of the new method is also faster than black-box deterministic gradient methods, in terms of the number of gradient evaluations. Numerical experiments indicate that the new algorithm often dramatically outperforms existing SG and deterministic gradient methods, and that the performance may be further improved through the use of non-uniform sampling strategies.
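The closing remark about non-uniform sampling can be illustrated generically. This is not the SAG variant itself but plain SGD with Lipschitz-weighted sampling and importance-weight correction (the sampling law, step size, and toy problem are my own assumptions):

```python
import numpy as np

def sgd_nonuniform(grad_i, L, x0, steps, lr):
    """Draw component i with probability proportional to its assumed-known
    smoothness constant L[i], and reweight the gradient by 1/(n*p_i) so the
    update stays an unbiased estimate of the full gradient."""
    n = len(L)
    p = np.asarray(L, float) / np.sum(L)
    x = np.asarray(x0, float).copy()
    rng = np.random.default_rng(2)
    tail = []
    for t in range(steps):
        i = rng.choice(n, p=p)
        x = x - lr * grad_i(i, x) / (n * p[i])
        if t >= steps // 2:
            tail.append(x.copy())
    return np.mean(tail, axis=0)   # tail-averaged iterate

# toy problem: average of f_i(x) = 0.5*c_i*(x - a_i)^2; minimizer is 2
c = np.array([1.0, 1.0, 4.0])
a = np.array([0.0, 0.0, 3.0])
xbar = sgd_nonuniform(lambda i, x: c[i] * (x - a[i]), c, np.zeros(1),
                      steps=4000, lr=0.02)
```

Components with larger constants are sampled more often, which tends to reduce the variance of the reweighted gradient when the c_i are very uneven.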
A proximal stochastic gradient method with progressive variance reduction.
SIAM Journal on Optimization, 2014
Abstract

Cited by 30 (6 self)
We consider the problem of minimizing the sum of two convex functions: one is the average of a large number of smooth component functions, and the other is a general convex function that admits a simple proximal mapping. We assume the whole objective function is strongly convex. Such problems often arise in machine learning as regularized empirical risk minimization. We propose and analyze a new proximal stochastic gradient method, which uses a multistage scheme to progressively reduce the variance of the stochastic gradient. While each iteration of this algorithm has a cost similar to that of the classical stochastic gradient method (or incremental gradient method), we show that the expected objective value converges to the optimum at a geometric rate. The overall complexity of this method is much lower than that of both the proximal full gradient method and the standard proximal stochastic gradient method.
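The multistage variance-reduction scheme can be sketched as follows (stage counts, step size, and the toy problem are illustrative assumptions, not the paper's settings): each stage fixes a snapshot, computes one full gradient there, and corrects every subsequent stochastic gradient with it.

```python
import numpy as np

def prox_svrg(grad_i, full_grad, prox, x0, n, stages, inner, lr):
    """Multi-stage sketch: each stage takes a snapshot, computes one full
    gradient there, and uses it to correct subsequent stochastic gradients."""
    x = np.asarray(x0, float).copy()
    rng = np.random.default_rng(3)
    for _ in range(stages):
        snap, mu = x.copy(), full_grad(x)        # snapshot + its full gradient
        for _ in range(inner):
            i = rng.integers(n)
            v = grad_i(i, x) - grad_i(i, snap) + mu  # variance-reduced gradient
            x = prox(x - lr * v, lr)
    return x

# toy problem: min (1/4) * sum_i 0.5*(x - a_i)^2 + |x|, whose solution is x = 1.5
a = np.array([1.0, 2.0, 3.0, 4.0])
soft = lambda z, t: np.sign(z) * np.maximum(np.abs(z) - t, 0.0)  # prox of t*|.|
x = prox_svrg(lambda i, x: x - a[i], lambda x: x - a.mean(), soft,
              np.zeros(1), n=4, stages=10, inner=10, lr=0.5)
```

On this toy the component gradients differ only by constants, so the corrected direction equals the full gradient exactly; in general it is only an unbiased estimate whose variance shrinks as the iterates approach the snapshot.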
Linear convergence with condition number independent access of full gradients
In Advances in Neural Information Processing Systems, 2013
Abstract

Cited by 13 (1 self)
For smooth and strongly convex optimization, the optimal iteration complexity of gradient-based algorithms is O(√κ log(1/ε)), where κ is the condition number. When the optimization problem is ill-conditioned, we need to evaluate a large number of full gradients, which could be computationally expensive. In this paper, we propose to remove the dependence on the condition number by allowing the algorithm to access stochastic gradients of the objective function. To this end, we present a novel algorithm named Epoch Mixed Gradient Descent (EMGD) that is able to utilize two kinds of gradients. A distinctive step in EMGD is the mixed gradient descent, where we use a combination of the full and stochastic gradients to update the intermediate solution. Theoretical analysis shows that EMGD is able to find an ε-optimal solution by computing O(log(1/ε)) full gradients and O(κ² log(1/ε)) stochastic gradients.
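A loose sketch of the mixed full/stochastic idea (not the authors' exact EMGD; the epoch structure, step size, and toy problem are my own assumptions): one full-gradient call per epoch anchors many cheap stochastic corrections, so full-gradient accesses grow only with the epoch count while stochastic calls do the bulk of the work.

```python
import numpy as np

def mixed_descent(grad_i, full_grad, x0, n, epochs, inner, lr):
    """Per epoch: one full gradient at an anchor point, then many stochastic
    steps whose direction mixes the anchored full gradient with fresh
    stochastic information."""
    x = np.asarray(x0, float).copy()
    rng = np.random.default_rng(4)
    full_calls = stoch_calls = 0
    for _ in range(epochs):
        anchor, g_full = x.copy(), full_grad(x)
        full_calls += 1
        for _ in range(inner):
            i = rng.integers(n)
            d = g_full + grad_i(i, x) - grad_i(i, anchor)  # mixed direction
            stoch_calls += 2
            x -= lr * d
    return x, full_calls, stoch_calls

# toy problem: minimize the average of f_i(x) = 0.5*(x - a_i)^2
a = np.array([1.0, 2.0, 3.0, 4.0])
x, fc, sc = mixed_descent(lambda i, x: x - a[i], lambda x: x - a.mean(),
                          np.zeros(1), n=4, epochs=5, inner=20, lr=0.5)
```

The call counters make the accounting in the abstract concrete: full-gradient accesses scale with the number of epochs, stochastic accesses with the total inner-step count.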
Robust inversion, dimensionality reduction, randomized sampling
2011
Abstract

Cited by 8 (4 self)
We consider a class of inverse problems in which the forward model is the solution operator to linear ODEs or PDEs. This class admits several dimensionality-reduction techniques based on data averaging or sampling, which are especially useful for large-scale problems. We survey these approaches and their connection to stochastic optimization. The data-averaging approach is only viable, however, for a least-squares misfit, which is sensitive to outliers in the data and artifacts unexplained by the forward model. This motivates us to propose a robust formulation based on the Student’s t-distribution of the error. We demonstrate how the corresponding penalty function, together with the sampling approach, can obtain good results for a large-scale seismic inverse problem with 50% corrupted data. Keywords: inverse problems · seismic inversion · stochastic optimization · robust estimation
Mixed optimization for smooth functions
In Neural Information Processing Systems (NIPS), 2013
Abstract

Cited by 7 (0 self)
It is well known that the optimal convergence rate for stochastic optimization of smooth functions is O(1/√T), the same as for stochastic optimization of Lipschitz continuous convex functions. This is in contrast to optimizing smooth functions using full gradients, which yields a convergence rate of O(1/T²). In this work, we consider a new setup for optimizing smooth functions, termed Mixed Optimization, which allows access to both a stochastic oracle and a full gradient oracle. Our goal is to significantly improve the convergence rate of stochastic optimization of smooth functions by allowing an additional small number of accesses to the full gradient oracle. We show that, with O(ln T) calls to the full gradient oracle and O(T) calls to the stochastic oracle, the proposed mixed optimization algorithm is able to achieve an optimization error of O(1/T).
Adaptive Image Synthesis for Compressive Displays
ACM Trans. Graph. (SIGGRAPH), 2013
Abstract

Cited by 6 (4 self)
Figure 1: Adaptive light field synthesis for a dual-layer compressive display. By combining sampling, rendering, and display-specific optimization into a single framework, the proposed algorithm facilitates light field synthesis with significantly reduced computational resources. Redundancy in the light field as well as limitations of display hardware are exploited to generate high-quality reconstructions (center left column) for a high-resolution target light field of 85 × 21 views with 840 × 525 pixels each (center). Our adaptive reconstruction uses only 3.82% of the rays in the full target light field (left column), thus providing significant savings both during rendering and during the computation of the display parameters. The proposed framework allows for higher-resolution light fields, better 3D effects, and perceptually correct animations to be presented on emerging compressive displays (right columns).

Recent years have seen proposals for exciting new computational display technologies that are compressive in the sense that they generate high-resolution images or light fields with relatively few display parameters. Image synthesis for these types of displays involves two major tasks: sampling and rendering high-dimensional target imagery, such as light fields or time-varying light fields, as well as optimizing the display parameters to provide a good approximation of the target content. In this paper, we introduce an adaptive optimization framework for compressive displays that generates high-quality images and light fields using only a fraction of the total plenoptic samples. We demonstrate the framework for a large set of display technologies, including several types of autostereoscopic displays, high dynamic range displays, and high-resolution displays. We achieve significant performance gains, and in some cases are able to process data that would be infeasible with existing methods.
Stochastic algorithms for inverse problems involving PDEs and many measurements. Submitted
2012
Abstract

Cited by 5 (5 self)
Inverse problems involving systems of partial differential equations (PDEs) can be very expensive to solve numerically. This is especially so when many experiments, involving different combinations of sources and receivers, are employed in order to obtain reconstructions of acceptable quality. The mere evaluation of a misfit function (the distance between predicted and observed data) often requires hundreds or thousands of PDE solves. This article develops and assesses dimensionality-reduction methods, both stochastic and deterministic, to reduce this computational burden. We present in detail our methods for solving such inverse problems for the famous DC resistivity and EIT problems. These methods involve incorporation of a priori information such as piecewise smoothness, bounds on the sought conductivity surface, or even a piecewise constant solution. We then assume that all experiments share the same set of receivers and concentrate on methods for reducing the number of combinations of experiments, called simultaneous sources, that are used at each stabilized Gauss–Newton iteration. Algorithms for controlling the number of such combined sources are proposed and justified. Evaluating the misfit approximately, except for the final verification for terminating the process, always involves random sampling. Methods for selecting the combined simultaneous sources, involving either random sampling or truncated SVD, are proposed and compared. Highly efficient variants of the resulting algorithms are identified.
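The random simultaneous-source idea can be sketched with a linear stand-in for the forward map (the matrices, sizes, and Rademacher weighting here are illustrative assumptions, not the article's setup): instead of one misfit evaluation per source, a few random ±1 combinations of the sources are evaluated.

```python
import numpy as np

def sampled_misfit(F, Q, D, k, rng):
    """Estimate the total misfit over all sources using k random +-1 source
    combinations w ("simultaneous sources") instead of one solve per source."""
    total = 0.0
    for _ in range(k):
        w = rng.choice([-1.0, 1.0], size=Q.shape[1])  # Rademacher weights
        total += np.sum((F @ (Q @ w) - D @ w) ** 2)   # one combined "solve"
    return total / k

rng = np.random.default_rng(5)
F = rng.standard_normal((6, 6))   # stand-in for the (expensive) forward map
Q = np.eye(6)                     # one column per source
D = F @ Q                         # observed data consistent with the model
m = sampled_misfit(F, Q, D, k=3, rng=rng)
```

Because E[w wᵀ] = I for Rademacher weights, the sampled quantity is an unbiased estimate of the full misfit summed over all sources; here the data are exactly consistent with the model, so the estimate is zero.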
Block Stochastic Gradient Iteration for Convex and Nonconvex Optimization
2015
Abstract

Cited by 5 (0 self)
The stochastic gradient (SG) method can quickly solve a problem with a large number of components in the objective, or a stochastic optimization problem, to a moderate accuracy. The block coordinate descent/update (BCD) method, on the other hand, can quickly solve problems with multiple (blocks of) variables. This paper introduces a method that combines the strengths of SG and BCD for problems with many components in the objective and with multiple (blocks of) variables, proposing a block SG (BSG) method for both convex and nonconvex programs. BSG generalizes SG by updating all the blocks of variables in Gauss–Seidel fashion (updating the current block depends on the previously updated blocks), in either a fixed or randomly shuffled order. Although BSG does slightly more work at each iteration, it typically outperforms SG because of BSG’s Gauss–Seidel updates and larger step sizes, the latter of which are determined by the smaller per-block Lipschitz constants. The convergence of BSG is established for both convex and nonconvex cases. In the convex case, BSG has the same order of convergence rate as SG. In the nonconvex case, its convergence is established in terms of the expected violation of a first-order optimality condition. In both cases our analysis is nontrivial, since the typical unbiasedness assumption no longer holds. BSG is numerically evaluated on the following problems: stochastic least squares and logistic regression, which are convex, and low-rank tensor recovery and bilinear logistic regression, which are nonconvex. On the convex problems, BSG performed significantly better than SG. On the nonconvex problems, BSG significantly outperformed the deterministic BCD method, because the latter tends to stagnate early near local minimizers. Overall, BSG inherits the benefits of both SG approximation and block coordinate updates and is especially useful for solving large-scale nonconvex problems.
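The Gauss–Seidel block-update loop with per-block step sizes can be sketched as follows (the two-block separable toy, step sizes, and helper names are my own assumptions; with a single component the sketch degenerates to deterministic BCD):

```python
import numpy as np

def bsg(grad_block_i, x_blocks, n, steps, lrs):
    """Block stochastic gradient sketch: sample one component i, then update
    the blocks one after another (Gauss-Seidel style), each block with its
    own step size tied to its per-block smoothness."""
    rng = np.random.default_rng(6)
    for _ in range(steps):
        i = rng.integers(n)
        for b in range(len(x_blocks)):        # later blocks see the blocks
            g = grad_block_i(i, b, x_blocks)  # already updated this sweep
            x_blocks[b] = x_blocks[b] - lrs[b] * g
    return x_blocks

# toy problem, two blocks with different smoothness: 0.5*(x-2)^2 + 2.0*(y-3)^2
def g(i, blk, xb):
    return (xb[0] - 2.0) if blk == 0 else 4.0 * (xb[1] - 3.0)

x, y = bsg(g, [np.zeros(1), np.zeros(1)], n=1, steps=200, lrs=[0.5, 0.125])
```

The smoother block (constant 1) takes a larger step than the stiffer block (constant 4); this per-block tuning is the source of BSG's larger effective step sizes compared with a single global rate.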