Results 1–10 of 44
Query Complexity of Derivative-Free Optimization
Abstract

Cited by 17 (0 self)
This paper provides lower bounds on the convergence rate of Derivative-Free Optimization (DFO) with noisy function evaluations, exposing a fundamental and unavoidable gap between the performance of algorithms with access to gradients and those with access only to function evaluations. However, there are situations in which DFO is unavoidable, and for such situations we propose a new DFO algorithm that is proved to be near optimal for the class of strongly convex objective functions. A distinctive feature of the algorithm is that it uses only Boolean-valued function comparisons, rather than function evaluations. This makes the algorithm useful in an even wider range of applications, such as optimization based on paired comparisons from human subjects, for example. We also show that regardless of whether DFO is based on noisy function evaluations or Boolean-valued function comparisons, the convergence rate is the same.
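To make the idea of optimizing from Boolean-valued comparisons concrete, here is a minimal Python sketch. It is not the paper's algorithm: the function name `comparison_dfo`, the majority-vote scheme, and all parameters are hypothetical illustration choices. The only information the routine uses is repeated noisy answers to "is the candidate point better than the current one?".

```python
import random

def comparison_dfo(f, x0, step=1.0, iters=200, votes=15):
    """Minimize f using only Boolean comparisons of noisy evaluations.

    Each iteration proposes a random coordinate perturbation and moves
    only if a majority of repeated noisy comparisons prefers the candidate.
    """
    x = list(x0)
    for _ in range(iters):
        i = random.randrange(len(x))
        cand = list(x)
        cand[i] += random.choice([-step, step])
        # Majority vote over `votes` fresh noisy comparisons f(cand) < f(x).
        wins = sum(f(cand) < f(x) for _ in range(votes))
        if wins > votes // 2:
            x = cand
        else:
            step *= 0.97  # shrink the step when no improvement is detected
    return x
```

On a noisy quadratic such as `f(x) = (x[0] - 3)**2 + (x[1] + 1)**2 + random.gauss(0, 0.1)`, the majority vote filters the evaluation noise well enough to drift toward the minimizer, which is the point the abstract makes: comparisons alone can suffice.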
Stochastic convex optimization with bandit feedback
, 1107
Abstract

Cited by 16 (2 self)
This paper addresses the problem of minimizing a convex, Lipschitz function f over a convex, compact set X under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value f(x) at any query point x ∈ X. The quantity of interest is the regret of the algorithm, which is the sum of the function values at the algorithm's query points minus the optimal function value. We demonstrate a generalization of the ellipsoid algorithm that incurs Õ(poly(d)√T) regret. Since any algorithm has regret at least Ω(√T) on this problem, our algorithm is optimal in terms of the scaling with T.
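The regret notion above can be illustrated with a toy one-dimensional strategy. The sketch below is a hypothetical trisection scheme, not the ellipsoid-based method the paper proposes; `bandit_trisection`, `noisy_eval`, and all parameters are illustration choices. Regret sums the *true* function values at every query point, so even queries made only to reduce noise count against the algorithm.

```python
import random

def noisy_eval(f, x, sigma=0.05):
    """Bandit feedback: a noisy realization of f(x)."""
    return f(x) + random.gauss(0.0, sigma)

def bandit_trisection(f, lo=0.0, hi=1.0, rounds=8, samples=60):
    """Toy bandit strategy: average repeated noisy evaluations at the two
    interior trisection points and discard the worse third of the interval.
    Returns the final interval midpoint and the list of query points."""
    queries = []
    for _ in range(rounds):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        fa = fb = 0.0
        for _ in range(samples):
            fa += noisy_eval(f, a); queries.append(a)
            fb += noisy_eval(f, b); queries.append(b)
        if fa < fb:
            hi = b   # minimum likely lies in [lo, b]
        else:
            lo = a
    return (lo + hi) / 2, queries

def regret(queries, f, f_opt):
    """Sum of true function values at the query points minus T times f*."""
    return sum(f(x) for x in queries) - len(queries) * f_opt
```

Running this on `f(x) = (x - 0.7)**2` with `f_opt = 0` shows the tradeoff the Ω(√T) bound formalizes: extra samples sharpen the comparison but each one adds its true suboptimality to the regret.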
The complexity of large-scale convex programming under a linear optimization oracle.
, 2013
Abstract

Cited by 11 (1 self)
This paper considers a general class of iterative optimization algorithms, referred to as linear-optimization-based convex programming (LCP) methods, for solving large-scale convex programming (CP) problems. The LCP methods, covering the classic conditional gradient (CG) method (a.k.a. the Frank-Wolfe method) as a special case, can only solve a linear optimization subproblem at each iteration. In this paper, we first establish a series of lower complexity bounds for the LCP methods to solve different classes of CP problems, including smooth, nonsmooth and certain saddle-point problems. We then formally establish the theoretical optimality or near optimality, in the large-scale case, of the CG method and its variants for solving different classes of CP problems. We also introduce several new optimal LCP methods, obtained by properly modifying Nesterov's accelerated gradient method, and demonstrate their possible advantages over the classic CG method for solving certain classes of large-scale CP problems.
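The defining feature of an LCP method, per the abstract, is that each iteration solves only a *linear* subproblem over the feasible set. The classic conditional gradient (Frank-Wolfe) method makes this concrete; the sketch below runs it on the probability simplex, where the linear minimization oracle reduces to an argmin over coordinates (the function name and step-size schedule are standard textbook choices, not taken from this paper).

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, iters=500):
    """Conditional gradient (Frank-Wolfe) on the probability simplex.

    The only per-iteration subproblem is linear minimization over the
    feasible set; on the simplex that is just an argmin of the gradient.
    """
    x = np.array(x0, dtype=float)
    for t in range(iters):
        g = grad(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0          # vertex minimizing <g, s> over the simplex
        gamma = 2.0 / (t + 2.0)        # standard O(1/t) step-size schedule
        x = (1 - gamma) * x + gamma * s
    return x
```

For example, projecting a point p onto the simplex amounts to minimizing ||x − p||², whose gradient is 2(x − p); the iterate stays a convex combination of vertices, so feasibility is automatic, which is exactly why CG-type methods scale to large problems.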
Smoothing and Worst-Case Complexity for Direct-Search Methods in Nonsmooth Optimization
, 2012
Abstract

Cited by 8 (2 self)
In the context of the derivative-free optimization of a smooth objective function, it has been shown that the worst-case complexity of direct-search methods is of the same order as that of steepest descent for derivative-based optimization; more precisely, the number of iterations needed to reduce the norm of the gradient of the objective function below a certain threshold is proportional to the inverse of the threshold squared. Motivated by the lack of such a result in the nonsmooth case, we propose, analyze, and test a class of smoothing direct-search methods for the unconstrained optimization of nonsmooth functions. Given a parameterized family of smoothing functions for the nonsmooth objective function, dependent on a smoothing parameter, this class of methods consists of applying a direct-search algorithm for a fixed value of the smoothing parameter until the step size is relatively small, after which the smoothing parameter is reduced and the process is repeated. One can show that the worst-case complexity (or cost) of this procedure is roughly one order of magnitude worse than that of direct search or steepest descent on smooth functions. The class of smoothing direct-search methods is also shown to enjoy asymptotic global convergence properties. Some preliminary numerical experiments indicate that this approach leads to better values of the objective function, in some cases pushing the optimization further, apparently without additional cost in the number of function evaluations.
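The two-loop structure the abstract describes (inner direct search at fixed smoothing parameter, outer loop shrinking the parameter) can be sketched as follows. This is a simplified guess at the scheme, not the paper's algorithm: the stopping rules, the halving factors, and the function names are hypothetical.

```python
import math

def direct_search(f, x, step=1.0, tol=1e-3):
    """Coordinate (compass) direct search: poll +/- step along each axis,
    move to the first improving point, otherwise halve the step size."""
    n = len(x)
    while step > tol:
        improved = False
        for i in range(n):
            for d in (step, -step):
                y = list(x)
                y[i] += d
                if f(y) < f(x):
                    x, improved = y, True
                    break
            if improved:
                break
        if not improved:
            step /= 2
    return x

def smoothing_direct_search(f_smooth, x0, mu0=1.0, mu_min=1e-3):
    """Outer loop: run direct search on the smoothed objective for a fixed
    smoothing parameter mu, then reduce mu and repeat (stopping rules here
    are simplified illustrative choices)."""
    x, mu = list(x0), mu0
    while mu > mu_min:
        x = direct_search(lambda z: f_smooth(z, mu), x,
                          step=max(mu, 1e-3), tol=mu / 10)
        mu /= 2
    return x
```

A standard smoothing family for the nonsmooth f(x) = Σ|xᵢ| is f_smooth(x, μ) = Σ√(xᵢ² + μ²), which the outer loop drives back toward the original objective as μ shrinks.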
On the complexity of bandit and derivative-free stochastic convex optimization
 CoRR
Abstract

Cited by 8 (0 self)
The problem of stochastic convex optimization with bandit feedback (in the learning community) or without knowledge of gradients (in the optimization community) has received much attention in recent years, in the form of algorithms and performance upper bounds. However, much less is known about the inherent complexity of these problems, and there are few lower bounds in the literature, especially for nonlinear functions. In this paper, we investigate the attainable error/regret in the bandit and derivative-free settings, as a function of the dimension d and the available number of queries T. We provide a precise characterization of the attainable performance for strongly convex and smooth functions, which also implies a nontrivial lower bound for more general problems. Moreover, we prove that in both the bandit and derivative-free settings, the required number of queries must scale at least quadratically with the dimension. Finally, we show that on the natural class of quadratic functions, it is possible to obtain a "fast" O(1/T) error rate in terms of T, under mild assumptions, even without having access to gradients. To the best of our knowledge, this is the first such rate in a derivative-free stochastic setting, and it holds despite previous results which seem to imply the contrary.
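A standard building block in this derivative-free setting is a gradient estimate built from two function evaluations along a random direction; the sketch below shows that estimator plugged into plain gradient descent. This is the generic two-point construction, not this paper's specific algorithm, and the names and parameters are illustrative (the test below uses a noiseless quadratic for determinism, though the estimator is designed for the noisy setting).

```python
import math
import random

def two_point_grad_estimate(f, x, delta=1e-2):
    """Estimate grad f(x) from two evaluations along a random unit direction u:
    g_hat = d * (f(x + delta*u) - f(x - delta*u)) / (2*delta) * u."""
    d = len(x)
    u = [random.gauss(0, 1) for _ in range(d)]
    norm = math.sqrt(sum(ui * ui for ui in u))
    u = [ui / norm for ui in u]
    fp = f([xi + delta * ui for xi, ui in zip(x, u)])
    fm = f([xi - delta * ui for xi, ui in zip(x, u)])
    scale = d * (fp - fm) / (2 * delta)
    return [scale * ui for ui in u]

def dfo_sgd(f, x0, lr=0.1, iters=500):
    """Gradient descent driven entirely by the two-point estimator:
    the true gradient of f is never accessed."""
    x = list(x0)
    for _ in range(iters):
        g = two_point_grad_estimate(f, x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x
```

The factor d in the estimator makes it unbiased (E[uuᵀ] = I/d for a random unit direction), but it also inflates the variance with the dimension, which is one intuition for why query complexity must grow with d, as the lower bounds in the abstract formalize.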
Beneath the valley of the noncommutative arithmetic-geometric mean inequality: conjectures, case-studies, and consequences
, 2012
Abstract

Cited by 7 (2 self)
Randomized algorithms that base iteration-level decisions on samples from some pool are ubiquitous in machine learning and optimization. Examples include stochastic gradient descent and randomized coordinate descent. This paper makes progress toward theoretically evaluating the difference in performance between sampling with and without replacement in such algorithms. Focusing on least-mean-squares optimization, we formulate a noncommutative arithmetic-geometric mean inequality that would prove that the expected convergence rate of without-replacement sampling is faster than that of with-replacement sampling. We demonstrate that this inequality holds for many classes of random matrices and for some pathological examples as well. We provide a deterministic worst-case bound on the discrepancy between the two sampling models, and explore some of the impediments to proving this inequality in full generality. We detail the consequences of this inequality for stochastic gradient descent and the randomized Kaczmarz algorithm for solving linear systems.
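The two sampling models can be made concrete on the randomized Kaczmarz algorithm mentioned at the end of the abstract. The sketch below is illustrative scaffolding, not code from the paper: the helper names are hypothetical, and on a small consistent system both samplers converge, so the conjectured speed difference is a statistical tendency rather than something one run demonstrates.

```python
import random

def kaczmarz_pass(A, b, x, order):
    """One pass of Kaczmarz updates x <- x + (b_i - a_i.x)/||a_i||^2 * a_i
    over the rows of A in the given order."""
    for i in order:
        a = A[i]
        r = (b[i] - sum(aj * xj for aj, xj in zip(a, x))) / sum(aj * aj for aj in a)
        x = [xj + r * aj for aj, xj in zip(a, x)]
    return x

def run_kaczmarz(A, b, x0, passes, sampler):
    """Run `passes` epochs, drawing each epoch's row order from `sampler`."""
    x = list(x0)
    for _ in range(passes):
        x = kaczmarz_pass(A, b, x, sampler(len(A)))
    return x

def with_replacement(n):
    # n independent draws: some rows may repeat, others may be skipped
    return [random.randrange(n) for _ in range(n)]

def without_replacement(n):
    # a random permutation: every row is visited exactly once per pass
    return random.sample(range(n), n)
```

The conjectured noncommutative AM-GM inequality would bound the expected product of projection matrices under `without_replacement` by the corresponding product under `with_replacement`, formalizing the folklore that shuffled epochs beat i.i.d. sampling.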
Sustainable migration policies
Abstract

Cited by 6 (0 self)
This paper considers whether countries might mutually agree on a policy of allowing free movement of workers. For the countries to agree, the short-run costs must be outweighed by the long-term benefits that result from better labor market flexibility and income smoothing. We show that such policies are less likely to be adopted for less risk-averse workers and for countries that trade more. More surprisingly, we find that some congestion costs can help. This reverses the conventional wisdom that congestion costs tend to inhibit free migration policies.
Stable and efficient coalitional networks
, 2011
Abstract

Cited by 6 (0 self)
We develop a theoretical framework that allows us to study which bilateral links and coalition structures are going to emerge at equilibrium. We define the notion of a coalitional network to represent a network and a coalition structure, where the network specifies the nature of the relationship each individual has with her coalition members and with individuals outside her coalition. To predict the coalitional networks that are going to emerge at equilibrium, we propose the concepts of strong stability and of contractual stability. Contractual stability imposes that any change made to the coalitional network needs the consent of both the deviating players and their original coalition partners. Requiring the consent of coalition members under the simple majority or unanimity decision rule may help to reconcile stability and efficiency. Moreover, this new framework can provide insights that one cannot obtain if
Commentary on “Towards a Noncommutative Arithmetic-Geometric Mean Inequality”
Abstract

Cited by 3 (0 self)
In their paper, Recht and Ré have presented conjectures and consequences of noncommutative variants of the arithmetic mean–geometric mean (AM-GM) inequality for positive definite matrices. Let A1,..., An be a collection of positive semidefinite matrices and i1,..., ik be random indices in {1,..., n}. To avoid symmetrization issues that arise since matrix products are noncommutative, Recht and Ré define the expectation operators and (n − k)!
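The abstract is cut off mid-definition of the two expectation operators. For context, the comparison at issue is commonly quoted in the following form (a paraphrase of the general shape of the Recht-Ré conjecture, not a verbatim reproduction from the paper; the (n − k)!/n! factor normalizes the without-replacement average):

```latex
% One commonly quoted form of the conjectured noncommutative AM-GM
% inequality: the norm of the average product over k *distinct* indices
% (sampling without replacement) is bounded by the norm of the
% unrestricted average (sampling with replacement).
\left\| \frac{(n-k)!}{n!}
  \sum_{\substack{j_1,\dots,j_k \\ \text{all distinct}}}
  A_{j_1} A_{j_2} \cdots A_{j_k} \right\|
\;\le\;
\left\| \frac{1}{n^k}
  \sum_{j_1,\dots,j_k=1}^{n}
  A_{j_1} A_{j_2} \cdots A_{j_k} \right\|
```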