Results 1 - 8 of 8
Escaping the local minima via simulated annealing: Optimization of approximately convex functions
, 2015
"... We consider the problem of optimizing an approximately convex function over a bounded convex set in Rn using only function evaluations. The problem is reduced to sampling from an approximately logconcave distribution using the HitandRun method, which is shown to have the same O ∗ complexity as s ..."
Abstract

Cited by 3 (0 self)
We consider the problem of optimizing an approximately convex function over a bounded convex set in R^n using only function evaluations. The problem is reduced to sampling from an approximately log-concave distribution using the Hit-and-Run method, which is shown to have the same O* complexity as sampling from log-concave distributions. In addition to extending the analysis for log-concave distributions to approximately log-concave distributions, the implementation of the one-dimensional sampler of the Hit-and-Run walk requires new methods and analysis. The algorithm is based on simulated annealing and does not rely on first-order conditions, which makes it essentially immune to local minima. We then apply the method to several motivating problems. In the context of zeroth-order stochastic convex optimization, the proposed method produces an ε-minimizer after O*(n^7.5 ε^−2) noisy function evaluations by inducing an O(ε/n)-approximately log-concave distribution. We also consider in detail the case when the "amount of non-convexity" decays towards the optimum of the function. Other applications of the method discussed in this work include private computation of empirical risk minimizers, two-stage stochastic programming, and approximate dynamic programming for online learning.
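The annealing idea in this abstract — escape local minima of an approximately convex function using only function evaluations — can be illustrated with a toy Metropolis-style loop. This is a simplified sketch, not the paper's method: the paper samples with a Hit-and-Run walk from an approximately log-concave distribution, whereas the random-walk proposal, step size, and geometric cooling schedule below are illustrative choices.

```python
import math
import random

def simulated_annealing(f, x0, step=0.5, t0=1.0, cooling=0.99, iters=2000):
    """Minimize f using only function evaluations.

    Uphill moves are accepted with probability exp(-delta / T), which is
    what lets the walk climb out of local minima; T shrinks geometrically.
    """
    x, fx, t = list(x0), f(x0), t0
    best_x, best_f = list(x), fx
    for _ in range(iters):
        y = [xi + random.gauss(0, step) for xi in x]  # random-walk proposal
        fy = f(y)
        if fy < fx or random.random() < math.exp(-(fy - fx) / t):
            x, fx = y, fy
            if fx < best_f:
                best_x, best_f = list(x), fx
        t *= cooling  # geometric cooling schedule
    return best_x, best_f

# Approximately convex test function: a bowl plus a bounded oscillation.
random.seed(0)
g = lambda v: sum(vi * vi for vi in v) + 0.1 * math.sin(25 * v[0])
xs, fs = simulated_annealing(g, [3.0, -2.0])
```

Because only `g(x)` values are queried, the same loop runs unchanged on noisy or non-differentiable objectives.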
Information-theoretic lower bounds for convex optimization with erroneous oracles
"... Abstract We consider the problem of optimizing convex and concave functions with access to an erroneous zerothorder oracle. In particular, for a given function x → f (x) we consider optimization when one is given access to absolute error oracles that return values in [f (x) − , f (x) + ] or relati ..."
Abstract

Cited by 2 (1 self)
We consider the problem of optimizing convex and concave functions with access to an erroneous zeroth-order oracle. In particular, for a given function x → f(x) we consider optimization when one is given access to absolute error oracles that return values in [f(x) − ε, f(x) + ε], or relative error oracles that return values in [(1 − ε)f(x), (1 + ε)f(x)], for some ε > 0. We show stark information-theoretic impossibility results for minimizing convex functions and maximizing concave functions over polytopes in this model.
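The two oracle models in this abstract are simple to state in code. The wrappers below are hypothetical helpers for illustration only, and they draw uniform random errors, whereas the paper's impossibility results hold against worst-case (adversarial) error choices within the same intervals.

```python
import random

def absolute_error_oracle(f, eps):
    """Each query returns some value in [f(x) - eps, f(x) + eps]."""
    return lambda x: f(x) + random.uniform(-eps, eps)

def relative_error_oracle(f, eps):
    """Each query returns some value in [(1 - eps) f(x), (1 + eps) f(x)]."""
    return lambda x: f(x) * (1 + random.uniform(-eps, eps))

f = lambda x: x * x + 1.0
noisy = absolute_error_oracle(f, eps=0.05)
val = noisy(2.0)  # some value in [4.95, 5.05]; f(2.0) itself is 5.0
```

Note that for the relative oracle the error band scales with |f(x)|, which is what separates the two models in the lower bounds.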
Algorithm portfolios for noisy optimization: Compare solvers early
 In Proceedings of LION 8
, 2014
"... Abstract. Noisy optimization is the optimization of objective functions corrupted by noise. A portfolio of algorithms is a set of algorithms equipped with an algorithm selection tool for distributing the computational power among them. We study portfolios of noisy optimization solvers, show that di ..."
Abstract

Cited by 1 (1 self)
Noisy optimization is the optimization of objective functions corrupted by noise. A portfolio of algorithms is a set of algorithms equipped with an algorithm-selection tool for distributing the computational power among them. We study portfolios of noisy optimization solvers, show that different settings lead to different performances, and obtain a mathematically proven performance guarantee (in the sense that the portfolio performs nearly as well as the best of its algorithms) via an ad hoc selection algorithm dedicated to noisy optimization. A somewhat surprising result is that it is better to compare solvers with some lag; i.e., recommend the current recommendation of the best solver, where the best solver is selected from a comparison based on the solvers' recommendations earlier in the run.
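The "compare with lag" idea can be sketched in a few lines: score each solver on a recommendation it made earlier in the run (older recommendations are cheaper to compare reliably under noise), but return the winner's current recommendation. Everything below — the toy solvers, the noise model, and the selection rule — is an illustrative sketch, not the paper's actual portfolio algorithm.

```python
import random

def objective(x):
    """Noisy objective: true value x**2, corrupted by Gaussian noise."""
    return x * x + random.gauss(0, 0.01)

def make_solver(start, rate):
    """A toy solver whose recommendation decays geometrically toward 0."""
    state = {"x": start}
    def step():
        state["x"] *= rate
        return state["x"]
    return step

def portfolio_with_lag(solvers, budget, lag):
    """Run solvers in lockstep; select on lagged recommendations,
    then return the selected solver's *current* recommendation."""
    history = [[] for _ in solvers]  # recommendations per solver
    for _ in range(budget):
        for i, step in enumerate(solvers):
            history[i].append(step())
    # Compare solvers on the recommendation they made `lag` steps ago ...
    best = min(range(len(solvers)),
               key=lambda i: objective(history[i][-1 - lag]))
    # ... but recommend where the winning solver is *now*.
    return history[best][-1]

random.seed(1)
solvers = [make_solver(10.0, 0.9), make_solver(10.0, 0.7)]
rec = portfolio_with_lag(solvers, budget=50, lag=10)
```

With `lag=0` the comparison is made on the newest, closest-together recommendations, where a single noisy evaluation is most likely to pick the wrong solver.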
Exploiting Smoothness in Statistical Learning, Sequential Prediction, and Stochastic Optimization
, 2014
"... ..."
Algorithms and matching lower bounds for approximately-convex optimization
"... Abstract In recent years, a rapidly increasing number of applications in practice requires optimizing nonconvex objectives, like training neural networks, learning graphical models, maximum likelihood estimation. Though simple heuristics such as gradient descent with very few modifications tend to ..."
Abstract
In recent years, a rapidly increasing number of applications in practice require optimizing non-convex objectives, such as training neural networks, learning graphical models, and maximum likelihood estimation. Though simple heuristics such as gradient descent with very few modifications tend to work well, theoretical understanding is very weak. We consider possibly the most natural class of non-convex functions where one could hope to obtain provable guarantees: functions that are "approximately convex", i.e. functions f̃ : R^d → R for which there exists a convex function f such that for all x, |f̃(x) − f(x)| ≤ ∆ for a fixed value ∆. We then want to minimize f̃, i.e. output a point x̃ such that f̃(x̃) ≤ min_x f̃(x) + ε. It is quite natural to conjecture that for fixed ε the problem gets harder for larger ∆; however, the exact dependency of ε and ∆ is not known. In this paper, we significantly improve the known lower bound on ∆ as a function of ε, and give an algorithm matching this lower bound for a natural class of convex bodies. More precisely, we identify a function T : R+ → R+ such that when ∆ = O(T(ε)), we can give an algorithm that outputs a point x̃ such that f̃(x̃) ≤ min_x f̃(x) + ε within time poly(d, 1/ε). On the other hand, when ∆ = Ω(T(ε)), we also prove an information-theoretic lower bound: any algorithm that outputs such an x̃ must use a super-polynomial number of evaluations of f̃.
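The definition of ∆-approximate convexity used in this abstract is pointwise and easy to check on samples. The helper below is a hypothetical finite-sample check for illustration (function and parameter names are not from the paper); verifying the condition over all of R^d is of course not what an optimization algorithm gets to do.

```python
import math

def approx_convex(f_tilde, f_convex, delta, points):
    """Check |f_tilde(x) - f_convex(x)| <= delta at each sample point,
    i.e. the abstract's definition of Δ-approximate convexity,
    restricted to a finite set of test points."""
    return all(abs(f_tilde(x) - f_convex(x)) <= delta for x in points)

# A convex bowl plus a bounded oscillation of amplitude 0.1.
f = lambda x: x * x
ft = lambda x: x * x + 0.1 * math.sin(40 * x)
ok = approx_convex(ft, f, delta=0.1,
                   points=[i / 10 for i in range(-20, 21)])
# ok is True: the perturbation never exceeds Δ = 0.1
```

The oscillation term creates many shallow local minima in `ft`, which is exactly the regime where the ∆-versus-ε tradeoff in the paper bites.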
Multiple Optimality Guarantees in Statistical Learning
, 2014
"... Classically, the performance of estimators in statistical learning problems is measured in terms of their predictive ability or estimation error as the sample size n grows. In modern statistical and machine learning applications, however, computer scientists, statisticians, and analysts have a varie ..."
Abstract
Classically, the performance of estimators in statistical learning problems is measured in terms of their predictive ability or estimation error as the sample size n grows. In modern statistical and machine learning applications, however, computer scientists, statisticians, and analysts have a variety of additional criteria they must balance: estimators must be efficiently computable, data providers may wish to maintain anonymity, and large datasets must be stored and accessed. In this thesis, we consider the fundamental questions that arise when trading between multiple such criteria (computation, communication, privacy) while maintaining statistical performance. Can we develop lower bounds that show there must be tradeoffs? Can we develop new procedures that are both theoretically optimal and practically useful? To answer these questions, we explore examples from optimization, confidentiality-preserving statistical inference, and distributed estimation under communication constraints. Viewing our examples through a general lens of constrained minimax theory, we prove fundamental lower bounds on the statistical performance of any algorithm subject to the constraints (computational, confidentiality, or communication) specified. These lower bounds allow us to guarantee the optimality of the new algorithms we develop addressing the additional ...