Results 1–10 of 20
Exponential regret bounds for Gaussian process bandits with deterministic observations
In ICML, 2012
Abstract

Cited by 17 (9 self)
This paper analyzes the problem of Gaussian process (GP) bandits with deterministic observations. The analysis uses a branch-and-bound algorithm that is related to the UCB algorithm of (Srinivas et al., 2010). For GPs with Gaussian observation noise, with variance strictly greater than zero, (Srinivas et al., 2010) proved that the regret vanishes at the approximate rate of O(1/√t), where t is the number of observations. To complement their result, we attack the deterministic case and attain a much faster exponential convergence rate. Under some regularity assumptions, we show that the regret decreases asymptotically according to O(e^(−τt/(ln t)^(d/4))) with high probability. Here, d is the dimension of the search space and τ is a constant that depends on the behaviour of the objective function near its global maximum.
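For intuition, the UCB-style selection rule this paper builds on can be sketched as follows. This is a minimal toy implementation of Srinivas-style GP-UCB on a finite candidate grid, not the paper's branch-and-bound method; the RBF kernel, length-scale, jitter, and beta exploration weight are all illustrative assumptions.

```python
import numpy as np

def rbf(a, b, ls=0.2):
    # Squared-exponential kernel matrix between 1-D point sets a and b.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_ucb_next(X, y, grid, jitter=1e-6, beta=2.0):
    # GP posterior mean/variance on the grid, then pick the UCB maximizer.
    # jitter plays the role of a tiny observation noise for stability.
    K = rbf(X, X) + jitter * np.eye(len(X))
    Ks = rbf(grid, X)
    mu = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum('ij,ij->i', Ks, np.linalg.solve(K, Ks.T).T)
    return grid[np.argmax(mu + beta * np.sqrt(np.maximum(var, 0.0)))]

f = lambda x: -(x - 0.3) ** 2              # toy deterministic objective
X = np.array([0.0, 1.0]); y = f(X)         # two initial observations
for _ in range(10):
    x = gp_ucb_next(X, y, np.linspace(0, 1, 101))
    X, y = np.append(X, x), np.append(y, f(x))
print(X[np.argmax(y)])                     # best sample approaches 0.3
```

With deterministic observations the posterior variance collapses at sampled points, which is exactly the regime where this paper obtains its exponential rate.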
Stochastic simultaneous optimistic optimization
In International Conference on Machine Learning, 2013
Abstract

Cited by 13 (7 self)
We study the problem of global maximization of a function f given a finite number of evaluations perturbed by noise. We consider a very weak assumption on the function, namely that it is locally smooth (in some precise sense) with respect to some semi-metric around one of its global maxima. Compared to previous works on bandits in general spaces (Kleinberg et al., 2008; Bubeck et al., 2011a), our algorithm does not require knowledge of this semi-metric. Our algorithm, StoSOO, follows an optimistic strategy to iteratively construct upper confidence bounds over the hierarchical partitions of the function domain to decide which point to sample next. A finite-time analysis of StoSOO shows that it performs almost as well as the best specifically tuned algorithms even though the local smoothness of the function is not known.
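The optimistic-partitioning idea behind StoSOO can be sketched in a few lines: keep a set of cells, score each by sample mean plus a confidence width, and either re-sample or split the most optimistic cell. This is a simplified 1-D illustration, not the published algorithm; the split-into-thirds rule, the re-sampling threshold k, and the confidence constant are illustrative assumptions.

```python
import math, random

def stosoo_like(f, budget=600, k=8):
    cells = [[0.0, 1.0, 0.0, 0]]        # [lo, hi, sum of noisy values, count]
    for n in range(1, budget + 1):
        def ucb(c):
            lo, hi, s, m = c
            return float('inf') if m == 0 else s / m + math.sqrt(math.log(n) / m)
        c = max(cells, key=ucb)         # most optimistic cell
        lo, hi, s, m = c
        if m < k:                       # refine the estimate at the cell center
            c[2] += f((lo + hi) / 2); c[3] += 1
        else:                           # confident enough: split into thirds
            cells.remove(c)
            w = (hi - lo) / 3
            cells += [[lo, lo + w, 0.0, 0],
                      [lo + w, lo + 2 * w, s, m],   # middle cell keeps samples
                      [lo + 2 * w, hi, 0.0, 0]]
    best = max((c for c in cells if c[3] > 0), key=lambda c: c[2] / c[3])
    return (best[0] + best[1]) / 2

random.seed(0)
noisy = lambda t: -abs(t - 0.7) + random.gauss(0, 0.05)
print(stosoo_like(noisy))               # lands near the maximizer 0.7
```

Note the middle child inherits its parent's samples because it shares the parent's center point; the outer children start fresh and are therefore explored first.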
On correlation and budget constraints in model-based bandit optimization with application to automatic machine learning
Bayesian Multi-Scale Optimistic Optimization
Abstract

Cited by 6 (3 self)
Bayesian optimization is a powerful global optimization technique for expensive black-box functions. One of its shortcomings is that it requires auxiliary optimization of an acquisition function at each iteration. This auxiliary optimization can be costly and very hard to carry out in practice. Moreover, it creates serious theoretical concerns, as most of the convergence results assume that the exact optimum of the acquisition function can be found. In this paper, we introduce a new technique for efficient global optimization that combines Gaussian process confidence bounds and treed simultaneous optimistic optimization to eliminate the need for auxiliary optimization of acquisition functions. The experiments with global optimization benchmarks and a novel application to automatic information extraction demonstrate that the resulting technique is more efficient than the two approaches from which it draws inspiration. Unlike most theoretical analyses of Bayesian optimization with Gaussian processes, our finite-time convergence rate proofs do not require exact optimization of an acquisition function. That is, our approach eliminates the unsatisfactory assumption that a difficult, potentially NP-hard, problem has to be solved in order to obtain vanishing regret rates.
Relative confidence sampling for efficient online ranker evaluation
In WSDM ’14, 2014
Abstract

Cited by 3 (3 self)
A key challenge in information retrieval is that of online ranker evaluation: determining which one of a finite set of rankers performs the best in expectation on the basis of user clicks on presented document lists. When the presented lists are constructed using interleaved comparison methods, which interleave lists proposed by two different candidate rankers, then the problem of minimizing the total regret accumulated while evaluating the rankers can be formalized as a K-armed dueling bandits problem. In this paper, we propose a new method called relative confidence sampling (RCS) that aims to reduce cumulative regret by being less conservative than existing methods in eliminating rankers from contention. In addition, we present an empirical comparison between RCS and two state-of-the-art methods, relative upper confidence bound and SAVAGE. The results demonstrate that RCS can substantially outperform these alternatives on several large learning-to-rank datasets.
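The champion/challenger structure of a relative-confidence-sampling-style dueling bandit can be sketched as below. This is a simplified illustration, not the paper's exact algorithm: the Beta(1,1) priors, the sampled-tournament champion rule, the UCB challenger constant, and the helper `rcs_like` are all illustrative assumptions, and the final recommendation rule (most total wins) is a stand-in.

```python
import math, random

def rcs_like(duel, K, T=2000):
    wins = [[1] * K for _ in range(K)]          # wins[i][j]: i beat j (+1 prior)
    for _ in range(T):
        # Champion: arm winning a tournament of posterior samples.
        theta = [[random.betavariate(wins[i][j], wins[j][i]) if i != j else 0.5
                  for j in range(K)] for i in range(K)]
        c = max(range(K), key=lambda i: sum(t > 0.5 for t in theta[i]))
        # Challenger: highest upper confidence bound against the champion.
        def ub(j):
            n = wins[j][c] + wins[c][j]
            return wins[j][c] / n + math.sqrt(2 * math.log(T) / n)
        d = max((j for j in range(K) if j != c), key=ub)
        winner, loser = (c, d) if duel(c, d) else (d, c)
        wins[winner][loser] += 1
    return max(range(K), key=lambda i: sum(wins[i]))

random.seed(1)
p = [[0.5, 0.7, 0.7, 0.7],                      # hypothetical preference matrix;
     [0.3, 0.5, 0.6, 0.6],                      # arm 0 is the Condorcet winner
     [0.3, 0.4, 0.5, 0.6],
     [0.3, 0.4, 0.4, 0.5]]
print(rcs_like(lambda i, j: random.random() < p[i][j], K=4))
```

Because a strong champion keeps getting dueled rather than eliminated, low regret here depends on the challenger rule quickly discarding clearly inferior arms, which is the conservatism trade-off the abstract describes.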
Optimistic planning for beliefaugmented Markov decision processes
In IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 2013
Abstract

Cited by 3 (2 self)
This paper presents the Bayesian Optimistic Planning (BOP) algorithm, a novel model-based Bayesian reinforcement learning approach. BOP extends the planning approach of the Optimistic Planning for Markov Decision Processes (OP-MDP) algorithm [10], [9] to contexts where the transition model of the MDP is initially unknown and progressively learned through interactions with the environment. The knowledge about the unknown MDP is represented with a probability distribution over all possible transition models using Dirichlet distributions, and the BOP algorithm plans in the belief-augmented state space constructed by concatenating the original state vector with the current posterior distribution over transition models. We show that BOP becomes Bayesian optimal when the budget parameter increases to infinity. Preliminary empirical validations show promising performance.
Optimistic planning for continuous-action deterministic systems
In IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-13), 2013
Abstract

Cited by 1 (0 self)
We consider the optimal control of systems with deterministic dynamics, continuous, possibly large-scale state spaces, and continuous, low-dimensional action spaces. We describe an online planning algorithm called SOOP, which like other algorithms in its class has no direct dependence on the state space structure. Unlike previous algorithms, SOOP explores the true solution space, consisting of infinite sequences of continuous actions, without requiring knowledge about the smoothness of the system. To this end, it borrows the principle of the simultaneous optimistic optimization method, and develops a non-trivial adaptation of this principle to the planning problem. Experiments on four problems show SOOP reliably ranks among the best algorithms, fully dominating competing methods when the problem requires both long horizons and fine discretization.
Bayesian Optimization with Exponential Convergence
Abstract
This paper presents a Bayesian optimization method with exponential convergence without the need of auxiliary optimization and without the δ-cover sampling. Most Bayesian optimization methods require auxiliary optimization: an additional non-convex global optimization problem, which can be time-consuming and hard to implement in practice. Also, the existing Bayesian optimization method with exponential convergence [1] requires access to the δ-cover sampling, which was considered to be impractical.
Black-box optimization of noisy functions with unknown smoothness
Abstract
We study the problem of black-box optimization of a function f of any dimension, given function evaluations perturbed by noise. The function is assumed to be locally smooth around one of its global optima, but this smoothness is unknown. Our contribution is an adaptive optimization algorithm, POO or parallel optimistic optimization, that is able to deal with this setting. POO performs almost as well as the best known algorithms requiring the knowledge of the smoothness. Furthermore, POO works for a larger class of functions than what was previously considered, especially for functions that are difficult to optimize, in a very precise sense. We provide a finite-time analysis of POO's performance, which shows that its error after n evaluations is at most a factor of √(ln n) away from the error of the best known optimization algorithms using the knowledge of the smoothness.
Consensus for Agents with General Dynamics Using Optimistic Optimization, 2013
Abstract
An important challenge in multi-agent systems is consensus, in which the agents must agree on certain controlled variables of interest. So far, most consensus algorithms for agents with nonlinear dynamics exploit the specific form of the nonlinearity. Here, we propose an approach that only requires a black-box simulation model of the dynamics, and is therefore applicable to a wide class of nonlinearities. This approach works for agents communicating on a fixed, connected network. It designs a reference behavior with a classical consensus protocol, and then finds control actions that drive the nonlinear agents towards the reference states, using a recent optimistic optimization algorithm. By exploiting the guarantees of optimistic optimization, we prove that the agents achieve practical consensus. A representative example is further analyzed, and simulation results on nonlinear robotic arms are provided.