Results 1 - 10 of 20
Exponential regret bounds for Gaussian process bandits with deterministic observations
- In ICML, 2012
Cited by 17 (9 self)
This paper analyzes the problem of Gaussian process (GP) bandits with deterministic observations. The analysis uses a branch and bound algorithm that is related to the UCB algorithm of (Srinivas et al., 2010). For GPs with Gaussian observation noise of variance strictly greater than zero, (Srinivas et al., 2010) proved that the regret vanishes at the approximate rate of O(1/√t), where t is the number of observations. To complement their result, we attack the deterministic case and attain a much faster exponential convergence rate. Under some regularity assumptions, we show that the regret decreases asymptotically according to O(e^(−τt/(ln t)^(d/4))) with high probability. Here, d is the dimension of the search space and τ is a constant that depends on the behaviour of the objective function near its global maximum.
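The GP-UCB rule this result builds on can be illustrated with a small sketch: fit a GP posterior on the points sampled so far and evaluate wherever the upper confidence bound μ + β·σ is largest. This is a minimal grid-based illustration, not the branch-and-bound method of the paper; the squared-exponential kernel, length-scale, and β value are assumptions chosen for the toy objective.

```python
import numpy as np

def rbf(a, b, ls=0.2):
    # Squared-exponential kernel matrix between 1-D point sets a and b.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_ucb_step(X, y, grid, beta=2.0, noise=1e-6):
    # GP posterior mean/variance on a grid, then pick the UCB maximizer.
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(grid, X)
    mu = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return grid[np.argmax(mu + beta * np.sqrt(np.maximum(var, 0.0)))]

f = lambda x: -(x - 0.3) ** 2            # toy objective, maximum at x = 0.3
grid = np.linspace(0.0, 1.0, 201)
X = np.array([0.0, 1.0])
y = f(X)
for _ in range(15):                      # evaluate, refit, repeat
    x_next = gp_ucb_step(X, y, grid)
    X = np.append(X, x_next)
    y = np.append(y, f(x_next))
best = X[np.argmax(y)]                   # best point found so far
```

Note that selecting the next point already requires maximizing the acquisition over the grid, which is exactly the auxiliary-optimization step that later entries in this list try to remove.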
Stochastic simultaneous optimistic optimization
- In International Conference on Machine Learning, 2013
Cited by 13 (7 self)
We study the problem of global maximization of a function f given a finite number of evaluations perturbed by noise. We consider a very weak assumption on the function, namely that it is locally smooth (in some precise sense) with respect to some semi-metric, around one of its global maxima. Compared to previous works on bandits in general spaces (Kleinberg et al., 2008; Bubeck et al., 2011a), our algorithm does not require the knowledge of this semi-metric. Our algorithm, StoSOO, follows an optimistic strategy to iteratively construct upper confidence bounds over the hierarchical partitions of the function domain to decide which point to sample next. A finite-time analysis of StoSOO shows that it performs almost as well as the best specifically-tuned algorithms even though the local smoothness of the function is not known.
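The deterministic core that StoSOO extends can be sketched as a simultaneous-optimistic-optimization loop: trisect the domain into a tree of cells and, at every depth, expand the best cell whose value beats everything expanded at shallower depths. A minimal 1-D sketch, with the budget and trisection rule as illustrative assumptions (StoSOO itself replaces the raw cell values with upper confidence bounds built from noisy repeated evaluations):

```python
import math

def soo(f, lo, hi, n_evals=60, max_depth=10):
    # A leaf cell is (depth, lo, hi, mid, f(mid)); start from the root cell.
    mid = (lo + hi) / 2.0
    leaves = [(0, lo, hi, mid, f(mid))]
    evals = 1
    while evals < n_evals:
        vmax = -math.inf                  # best value expanded this sweep
        expanded = False
        for depth in range(max_depth):
            cells = [c for c in leaves if c[0] == depth]
            if not cells:
                continue
            best = max(cells, key=lambda c: c[4])
            if best[4] <= vmax:
                continue                  # not optimistic at this depth
            vmax = best[4]
            leaves.remove(best)
            expanded = True
            d, a, b, _, _ = best
            third = (b - a) / 3.0
            for na, nb in ((a, a + third), (a + third, b - third), (b - third, b)):
                nm = (na + nb) / 2.0      # center child re-evaluates the
                leaves.append((d + 1, na, nb, nm, f(nm)))  # parent's midpoint
                evals += 1
        if not expanded:
            break
    return max(leaves, key=lambda c: c[4])[3]

x_best = soo(lambda x: -(x - 0.3) ** 2, 0.0, 1.0)
```

No smoothness constant appears anywhere in the loop; the only tuning is the evaluation budget and the maximum tree depth.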
On correlation and budget constraints in model-based bandit optimization with application to automatic machine learning
Bayesian Multi-Scale Optimistic Optimization
Cited by 6 (3 self)
Bayesian optimization is a powerful global optimization technique for expensive black-box functions. One of its shortcomings is that it requires auxiliary optimization of an acquisition function at each iteration. This auxiliary optimization can be costly and very hard to carry out in practice. Moreover, it creates serious theoretical concerns, as most of the convergence results assume that the exact optimum of the acquisition function can be found. In this paper, we introduce a new technique for efficient global optimization that combines Gaussian process confidence bounds and treed simultaneous optimistic optimization to eliminate the need for auxiliary optimization of acquisition functions. The experiments with global optimization benchmarks and a novel application to automatic information extraction demonstrate that the resulting technique is more efficient than the two approaches from which it draws inspiration. Unlike most theoretical analyses of Bayesian optimization with Gaussian processes, our finite-time convergence rate proofs do not require exact optimization of an acquisition function. That is, our approach eliminates the unsatisfactory assumption that a difficult, potentially NP-hard, problem has to be solved in order to obtain vanishing regret rates.
Relative confidence sampling for efficient on-line ranker evaluation
- In WSDM ’14, 2014
Cited by 3 (3 self)
A key challenge in information retrieval is that of on-line ranker evaluation: determining which one of a finite set of rankers performs the best in expectation on the basis of user clicks on presented document lists. When the presented lists are constructed using interleaved comparison methods, which interleave lists proposed by two different candidate rankers, then the problem of minimizing the total regret accumulated while evaluating the rankers can be formalized as a K-armed dueling bandits problem. In this paper, we propose a new method called relative confidence sampling (RCS) that aims to reduce cumulative regret by being less conservative than existing methods in eliminating rankers from contention. In addition, we present an empirical comparison between RCS and two state-of-the-art methods, relative upper confidence bound and SAVAGE. The results demonstrate that RCS can substantially outperform these alternatives on several large learning to rank datasets.
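A dueling-bandit loop of the kind described here can be sketched with Beta posteriors over pairwise win probabilities: sample a plausible preference matrix, pick a champion and its strongest sampled challenger, and run one interleaved duel. This is a Thompson-style toy, not the RCS algorithm itself; `true_p` is a hypothetical stand-in for the unknown click-based preferences.

```python
import random

def duel_loop(true_p, budget=3000, seed=1):
    # true_p[i][j]: probability that ranker i beats ranker j in one duel.
    k = len(true_p)
    rng = random.Random(seed)
    wins = [[0] * k for _ in range(k)]
    for _ in range(budget):
        # Sample a plausible preference matrix from Beta posteriors.
        s = [[rng.betavariate(wins[i][j] + 1, wins[j][i] + 1) if i != j else 0.5
              for j in range(k)] for i in range(k)]
        # Champion: most sampled pairwise wins; challenger: its hardest rival.
        champ = max(range(k),
                    key=lambda i: sum(s[i][j] > 0.5 for j in range(k) if j != i))
        chall = max((j for j in range(k) if j != champ), key=lambda j: s[j][champ])
        # Run one simulated interleaved duel and record the outcome.
        if rng.random() < true_p[champ][chall]:
            wins[champ][chall] += 1
        else:
            wins[chall][champ] += 1
    # Report the ranker with the best overall record.
    return max(range(k), key=lambda i: sum(wins[i]))
```

The sampling step is what makes the method less conservative than elimination-based approaches: a ranker is never permanently discarded, it simply stops being drawn as champion or challenger once its posteriors make that unlikely.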
Optimistic planning for belief-augmented Markov decision processes
- In IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 2013
Cited by 3 (2 self)
This paper presents the Bayesian Optimistic Planning (BOP) algorithm, a novel model-based Bayesian reinforcement learning approach. BOP extends the planning approach of the Optimistic Planning for Markov Decision Processes (OP-MDP) algorithm [10], [9] to contexts where the transition model of the MDP is initially unknown and progressively learned through interactions with the environment. The knowledge about the unknown MDP is represented with a probability distribution over all possible transition models using Dirichlet distributions, and the BOP algorithm plans in the belief-augmented state space constructed by concatenating the original state vector with the current posterior distribution over transition models. We show that BOP becomes Bayesian optimal when the budget parameter increases to infinity. Preliminary empirical validations show promising performance.
Optimistic planning for continuous-action deterministic systems
- In IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-13), 2013
Cited by 1 (0 self)
We consider the optimal control of systems with deterministic dynamics, continuous, possibly large-scale state spaces, and continuous, low-dimensional action spaces. We describe an online planning algorithm called SOOP, which, like other algorithms in its class, has no direct dependence on the state space structure. Unlike previous algorithms, SOOP explores the true solution space, consisting of infinite sequences of continuous actions, without requiring knowledge about the smoothness of the system. To this end, it borrows the principle of the simultaneous optimistic optimization method, and develops a nontrivial adaptation of this principle to the planning problem. Experiments on four problems show SOOP reliably ranks among the best algorithms, fully dominating competing methods when the problem requires both long horizons and fine discretization.
Bayesian Optimization with Exponential Convergence
This paper presents a Bayesian optimization method with exponential convergence without the need of auxiliary optimization and without the δ-cover sampling. Most Bayesian optimization methods require auxiliary optimization: an additional non-convex global optimization problem, which can be time-consuming and hard to implement in practice. Also, the existing Bayesian optimization method with exponential convergence [1] requires access to the δ-cover sampling, which was considered to be impractical …
Black-box optimization of noisy functions with unknown smoothness
We study the problem of black-box optimization of a function f of any dimension, given function evaluations perturbed by noise. The function is assumed to be locally smooth around one of its global optima, but this smoothness is unknown. Our contribution is an adaptive optimization algorithm, POO or parallel optimistic optimization, that is able to deal with this setting. POO performs almost as well as the best known algorithms requiring the knowledge of the smoothness. Furthermore, POO works for a larger class of functions than what was previously considered, especially for functions that are difficult to optimize, in a very precise sense. We provide a finite-time analysis of POO's performance, which shows that its error after n evaluations is at most a factor of √(ln n) away from the error of the best known optimization algorithms using the knowledge of the smoothness.
Consensus for Agents with General Dynamics Using Optimistic Optimization
, 2013
An important challenge in multiagent systems is consensus, in which the agents must agree on certain controlled variables of interest. So far, most consensus algorithms for agents with nonlinear dynamics exploit the specific form of the nonlinearity. Here, we propose an approach that only requires a black-box simulation model of the dynamics, and is therefore applicable to a wide class of nonlinearities. This approach works for agents communicating on a fixed, connected network. It designs a reference behavior with a classical consensus protocol, and then finds control actions that drive the nonlinear agents towards the reference states, using a recent optimistic optimization algorithm. By exploiting the guarantees of optimistic optimization, we prove that the agents achieve practical consensus. A representative example is further analyzed, and simulation results on nonlinear robotic arms are provided.
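The "classical consensus protocol" used to design the reference behavior is, in its simplest linear form, each agent repeatedly stepping toward its neighbors' states. A minimal sketch, assuming a fixed, connected, undirected graph and a step size small enough for stability (graph and step size here are illustrative choices):

```python
def consensus_step(x, neighbors, eps=0.2):
    # One synchronous step of the linear consensus protocol:
    # each agent moves toward the states of its neighbors.
    return [xi + eps * sum(x[j] - xi for j in neighbors[i])
            for i, xi in enumerate(x)]

# Four agents on a path graph; states converge to the average, 1.5.
x = [0.0, 1.0, 2.0, 3.0]
nbrs = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
for _ in range(200):
    x = consensus_step(x, nbrs)
```

On an undirected graph this update preserves the sum of the states, so the agreement value is the initial average; the paper's contribution is then tracking such a reference with nonlinear black-box agents via optimistic optimization.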