Results 1–10 of 58
Portfolio Allocation for Bayesian Optimization
Abstract

Cited by 23 (14 self)
Bayesian optimization with Gaussian processes has become an increasingly popular tool in the machine learning community. It is efficient and can be used when very little is known about the objective function, making it popular in expensive black-box optimization scenarios. It uses Bayesian methods to sample the objective efficiently using an acquisition function which incorporates the posterior estimate of the objective. However, there are several different parameterized acquisition functions in the literature, and it is often unclear which one to use. Instead of using a single acquisition function, we adopt a portfolio of acquisition functions governed by an online multi-armed bandit strategy. We propose several portfolio strategies, the best of which we call GP-Hedge, and show that this method outperforms the best individual acquisition function. We also provide a theoretical bound on the algorithm's performance.
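The portfolio mechanism described in the abstract can be sketched as a Hedge-style softmax choice over acquisition functions: each acquisition accumulates a gain (e.g. the GP posterior mean at the point it proposed) and the next one is drawn with probability exponential in that gain. A minimal sketch — the function name, the gain definition, and the learning rate `eta` are assumptions, not the paper's exact formulation:

```python
import numpy as np

def gp_hedge_choose(gains, eta=1.0, rng=None):
    """Sample an acquisition-function index via a Hedge-style rule.

    gains[k] is the cumulative reward credited to acquisition k
    (for instance, the GP posterior mean at the point it proposed).
    """
    rng = rng or np.random.default_rng(0)
    logits = eta * (gains - gains.max())            # subtract max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()   # softmax over acquisitions
    return rng.choice(len(gains), p=probs), probs
```

With equal gains every acquisition is equally likely; as one acquisition accumulates more reward, it is proposed more often.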
Open Loop Optimistic Planning
Abstract

Cited by 22 (8 self)
We consider the problem of planning in a stochastic and discounted environment with a limited numerical budget. More precisely, we investigate strategies exploring the set of possible sequences of actions, so that, once all available numerical resources (e.g. CPU time, number of calls to a generative model) have been used, one returns a recommendation on the best possible immediate action to follow based on this exploration. The performance of a strategy is assessed in terms of its simple regret, that is, the loss in performance resulting from choosing the recommended action instead of an optimal one. We first provide a minimax lower bound for this problem, and show that a uniform planning strategy matches this minimax rate (up to a logarithmic factor). Then we propose a UCB (Upper Confidence Bounds)-based planning algorithm, called OLOP (Open-Loop Optimistic Planning), which is also minimax optimal, and prove that it enjoys much faster rates when there is a small proportion of near-optimal sequences of actions. Finally, we compare our results with the regret bounds one can derive for our setting with bandit algorithms designed for an infinite number of arms.
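The uniform planning baseline mentioned in the abstract can be sketched as brute-force enumeration of open-loop action sequences: simulate every sequence the same number of times and recommend the first action of the best one. OLOP itself replaces this uniform allocation with UCB-style optimism; the interface below (a `sample_reward` generative model returning per-step rewards) is an assumption for illustration:

```python
from itertools import product

def uniform_open_loop_plan(sample_reward, n_actions, depth, gamma, n_sims):
    """Uniform open-loop planning (baseline sketch).

    Every action sequence of length `depth` is simulated n_sims times;
    the first action of the sequence with the best average discounted
    return is recommended.
    """
    best_action, best_value = None, float("-inf")
    for seq in product(range(n_actions), repeat=depth):
        total = 0.0
        for _ in range(n_sims):
            rewards = sample_reward(seq)            # one rollout of per-step rewards
            total += sum(gamma ** t * r for t, r in enumerate(rewards))
        value = total / n_sims
        if value > best_value:
            best_action, best_value = seq[0], value
    return best_action
```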
Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness
Abstract

Cited by 20 (4 self)
We consider a global optimization problem of a deterministic function f in a semimetric space, given a finite budget of n evaluations. The function f is assumed to be locally smooth (around one of its global maxima) with respect to a semimetric ℓ. We describe two algorithms based on optimistic exploration that use a hierarchical partitioning of the space at all scales. A first contribution is an algorithm, DOO, that requires the knowledge of ℓ. We report a finite-sample performance bound in terms of a measure of the quantity of near-optimal states. We then define a second algorithm, SOO, which does not require the knowledge of the semimetric ℓ under which f is smooth, and whose performance is almost as good as DOO optimally fitted.
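The DOO idea admits a compact sketch on [0, 1] when the semimetric is a known Lipschitz bound ℓ(x, y) = L·|x − y| (SOO's contribution is removing the need to know L). Each interval's b-value f(c) + L·w/2 is an optimistic upper bound on f over the interval, and the leaf with the highest b-value is split next. A hypothetical sketch under that assumption:

```python
def doo_maximize(f, L, n_splits=40):
    """Deterministic Optimistic Optimization (DOO) sketch on [0, 1].

    Assumes f satisfies f(x*) <= f(c) + L*|x* - c|, so the b-value
    f(c) + L*w/2 upper-bounds f on an interval of width w centred at c.
    """
    c0 = 0.5
    f0 = f(c0)
    leaves = [(f0 + L * 0.5, 0.0, 1.0)]   # (b-value, left end, right end)
    best_x, best_f = c0, f0
    for _ in range(n_splits):
        # expand the leaf with the highest optimistic bound
        i = max(range(len(leaves)), key=lambda j: leaves[j][0])
        _, a, b = leaves.pop(i)
        m = (a + b) / 2
        for lo, hi in ((a, m), (m, b)):
            c = (lo + hi) / 2
            fc = f(c)
            if fc > best_f:
                best_x, best_f = c, fc
            leaves.append((fc + L * (hi - lo) / 2, lo, hi))
    return best_x, best_f
```

On a smooth objective the expansion quickly concentrates around the maximizer.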
Almost optimal exploration in multi-armed bandits
 Proceedings of the 30th International Conference on Machine Learning, June 2013.
Abstract

Cited by 17 (1 self)
We study the problem of exploration in stochastic Multi-Armed Bandits. Even in the simplest setting of identifying the best arm, there remains a logarithmic multiplicative gap between the known lower and upper bounds for the number of arm pulls required for the task. This extra logarithmic factor is quite meaningful in today's large-scale applications. We present two novel, parameter-free algorithms for identifying the best arm, in two different settings: given a target confidence and given a target budget of arm pulls, for which we prove upper bounds whose gap from the lower bound is only doubly logarithmic in the problem parameters. We corroborate our theoretical results with experiments demonstrating that our algorithm outperforms the state-of-the-art and scales better as the size of the problem increases.
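In the fixed-budget setting, the elimination scheme underlying this line of work is sequential halving: the surviving arms are pulled equally in each round and the empirically worse half is discarded. A minimal sketch (the paper's algorithms add further refinements on top of this skeleton):

```python
import math
import random

def sequential_halving(pull, n_arms, budget):
    """Fixed-budget best-arm identification by sequential halving.

    pull(i) returns one stochastic reward for arm i. The budget is
    split evenly over ceil(log2(n_arms)) rounds; each round pulls
    every surviving arm equally and eliminates the worse half.
    """
    arms = list(range(n_arms))
    rounds = math.ceil(math.log2(n_arms))
    for _ in range(rounds):
        t = max(1, budget // (len(arms) * rounds))   # pulls per arm this round
        means = {i: sum(pull(i) for _ in range(t)) / t for i in arms}
        arms.sort(key=lambda i: means[i], reverse=True)
        arms = arms[: max(1, len(arms) // 2)]        # keep the better half
    return arms[0]
```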
Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence
Multiple Identifications in Multi-Armed Bandits
Abstract

Cited by 16 (0 self)
We study the problem of identifying the top m arms in a multi-armed bandit game. Our proposed solution relies on a new algorithm based on successive rejects of the seemingly bad arms, and successive accepts of the good ones. This algorithmic contribution allows us to tackle other multiple-identification settings that were previously out of reach. In particular, we show that this idea of successive accepts and rejects applies to the multi-bandit best arm identification problem.
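The accept/reject idea can be illustrated with a simplified uniform-phase version: each phase pulls every active arm a fixed number of extra times, then either accepts the empirical best (if its gap to the (m_rem+1)-th best is largest) or rejects the empirical worst. The fixed per-phase pull count replaces the paper's budget-dependent schedule and is an assumption for illustration:

```python
import random

def top_m_successive(pull, n_arms, m, pulls_per_phase=200):
    """Top-m identification by successive accepts and rejects (simplified sketch).

    pull(i) returns one stochastic reward for arm i. Each phase removes
    exactly one arm from the active set, by acceptance or rejection.
    """
    active = list(range(n_arms))
    accepted = []
    counts = dict.fromkeys(active, 0)
    sums = dict.fromkeys(active, 0.0)
    while len(accepted) < m:
        m_rem = m - len(accepted)
        if len(active) == m_rem:          # everything left must be accepted
            accepted += active
            break
        for i in active:
            for _ in range(pulls_per_phase):
                sums[i] += pull(i)
                counts[i] += 1
        active.sort(key=lambda i: sums[i] / counts[i], reverse=True)
        mu = [sums[i] / counts[i] for i in active]
        gap_accept = mu[0] - mu[m_rem]        # empirical best vs (m_rem+1)-th best
        gap_reject = mu[m_rem - 1] - mu[-1]   # m_rem-th best vs empirical worst
        if gap_accept >= gap_reject:
            accepted.append(active.pop(0))    # accept the empirical best
        else:
            active.pop()                      # reject the empirical worst
    return sorted(accepted)
```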
Multi-Bandit Best Arm Identification
Abstract

Cited by 16 (3 self)
We study the problem of identifying the best arm in each of the bandits in a multi-bandit multi-armed setting. We first propose an algorithm called Gap-based Exploration (GapE) that focuses on the arms whose mean is close to the mean of the best arm in the same bandit (i.e., small gap). We then introduce an algorithm, called GapE-V, which takes into account the variance of the arms in addition to their gap. We prove an upper bound on the probability of error for both algorithms. Since GapE and GapE-V need to tune an exploration parameter that depends on the complexity of the problem, which is often unknown in advance, we also introduce variations of these algorithms that estimate this complexity online. Finally, we evaluate the performance of these algorithms and compare them to other allocation strategies on a number of synthetic problems.
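Within a single bandit, the gap-based selection rule can be sketched as an index: pull the arm maximizing the negated empirical gap plus an exploration bonus driven by the parameter a mentioned in the abstract. A sketch of the index (the exact bonus form here is an assumption):

```python
import math

def gape_index(means, counts, a):
    """Gap-based exploration index (sketch) within one bandit.

    means/counts are per-arm empirical means and pull counts; a is the
    exploration parameter. Returns the index of the arm to pull next:
    the one maximizing -gap + sqrt(a / t), where the gap of the best
    arm is measured against the second best.
    """
    best = max(means)
    second = sorted(means, reverse=True)[1]
    idx = []
    for mu, t in zip(means, counts):
        gap = best - mu if mu < best else best - second
        idx.append(-gap + math.sqrt(a / t))   # small gap or few pulls => high priority
    return max(range(len(means)), key=idx.__getitem__)
```

Arms with small gaps or few pulls get priority, matching the abstract's description of focusing on near-best arms.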
Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits
, 2012
Abstract

Cited by 9 (3 self)
In this paper, we study the problem of estimating the mean values of all the arms uniformly well in the multi-armed bandit setting. If the variances of the arms were known, one could design an optimal sampling strategy by pulling the arms proportionally to their variances. However, since the distributions are not known in advance, we need to design adaptive sampling strategies to select an arm at each round based on the previously observed samples. We describe two strategies based on pulling the arms proportionally to an upper bound on their variances and derive regret bounds for these strategies. We show that the performance of these allocation strategies depends not only on the variances of the arms but also on the full shape of their distributions.
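The proportional-to-variance idea can be sketched by pulling the arm whose upper confidence bound on its variance, divided by its pull count, is largest: high-variance arms and under-sampled arms both get priority. The Hoeffding-style bonus below is an assumed stand-in for the paper's bound:

```python
import math

def choose_arm_variance_ucb(samples, delta=0.05):
    """Pick the next arm for uniformly good mean estimation (sketch).

    samples[i] is the list of rewards observed so far from arm i.
    Pulls the arm maximizing UCB(variance) / pull count, approximating
    the proportional-to-variance allocation with unknown variances.
    """
    def var_ucb(xs):
        n = len(xs)
        mean = sum(xs) / n
        var = sum((x - mean) ** 2 for x in xs) / n
        return var + math.sqrt(2 * math.log(1 / delta) / n)   # assumed confidence bonus
    scores = [var_ucb(xs) / len(xs) for xs in samples]
    return max(range(len(samples)), key=scores.__getitem__)
```

With equal sample counts the higher-variance arm wins; a badly under-sampled arm wins through its bonus, which is the exploration behaviour the abstract describes.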
Generic Exploration and K-armed Voting Bandits
Abstract

Cited by 9 (1 self)
We study a stochastic online learning scheme with partial feedback where the utility of decisions is only observable through an estimation of the environment parameters. We propose a generic pure-exploration algorithm, able to cope with various utility functions, from multi-armed bandit settings to dueling bandits. The primary application of this setting is to offer a natural generalization of dueling bandits for situations where the environment parameters reflect the idiosyncratic preferences of a mixed crowd.
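One concrete instance of the dueling-bandit case is pure exploration by uniform pairwise sampling with a Borda-style recommendation: sample pairs, record wins, and recommend the arm with the best average win rate. This is an illustrative baseline, not the paper's generic algorithm:

```python
import random

def borda_explore(duel, k, budget, rng=None):
    """Uniform pure exploration for dueling bandits (illustrative sketch).

    duel(i, j) returns 1 if arm i beats arm j, else 0. Pairs are drawn
    uniformly; the arm with the highest average win rate is recommended.
    """
    rng = rng or random.Random(0)
    wins = [0] * k
    plays = [0] * k
    for _ in range(budget):
        i, j = rng.sample(range(k), 2)
        w = duel(i, j)
        wins[i] += w
        wins[j] += 1 - w
        plays[i] += 1
        plays[j] += 1
    return max(range(k), key=lambda a: wins[a] / max(1, plays[a]))
```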