Results 1–10 of 13
On the complexity of best arm identification in multi-armed bandit models. arXiv preprint arXiv:1407.4443, 2014
Abstract

Cited by 7 (1 self)
The stochastic multi-armed bandit model is a simple abstraction that has proven useful in many different contexts in statistics and machine learning. Whereas the achievable limit in terms of regret minimization is now well known, our aim is to contribute to a better understanding of the performance in terms of identifying the m best arms. We introduce generic notions of complexity for the two dominant frameworks considered in the literature: fixed-budget and fixed-confidence settings. In the fixed-confidence setting, we provide the first known distribution-dependent lower bound on the complexity that involves information-theoretic quantities and holds when m ≥ 1 under general assumptions. In the specific case of two-armed bandits, we derive refined lower bounds in both the fixed-confidence and fixed-budget settings, along with matching algorithms for Gaussian and Bernoulli bandit models. These results show in particular that the complexity of the fixed-budget setting may be smaller than the complexity of the fixed-confidence setting, contradicting the familiar behavior observed when testing fully specified alternatives. In addition, we also provide improved sequential stopping rules that have guaranteed error probabilities and shorter average running times. The proofs rely on two technical results that are of independent interest: a deviation lemma for self-normalized sums (Lemma 19) and a novel change of measure inequality for bandit models (Lemma 1).
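As a rough numeric illustration of the kind of change-of-measure lower bound this abstract refers to: for two Bernoulli arms, any δ-correct strategy must draw enough samples that the accumulated KL divergence to an alternative model (one that swaps which arm is best) exceeds roughly kl(δ, 1 − δ). The sketch below is a simplified two-point version of such bounds, not the paper's exact statement; all function names are illustrative.

```python
import math

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def sample_complexity_lower_bound(mu1, mu2, delta):
    """Simplified change-of-measure lower bound on the expected number of
    draws needed to identify the best of two Bernoulli arms with error
    probability at most delta: E[N] * KL(mu2, mu1) >= kl(delta, 1 - delta),
    where the alternative model moves arm 2's mean past arm 1's."""
    return kl_bernoulli(delta, 1 - delta) / kl_bernoulli(mu2, mu1)

# closer means -> a harder problem -> a larger lower bound
bound = sample_complexity_lower_bound(0.6, 0.5, 0.05)
```

The bound blows up as the two means approach each other, matching the intuition that near-ties require many samples.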
On identifying good options under combinatorially structured feedback in finite noisy environments. In Proceedings of the International Conference on Machine Learning (ICML), 2015
Abstract

Cited by 1 (0 self)
We consider the problem of identifying a good option out of a finite set of options under combinatorially structured, noisy feedback about the quality of the options in a sequential process: in each round, a subset of the options, from an available set of subsets, can be selected to receive noisy information about the quality of the options in the chosen subset. The goal is to identify the highest quality option, or a group of options of the highest quality, with a small error probability, while using the smallest number of measurements. The problem generalizes best-arm identification problems. By extending previous work, we design new algorithms that are shown to be able to exploit the combinatorial structure of the problem in a non-trivial fashion, while being unimprovable in special cases. The algorithms call a set multi-covering oracle; hence their performance and efficiency are strongly tied to whether the associated set multi-covering problem can be efficiently solved.
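The set multi-covering oracle the abstract mentions could, in practice, be approximated greedily. The sketch below is an illustrative stand-in for such an oracle (it is not the paper's construction): choose subsets, with repetition, until every option has been covered as many times as its demand requires.

```python
def greedy_multicover(universe, subsets, demand):
    """Greedy approximation for set multi-cover: repeatedly pick the subset
    covering the most still-unmet demand until every element e is covered at
    least demand[e] times. Returns the list of chosen subset indices
    (with repetition). Illustrative stand-in for a multi-covering oracle."""
    remaining = dict(demand)
    chosen = []
    while any(v > 0 for v in remaining.values()):
        gains = [sum(1 for e in s if remaining.get(e, 0) > 0) for s in subsets]
        best = max(range(len(subsets)), key=gains.__getitem__)
        if gains[best] == 0:
            raise ValueError("demand cannot be met by the given subsets")
        chosen.append(best)
        for e in subsets[best]:
            if remaining.get(e, 0) > 0:
                remaining[e] -= 1
    return chosen

# every option must be measured at least twice; measurements come in fixed groups
cover = greedy_multicover({1, 2, 3}, [{1, 2}, {2, 3}, {1, 3}], {1: 2, 2: 2, 3: 2})
```

The greedy rule gives the classical logarithmic approximation guarantee for (multi-)cover problems, which is typically good enough when the oracle is only used to shape a sampling schedule.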
On the Complexity of A/B Testing
JMLR: Workshop and Conference Proceedings, vol. 35 (2014), pp. 1–23, 2014
Abstract

Cited by 1 (1 self)
A/B testing refers to the task of determining the best option among two alternatives that yield random outcomes. We provide distribution-dependent lower bounds for the performance of A/B testing that improve over the results currently available, both in the fixed-confidence (or δ-PAC) and fixed-budget settings. When the distributions of the outcomes are Gaussian, we prove that the complexities of the fixed-confidence and fixed-budget settings are equivalent, and that uniform sampling of both alternatives is optimal only in the case of equal variances. In the common variance case, we also provide a stopping rule that terminates faster than existing fixed-confidence algorithms. In the case of Bernoulli distributions, we show that the complexity of the fixed-budget setting is smaller than that of the fixed-confidence setting, and that uniform sampling of both alternatives, though not optimal, is advisable in practice when combined with an appropriate stopping criterion.
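The "uniform sampling plus a stopping criterion" recipe from this abstract can be sketched as follows for Gaussian outcomes. The confidence width used here is a generic time-uniform choice for illustration only, not the paper's optimized threshold; the constants are assumptions.

```python
import math
import random

def ab_test(sample_a, sample_b, sigma=1.0, delta=0.05, max_steps=100000):
    """Uniform-sampling A/B test: draw one outcome from each alternative per
    round and stop once the empirical gap exceeds a time-uniform confidence
    width. The width below is an illustrative choice, not the paper's rule."""
    sum_a = sum_b = 0.0
    for t in range(1, max_steps + 1):
        sum_a += sample_a()
        sum_b += sample_b()
        gap = abs(sum_a - sum_b) / t
        # generic anytime width ~ sqrt((log(1/delta) + log t) / t) for the pair
        width = sigma * math.sqrt(8.0 * (math.log(1.0 / delta) + math.log(t + 1)) / t)
        if gap > width:
            return ("A" if sum_a > sum_b else "B"), t
    return ("A" if sum_a > sum_b else "B"), max_steps

rng = random.Random(0)
winner, rounds = ab_test(lambda: rng.gauss(0.5, 1.0), lambda: rng.gauss(0.0, 1.0))
```

Because the width shrinks like 1/sqrt(t), the test stops after roughly (width-constant / gap)^2 rounds, and a larger true gap means an earlier stop.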
Simple regret for infinitely many armed bandits. arXiv preprint arXiv:1505.04627, http://arxiv.org/abs/1505.04627, 2015
Abstract

Cited by 1 (0 self)
We consider a stochastic bandit problem with infinitely many arms. In this setting, the learner has no chance of trying all the arms even once and has to dedicate its limited number of samples to only a certain number of arms. All previous algorithms for this setting were designed to minimize the cumulative regret of the learner. In this paper, we propose an algorithm aiming at minimizing the simple regret. As in the cumulative regret setting of infinitely many armed bandits, the rate of the simple regret will depend on a parameter β characterizing the distribution of the near-optimal arms. We prove that, depending on β, our algorithm is minimax optimal either up to a multiplicative constant or up to a log(n) factor. We also provide extensions to several important cases: when β is unknown, in a natural setting where the near-optimal arms have a small variance, and in the case of an unknown time horizon.
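The standard recipe for this setting is: draw a manageable number K of fresh arms from the reservoir, spend the budget uniformly on them, and recommend the empirical best. The sketch below follows that shape; the specific choice K ≈ n^(β/(β+1)) mirrors the usual dependence on β, but the paper's exact tuning may differ.

```python
def simple_regret_infinite(arm_reservoir, n, beta=1.0):
    """Sketch for infinitely many arms: draw K fresh arms from the reservoir
    (each arm is a zero-argument sampler returned by arm_reservoir), split the
    budget n uniformly across them, and return the index of the arm with the
    best empirical mean. K = round(n ** (beta / (beta + 1))) is an
    illustrative choice, not necessarily the paper's."""
    k = max(1, round(n ** (beta / (beta + 1))))
    arms = [arm_reservoir() for _ in range(k)]
    per_arm = max(1, n // k)
    means = [sum(arm() for _ in range(per_arm)) / per_arm for arm in arms]
    return max(range(k), key=means.__getitem__)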
Sparse Dueling Bandits
Abstract

Cited by 1 (0 self)
The dueling bandit problem is a variation of the classical multi-armed bandit in which the allowable actions are noisy comparisons between pairs of arms. This paper focuses on a new approach for finding the “best” arm according to the Borda criterion using noisy comparisons. We prove that in the absence of structural assumptions, the sample complexity of this problem is proportional to the sum of the inverse squared gaps between the Borda scores of each suboptimal arm and the best arm. We explore this dependence further and consider structural constraints on the pairwise comparison matrix (a particular form of sparsity natural to this problem) that can significantly reduce the sample complexity. This motivates a new algorithm called Successive Elimination with Comparison Sparsity (SECS) that exploits sparsity to find the Borda winner using fewer samples than standard algorithms. We also evaluate the new algorithm experimentally with synthetic and real data. The results show that the sparsity model and the new algorithm can provide significant improvements over standard approaches.
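The Borda score of arm i is the probability that i beats a uniformly chosen opponent. A plain way to estimate it, and hence the Borda winner, is to duel each arm against random opponents and count wins. This sketch is only that baseline estimator, not the SECS algorithm; the `duel` interface is an assumption.

```python
import random

def borda_winner(duel, n_arms, samples_per_arm, rng=random):
    """Estimate each arm's Borda score -- its probability of beating a
    uniformly chosen other arm -- from noisy pairwise comparisons, and return
    the arm with the highest estimate. duel(i, j) must return 1 if i beats j
    in a noisy comparison, else 0. A plain baseline, not SECS."""
    wins = [0] * n_arms
    for i in range(n_arms):
        for _ in range(samples_per_arm):
            j = rng.randrange(n_arms - 1)
            j = j if j < i else j + 1          # uniform opponent distinct from i
            wins[i] += duel(i, j)
    return max(range(n_arms), key=wins.__getitem__)
```

SECS improves on this baseline precisely by exploiting sparsity in the comparison matrix so that not every arm needs its Borda score estimated to the same precision.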
Adaptive Concentration Inequalities for Sequential Decision Problems
Abstract
A key challenge in sequential decision problems is to determine how many samples are needed for an agent to make reliable decisions with good probabilistic guarantees. We introduce Hoeffding-like concentration inequalities that hold for a random, adaptively chosen number of samples. Our inequalities are tight under natural assumptions and can greatly simplify the analysis of common sequential decision problems. In particular, we apply them to sequential hypothesis testing, best arm identification, and sorting. The resulting algorithms rival or exceed the state of the art both theoretically and empirically.
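A typical use of such adaptive (anytime-valid) concentration inequalities is a sequential test that stops the moment the running mean clears an anytime confidence width. The width below uses a generic log-log form purely for illustration; the paper's inequalities are what pin down valid constants.

```python
import math
import random

def sequential_sign_test(sample, delta=0.05, max_steps=200000):
    """Decide the sign of an unknown mean with a data-dependent sample size:
    stop at the first t where |mean_t| exceeds an anytime confidence width.
    The log-log width below is an illustrative form, not the paper's exact
    inequality; valid constants come from adaptive concentration bounds."""
    total = 0.0
    for t in range(2, max_steps + 2):
        total += sample()
        mean = total / t
        width = math.sqrt(2.0 * (math.log(math.log(t) / delta) + math.log(t)) / t)
        if abs(mean) > width:
            return ("positive" if mean > 0 else "negative"), t
    return None, max_steps
```

Because the width is valid simultaneously for all t, the stopping time can be random and adaptive without invalidating the error guarantee, which is exactly the difficulty fixed-sample Hoeffding bounds cannot handle.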
Causal Bandits: Learning Good Interventions via Causal Inference
Abstract
We study the problem of using causal models to improve the rate at which good interventions can be learned online in a stochastic environment. Our formalism combines multi-armed bandits and causal inference to model a novel type of bandit feedback that is not exploited by existing approaches. We propose a new algorithm that exploits the causal feedback and prove a bound on its simple regret that is strictly better (in all quantities) than that of algorithms that do not use the additional causal information.
Open Problem: Best Arm Identification: Almost Instance-Wise Optimality and the Gap Entropy Conjecture, 2016
Abstract
The best arm identification problem (BEST-1-ARM) is the most basic pure exploration problem in stochastic multi-armed bandits. The problem has a long history and has attracted significant attention for the last decade. However, we do not yet have a complete understanding of the optimal sample complexity of the problem: the state-of-the-art algorithms achieve a sample complexity of O(∑_{i=2}^{n} Δ_i^{-2} (ln δ^{-1} + ln ln Δ_i^{-1})) (where Δ_i is the difference between the largest mean and the i-th mean), while the best known lower bound is Ω(∑_{i=2}^{n} Δ_i^{-2} ln δ^{-1}) for general instances and Ω(Δ^{-2} ln ln Δ^{-1}) for two-arm instances. We propose to study instance-wise optimality for the BEST-1-ARM problem. Previous work has proved that it is impossible to have an instance optimal algorithm for the two-arm problem. However, we conjecture that, modulo the additive term Θ(Δ_2^{-2} ln ln Δ_2^{-1}) (which is an upper bound and a worst case lower bound for the two-arm problem), there is an instance optimal algorithm for BEST-1-ARM. Moreover, we introduce a new quantity, called the gap entropy, for a best-arm problem instance, and conjecture that it is the instance-wise lower bound. Hence, resolving this conjecture would provide a final answer to this old and basic problem.
An optimal algorithm for the Thresholding Bandit Problem
Maurilio Gutzeit
Abstract
We study a specific combinatorial pure exploration stochastic bandit problem where the learner aims at finding the set of arms whose means are above a given threshold, up to a given precision, and for a fixed time horizon. We propose a parameter-free algorithm based on an original heuristic, and prove that it is optimal for this problem by deriving matching upper and lower bounds. To the best of our knowledge, this is the first non-trivial pure exploration setting with fixed budget for which optimal strategies are constructed.
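A thresholding heuristic of the kind described can be sketched as an index policy: always pull the arm whose empirical mean is closest to the threshold relative to how often it has been pulled, then report every arm whose empirical mean ends above the threshold. The index form below is one plausible reading of a parameter-free rule for this problem, not a verified transcription of the paper's algorithm.

```python
import math
import random

def thresholding_bandit(arms, tau, eps, budget, rng=random):
    """Index-policy sketch for the thresholding bandit: after pulling each arm
    once, repeatedly pull the arm minimizing sqrt(T_i) * (|mu_i - tau| + eps),
    i.e. the arm whose relation to the threshold tau is least certain, then
    return the indices whose empirical mean is >= tau. Each arm is a callable
    taking an rng and returning a reward (an interface assumed here)."""
    k = len(arms)
    counts = [1] * k
    sums = [arm(rng) for arm in arms]          # initialize: one pull per arm
    for _ in range(budget - k):
        idx = min(range(k),
                  key=lambda i: math.sqrt(counts[i]) *
                                (abs(sums[i] / counts[i] - tau) + eps))
        sums[idx] += arms[idx](rng)
        counts[idx] += 1
    return {i for i in range(k) if sums[i] / counts[i] >= tau}

# hypothetical Bernoulli arms for illustration
make_arm = lambda p: (lambda r: 1.0 if r.random() < p else 0.0)
```

The index concentrates the budget on arms whose means sit near the threshold, which is where extra samples reduce the misclassification probability the most.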