Results 1-10 of 6,947
Finite-time analysis of the multi-armed bandit problem
Machine Learning, 2002
Abstract. Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while taking the empirically best action as often as possible. A popular measure of a policy's success in addressing this dilemma is the regret, that is, the loss due to the fact that the globally optimal policy is not followed all the time. One of the simplest examples of the exploration/exploitation dilemma is the multi-armed bandit problem. Lai and Robbins were the first to show that the regret for this problem has ...
Cited by 817 (15 self)
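The regret described above can be made concrete with a small simulation. This is an illustrative sketch, not code from the paper: it assumes Bernoulli arms whose true means are used only to draw rewards and to score the outcome, and it uses a UCB1-style index (empirical mean plus sqrt(2 ln t / n_i)) as the policy. The function name `ucb1_pseudo_regret` is hypothetical. The returned pseudo-regret adds up, round by round, the gap between the best arm's mean and the mean of the arm actually played.

```python
import math
import random

def ucb1_pseudo_regret(means, horizon, seed=0):
    """Simulate a UCB1-style index policy on Bernoulli arms.

    means[i] is the success probability of arm i; the policy never reads it directly,
    it is used only to draw rewards and to score the pseudo-regret.
    Returns (pseudo_regret, pulls_per_arm).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k    # number of times each arm was played
    sums = [0.0] * k    # total reward collected from each arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # play every arm once to initialise the estimates
        else:
            # empirical mean plus an exploration bonus that shrinks as an arm is played more
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]  # expected loss versus always playing the best arm
    return regret, counts
```

For example, `ucb1_pseudo_regret([0.2, 0.8], 2000)` should spend most of its pulls on the second arm, so its regret stays far below the 0.6 per round that a policy stuck on the worse arm would pay.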
Thompson sampling: An asymptotically optimal finite-time analysis
In Algorithmic Learning Theory
The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem had been open since 1933. In this paper we answer it positively for the case of Bernoulli rewards by providing the first finite-time analysis that matches the asymptotic rate given in the Lai and Robbins ...
Cited by 35 (5 self)
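For Bernoulli rewards, Thompson Sampling keeps a Beta posterior for each arm, draws one sample from every posterior, and plays the arm with the largest draw. The sketch below shows only that sampling rule, not the paper's analysis; the uniform Beta(1, 1) prior and the name `thompson_bernoulli` are assumptions made for illustration.

```python
import random

def thompson_bernoulli(means, horizon, seed=0):
    """Thompson Sampling with independent Beta(1, 1) priors on Bernoulli arms.

    means[i] is the true success probability of arm i, used only to simulate rewards.
    Returns how many times each arm was played.
    """
    rng = random.Random(seed)
    k = len(means)
    successes = [0] * k
    failures = [0] * k
    for _ in range(horizon):
        # Draw one plausible mean per arm from its Beta(1 + successes, 1 + failures) posterior.
        draws = [rng.betavariate(1 + successes[i], 1 + failures[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: draws[i])
        # Play the arm with the largest draw and update its posterior counts.
        if rng.random() < means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [successes[i] + failures[i] for i in range(k)]
```

Because a posterior narrows as its arm is played, weak arms are drawn less and less often; with means (0.1, 0.5, 0.9), most of 3000 pulls should go to the third arm.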
Finite-Time Analysis of Stratified Sampling
We consider the problem of stratified sampling for Monte-Carlo integration. We model this problem in a multi-armed bandit setting, where the arms represent the strata, and the goal is to estimate a weighted average of the mean values of the arms. We propose a strategy that samples the arms according to an upper bound on their standard deviations and compare its estimation quality to an ideal allocation that would know the standard deviations of the strata. We provide two regret analyses: a distribution-dependent bound $\tilde{O}(n^{-3/2})$ that depends on a measure of the disparity of the strata, and a distribution-free bound $\tilde{O}(n^{-4/3})$ that does not.
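The allocation idea can be sketched as follows. If stratum k has weight w_k and standard deviation sigma_k and receives T_k samples, the weighted estimate sum_k w_k * mean_k has variance sum_k w_k^2 sigma_k^2 / T_k, which for a fixed budget n is minimised by T_k proportional to w_k sigma_k; `oracle_allocation` computes that ideal split. `ucb_std_allocation` replaces the unknown sigma_k with an empirical estimate plus an exploration bonus. The bonus form, the `bonus` and `init` parameters, and both function names are hypothetical choices for illustration, not the exact strategy or constants of the paper.

```python
import math
import random

def oracle_allocation(weights, stds, n):
    """Ideal split of n samples when the strata standard deviations are known:
    T_k proportional to w_k * sigma_k minimises sum_k w_k^2 sigma_k^2 / T_k."""
    total = sum(w * s for w, s in zip(weights, stds))
    return [n * w * s / total for w, s in zip(weights, stds)]

def ucb_std_allocation(weights, samplers, n, bonus=1.0, init=2):
    """Adaptive sketch: repeatedly sample the stratum maximising
    w_k * (sigma_hat_k + bonus / sqrt(T_k)) / T_k, an optimistic stand-in for w_k * sigma_k / T_k.
    samplers[k] is a zero-argument callable returning one draw from stratum k.
    `bonus` and `init` are hypothetical tuning parameters. Returns (estimate, samples_per_stratum)."""
    k = len(weights)
    cnt = [0] * k     # samples drawn from each stratum
    s1 = [0.0] * k    # running sum of samples
    s2 = [0.0] * k    # running sum of squared samples

    def draw(i):
        x = samplers[i]()
        cnt[i] += 1
        s1[i] += x
        s2[i] += x * x

    def std_hat(i):
        m = s1[i] / cnt[i]
        return math.sqrt(max(s2[i] / cnt[i] - m * m, 0.0))

    for i in range(k):
        for _ in range(init):
            draw(i)
    for _ in range(n - init * k):
        i = max(range(k), key=lambda j: weights[j] * (std_hat(j) + bonus / math.sqrt(cnt[j])) / cnt[j])
        draw(i)
    # Weighted average of the per-stratum sample means.
    estimate = sum(w * s / c for w, s, c in zip(weights, s1, cnt))
    return estimate, cnt

if __name__ == "__main__":
    rng = random.Random(0)
    strata = [lambda: rng.gauss(0.0, 1.0), lambda: rng.gauss(10.0, 3.0)]
    print(oracle_allocation([0.5, 0.5], [1.0, 3.0], 100))
    print(ucb_std_allocation([0.5, 0.5], strata, 2000))
```

With weights (0.5, 0.5) and standard deviations (1, 3), the oracle split of 100 samples is 25 and 75, and the adaptive sketch should likewise give most of its budget to the noisier stratum.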
Finite-Time Analysis of Kernelised Contextual Bandits
The 29th Conference on Uncertainty in Artificial Intelligence, 2013
We tackle the problem of online reward maximisation over a large finite set of actions described by their contexts. We focus on the case when the number of actions is too big to sample all of them even once. However, we assume that we have access to the similarities between actions' contexts and that ... out to be a special case of our algorithm, and our finite-time analysis improves the regret bound of GP-UCB for the agnostic case, both in terms of the kernel-dependent quantity and the RKHS norm of the reward function. Moreover, for the linear kernel, our regret bound matches the lower bound ...
Cited by 2 (1 self)
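To make the kernelised setting concrete, here is a sketch of a GP-UCB-style selection rule, the method whose regret bound the abstract says is improved, rather than the paper's own algorithm. Each action is a context vector; a kernel (a squared-exponential kernel is assumed here) supplies the similarities, and every action is scored by its posterior mean plus sqrt(beta) times its posterior standard deviation, so actions that were never sampled are still ranked through their similarity to observed ones. `beta`, `noise`, the length scale, and all function names are illustrative assumptions; a hand-written Cholesky factorisation keeps the sketch dependency-free.

```python
import math

def rbf(x, y, lengthscale=1.0):
    """Squared-exponential kernel on context vectors (an assumed choice)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * lengthscale ** 2))

def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][i] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def forward(low, b):
    """Solve L x = b by forward substitution."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(low[i][m] * x[m] for m in range(i))) / low[i][i])
    return x

def backward(low, b):
    """Solve L^T x = b by back substitution."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(low[m][i] * x[m] for m in range(i + 1, n))) / low[i][i]
    return x

def kernel_ucb_choose(contexts, observed, rewards, beta=2.0, noise=0.1):
    """Return the index of the context with the largest GP-style upper confidence bound."""
    if not observed:
        return 0
    gram = [[rbf(p, q) + (noise ** 2 if i == j else 0.0)
             for j, q in enumerate(observed)] for i, p in enumerate(observed)]
    low = cholesky(gram)
    alpha = backward(low, forward(low, rewards))  # (K + noise^2 I)^{-1} y
    scores = []
    for x in contexts:
        k = [rbf(x, q) for q in observed]
        mean = sum(ki * ai for ki, ai in zip(k, alpha))   # posterior mean
        v = forward(low, k)
        var = max(rbf(x, x) - sum(vi * vi for vi in v), 0.0)  # posterior variance
        scores.append(mean + math.sqrt(beta * var))
    return max(range(len(contexts)), key=lambda i: scores[i])
```

With one observed reward of 1.0 at context 0.0, a large `beta` picks the distant, unexplored context 5.0, while a small `beta` exploits the observed one.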
Finite Time Analysis of Stratified Sampling
We consider the problem of stratified sampling for Monte-Carlo integration. We model this problem in a multi-armed bandit setting, where the arms represent the strata, and the goal is to estimate a weighted average of the mean values of the arms. We propose a strategy that samples the arms according to an upper bound on their standard deviations and compare its estimation quality to an ideal allocation that would know the standard deviations of the strata. We provide two regret analyses: a distribution-dependent bound $\tilde{O}(n^{-3/2})$ that depends on a measure of the disparity of the strata, and a distribution-free bound $\tilde{O}(n^{-4/3})$ that does not.
Finite-Time Analysis of Projected Langevin Monte Carlo
We analyze the projected Langevin Monte Carlo (LMC) algorithm, a close cousin of projected Stochastic Gradient Descent (SGD). We show that LMC allows one to sample in polynomial time from a posterior distribution restricted to a convex body and with concave log-likelihood. This gives the first ...
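A projected LMC step takes a gradient step on the negative log-density f, adds Gaussian noise with variance 2 * step per coordinate, and projects the result back onto the convex body K. The sketch below assumes K is the box [0, 1]^d, where Euclidean projection is coordinate-wise clipping; the function names, the step size, and the example target are illustrative choices, not values from the paper.

```python
import math
import random

def projected_lmc(grad_f, project, x0, step, n_steps, seed=0):
    """Projected Langevin Monte Carlo:
    x_{k+1} = Proj_K(x_k - step * grad_f(x_k) + sqrt(2 * step) * xi_k), xi_k ~ N(0, I).
    Aims to sample from the density proportional to exp(-f) restricted to K."""
    rng = random.Random(seed)
    x = list(x0)
    samples = []
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        g = grad_f(x)
        # gradient step plus Gaussian noise, then projection back onto K
        x = project([xi - step * gi + noise_scale * rng.gauss(0.0, 1.0)
                     for xi, gi in zip(x, g)])
        samples.append(x)
    return samples

def project_box(x, lo=0.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^d is coordinate-wise clipping."""
    return [min(max(xi, lo), hi) for xi in x]
```

For instance, with f(x) = 3x on [0, 1] (gradient `lambda x: [3.0 for _ in x]`, density proportional to exp(-3x)), the long-run sample average should approach the truncated-exponential mean 1/3 - 1/(e^3 - 1), about 0.28, up to discretisation bias near the boundary.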
Distributed Detection: Finite-time Analysis and Impact of Network Topology
This paper addresses the problem of distributed detection in multi-agent networks. Agents receive private signals about an unknown state of the world. The underlying state is globally identifiable, yet informative signals may be dispersed throughout the network. Using an optimization-based framework, we develop an iterative local strategy for updating individual beliefs. In contrast to the existing literature, which focuses on asymptotic learning, we provide a finite-time analysis. Furthermore, we introduce a Kullback-Leibler cost to compare the efficiency of the algorithm to its centralized ...
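The abstract does not spell out the update rule, so the sketch below swaps in a log-linear (geometric-averaging) belief update that is common in the distributed-detection literature; it is an illustration, not necessarily the paper's strategy. Each agent averages its neighbours' log-beliefs with trust weights, adds the log-likelihood of its own new private signal, and renormalises. The trust matrix, the binary signal model, and the function name are assumptions. In the example only agent 0 receives informative signals, so the other agents can identify the state only through the network.

```python
import math
import random

def log_linear_beliefs(weights, signal_probs, true_state, n_steps, seed=0):
    """Simulate a log-linear (geometric-averaging) belief update over a network.

    weights[i][j]: row-stochastic trust that agent i places on agent j (assumed).
    signal_probs[i][h]: probability, strictly between 0 and 1, that agent i observes
    signal 1 under hypothesis h.
    true_state: index of the hypothesis that actually generates the signals.
    Returns each agent's final belief vector over the hypotheses.
    """
    rng = random.Random(seed)
    n = len(weights)
    m = len(signal_probs[0])
    log_b = [[-math.log(m)] * m for _ in range(n)]  # uniform prior beliefs, in log space
    for _ in range(n_steps):
        new_log_b = []
        for i in range(n):
            s = 1 if rng.random() < signal_probs[i][true_state] else 0  # private signal
            row = []
            for h in range(m):
                p = signal_probs[i][h]
                lik = p if s == 1 else 1.0 - p
                # weighted average of neighbours' log-beliefs plus own log-likelihood
                row.append(sum(weights[i][j] * log_b[j][h] for j in range(n)) + math.log(lik))
            top = max(row)
            log_norm = top + math.log(sum(math.exp(v - top) for v in row))
            new_log_b.append([v - log_norm for v in row])  # renormalise to a distribution
        log_b = new_log_b
    return [[math.exp(v) for v in row] for row in log_b]
```

On a three-agent path with trust matrix [[2/3, 1/3, 0], [1/3, 1/3, 1/3], [0, 1/3, 2/3]] and signal models [[0.7, 0.3], [0.5, 0.5], [0.5, 0.5]], all three agents should concentrate their belief on hypothesis 0 within a few hundred steps.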
Finite Time Analysis of the Pursuit Algorithm for Learning Automata (Correspondence)
The problem of analyzing the finite time behavior of learning automata is considered. This problem involves the finite time analysis of the learning algorithm used by the learning automaton and is important in determining the rate of convergence of the automaton. In this paper, a general ...
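The pursuit algorithm being analysed is short. In its usual form, the automaton keeps a running reward estimate for every action and, after each step, moves its action-probability vector a fraction `rate` of the way toward the unit vector of the action whose estimate is currently highest. The sketch below assumes binary rewards with fixed success probabilities, initialises the estimates by trying each action a fixed number of times (`init`, a hypothetical parameter), and uses illustrative names throughout; it shows the update rule, not the paper's finite-time bound.

```python
import random

def pursuit_automaton(reward_probs, rate, n_steps, init=20, seed=0):
    """Continuous pursuit learning automaton in an environment with binary rewards.

    reward_probs[i] is the probability (unknown to the automaton) that action i is rewarded.
    Returns the final action-probability vector.
    """
    rng = random.Random(seed)
    r = len(reward_probs)
    wins = [0] * r
    tries = [0] * r

    def play(a):
        tries[a] += 1
        if rng.random() < reward_probs[a]:
            wins[a] += 1

    # Initialise the reward estimates by trying every action `init` times.
    for a in range(r):
        for _ in range(init):
            play(a)

    p = [1.0 / r] * r  # start from the uniform action-probability vector
    for _ in range(n_steps):
        a = rng.choices(range(r), weights=p)[0]  # sample an action from p
        play(a)
        best = max(range(r), key=lambda i: wins[i] / tries[i])
        # Pursue the currently best-estimated action: p <- (1 - rate) p + rate e_best.
        p = [(1.0 - rate) * pi + (rate if i == best else 0.0) for i, pi in enumerate(p)]
    return p
```

Each update is a convex combination of p and a unit vector, so p stays a probability vector; with reward probabilities (0.1, 0.9, 0.3), nearly all probability mass should end up on the second action.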
Finite time analysis of an endoreversible fuel cell
2008
The aim of this paper is a detailed thermodynamic description of a fuel cell, using finite time thermodynamics (FTT). Starting from the comparison between a reversible fuel cell and a Carnot heat engine driven by a perfect chemical reaction, we recall that, contrary to a common opinion ...
Finite Time Analysis of a Tri-Generation Cycle
Energies, 8 (6)
A review of the literature indicates that current trigeneration cycles show low thermal performance, even when optimised for maximum useful output. This paper presents a Finite Time analysis of a trigeneration cycle that is based upon coupled power and refrigeration Carnot cycles. ...
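Finite time thermodynamics replaces the reversible Carnot engine with one that must exchange heat with its reservoirs through finite thermal conductances. As background only (a standard textbook model, not the tri-generation model of this paper), the sketch below takes an endoreversible Carnot engine with hot- and cold-side conductances, writes its power as a function of efficiency, and searches numerically for the efficiency at maximum power. The result should match the Curzon-Ahlborn value 1 - sqrt(T_c / T_h) rather than the Carnot limit 1 - T_c / T_h. The function names, temperatures, and conductances are arbitrary illustrative choices.

```python
import math

def endoreversible_power(eta, t_hot, t_cold, k_hot=1.0, k_cold=1.0):
    """Power of an endoreversible Carnot engine running at efficiency eta.

    Heat enters through conductance k_hot and leaves through k_cold; the internal
    cycle is reversible, which gives P = k_eff * eta * ((1 - eta) * t_hot - t_cold) / (1 - eta).
    """
    k_eff = k_hot * k_cold / (k_hot + k_cold)
    return k_eff * eta * ((1.0 - eta) * t_hot - t_cold) / (1.0 - eta)

def efficiency_at_max_power(t_hot, t_cold, grid=100_000, **conductances):
    """Grid search over 0 < eta < eta_Carnot for the efficiency that maximises power."""
    eta_carnot = 1.0 - t_cold / t_hot
    etas = (eta_carnot * i / grid for i in range(1, grid))
    return max(etas, key=lambda e: endoreversible_power(e, t_hot, t_cold, **conductances))
```

For reservoirs at 600 K and 300 K this gives about 0.293, against a Carnot limit of 0.5; the optimum does not depend on the conductances, which only scale the power.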