Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems
Eyal Even-Dar...
Journal of Machine Learning Research 7 (2006) 1079-1105. Submitted 2/05; Published 6/06.



View or download:
jmlr.org/papers/volume...evendar06a.pdf

From: mit.edu/papers/v7/

Abstract: We incorporate statistical confidence intervals in both the multi-armed bandit and the reinforcement learning problems. In the bandit problem we show that given $n$ arms, it suffices to pull the arms a total of $O\left((n/\epsilon^2)\log(1/\delta)\right)$ times to find an $\epsilon$-optimal arm with probability of at least $1-\delta$. This bound matches the lower bound of Mannor and Tsitsiklis (2004) up to constants. We also devise action elimination procedures in reinforcement learning algorithms. We describe a framework that is...
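
The abstract states the sample bound but not the procedure that achieves it. As a rough illustration only, the sketch below shows one elimination scheme of the median-elimination kind: sample every surviving arm, discard the worse half by empirical mean, tighten the accuracy and confidence parameters, and repeat. The per-round sample count, the 3/4 shrink factor, the `pull` callback, and the Bernoulli test arms are all assumptions made for this example and are not taken from the text above.

```python
import math
import random


def median_elimination(pull, n_arms, epsilon, delta):
    """Return (surviving_arm, total_pulls); pull(a) yields a reward in [0, 1]."""
    arms = list(range(n_arms))
    eps_l, delta_l = epsilon / 4.0, delta / 2.0
    total_pulls = 0
    while len(arms) > 1:
        # Per-arm sample count for this round (the constants are an assumption).
        t = math.ceil(4.0 / eps_l ** 2 * math.log(3.0 / delta_l))
        means = {a: sum(pull(a) for _ in range(t)) / t for a in arms}
        total_pulls += t * len(arms)
        # Keep the better half of the arms, ranked by empirical mean.
        ranked = sorted(arms, key=lambda a: means[a], reverse=True)
        arms = ranked[: (len(arms) + 1) // 2]
        # Tighten accuracy and confidence for the next, smaller round.
        eps_l *= 0.75
        delta_l /= 2.0
    return arms[0], total_pulls


rng = random.Random(0)
true_means = [0.1, 0.2, 0.3, 0.9]  # hypothetical Bernoulli arms


def pull(a):
    # Simulated reward: 1 with probability true_means[a], else 0.
    return 1.0 if rng.random() < true_means[a] else 0.0


best, used = median_elimination(pull, len(true_means), epsilon=0.2, delta=0.1)
print(best, used)
```

Because the number of surviving arms halves each round while the per-arm sample count grows only geometrically, the total number of pulls in a scheme like this stays proportional to $(n/\epsilon^2)\log(1/\delta)$, which is the shape of the bound quoted in the abstract.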

Active bibliography (related documents):
0.5:   Local Bandit Approximation for Optimal Learning Problems - Duff, Barto
0.3:   Estimates of Parameter Distributions for . . . - Dimitrakakis (2004)
0.3:   Nearly Optimal Exploration-Exploitation Decision Thresholds - Dimitrakakis (2006)


BibTeX entry:

@article{evendar06a,
  author = "Eyal Even-Dar and others",
  title = "Action Elimination and Stopping Conditions for the Multi-Armed Bandit
    and Reinforcement Learning Problems",
  journal = "Journal of Machine Learning Research",
  volume = "7",
  pages = "1079--1105",
  year = "2006",
  url = "citeseer.ist.psu.edu/760277.html" }

Citations (may not include all citations):
658   Learning from Delayed Rewards - Watkins - 1989
413   Neuro-Dynamic Programming - Bertsekas, Tsitsiklis - 1996
246   Markov Decision Processes - Puterman - 1994
189   Reinforcement Learning - Sutton, Barto - 1998
59   Gambling in a rigged casino: The adversarial multi-armed ban.. - Auer, Cesa-Bianchi et al. - 1995
29   Near-optimal reinforcement learning in polynomial time - Kearns, Singh - 1998
20   Some aspects of sequential design of experiments - Robbins - 1952
17   An upper bound on the loss from approximate optimal-value fu.. - Singh, Yee - 1994
15   Bandit Problems (Chapman and Hall) - Berry, Fristedt - 1985
10   method for convergence of stochastic approximation and reinf.. - Borkar, Meyn - 2000
10   The non-stochastic multi-armed bandit problem - Auer, Cesa-Bianchi et al. - 2002
8   Competitive queue policies for differentiated services - Aiello, Mansour et al. - 1979
7   Buffer overflow management in QoS switches - Kesselman, Lotker et al. - 1985
6   Approximately optimal approximate reinforcement learning - Kakade, Langford - 2002
4   Dynamic programming and Markov decision processes - Howard - 1960
3   The sample complexity of exploration in the multi-armed band.. - Mannor, Tsitsiklis - 2004
2   A modified dynamic programming method for Markov decision pr.. - MacQueen - 1966
1   Finite time bounds for sampling based fitted value iteration - Szepesvári, Munos - 2005
1   Probability inequalities for sums of bounded random variable.. - Hoeffding - 1963

Documents on the same site (http://jmlr.csail.mit.edu/papers/v7/):
Estimating the "Wrong" Graphical Model: Benefits in the.. - Wainwright (2006)
A Hierarchy of Support Vector Machines for Pattern Detection - Sahbi, Geman (2006)
Efficient Learning of Label Ranking by Soft Projections.. - Shalev-Shwartz, Singer (2006)
