Near-Optimal Reinforcement Learning in Polynomial Time (1998)  (29 citations)
Michael Kearns, Satinder Singh
Proc. 15th International Conf. on Machine Learning



View or download:
colorado.edu/users/ba...ICML98_KS.ps.gz

Abstract: We present new algorithms for reinforcement learning, and prove that they have polynomial bounds on the resources required to achieve near-optimal return in general Markov decision processes. After observing that the number of actions required to approach the optimal return is lower bounded by the mixing time T of the optimal policy (in the undiscounted case) or by the horizon time T (in the discounted case), we then give algorithms requiring a number of actions and total computation time that...
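
The horizon time mentioned in the abstract can be illustrated with a small calculation. The sketch below is not taken from the paper: it assumes per-step rewards bounded in [0, r_max] and uses the standard geometric tail bound gamma^T * r_max / (1 - gamma) on the discounted return collected after step T. The function names horizon_time and truncation_gap are hypothetical, and the paper's own definition of T may differ in constants.

```python
import math


def truncation_gap(gamma: float, t: int, r_max: float = 1.0) -> float:
    """Upper bound on the discounted return collected from step t onward,
    assuming every reward lies in [0, r_max] (an assumption of this sketch)."""
    return gamma ** t * r_max / (1.0 - gamma)


def horizon_time(gamma: float, epsilon: float, r_max: float = 1.0) -> int:
    """Smallest T such that truncating the discounted return after T steps
    loses at most epsilon, under the bounded-reward assumption above."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    if epsilon <= 0.0 or r_max <= 0.0:
        raise ValueError("epsilon and r_max must be positive")
    if truncation_gap(gamma, 0, r_max) <= epsilon:
        return 0  # even the full return is already below epsilon
    # Solve gamma**T * r_max / (1 - gamma) <= epsilon for T.
    t = math.ceil(
        math.log(r_max / (epsilon * (1.0 - gamma))) / math.log(1.0 / gamma)
    )
    # Guard against floating-point rounding in the logarithms.
    while truncation_gap(gamma, t, r_max) > epsilon:
        t += 1
    return t


if __name__ == "__main__":
    for gamma in (0.5, 0.9, 0.99):
        print(gamma, horizon_time(gamma, epsilon=0.1))
```

Under these assumptions T grows roughly like (1 / (1 - gamma)) * log(r_max / (epsilon * (1 - gamma))), which is why a discount factor close to 1 forces any learner to act for many steps before its return can be judged near-optimal.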

Cited by:
Journal of Machine Learning Research 7 (2006) 1079--1105.. - Multi-Armed Bandit And
BL-WoLF: A Framework For Loss-Bounded Learnability In.. - Conitzer (2003)
P3VI: A Partitioned, Prioritized, Parallel Value Iterator - Wingate, Seppi (2004)

Active bibliography (related documents):
0.5:   Reinforcement Learning: A Survey - Kaelbling, Littman, Moore (1996)
0.3:   Analytical Mean Squared Error Curves for Temporal Difference.. - Singh, Dayan (1988)
0.1:   Open Theoretical Questions in Reinforcement Learning - Sutton (1999)

Similar documents based on text:
0.4:   Polynomial-Time Reinforcement Learning of Near-Optimal Policies - Pivazyan, Shoham
0.3:   Faster Near-Optimal Reinforcement Learning: Adding Adaptiveness.. - Domingo
0.2:   Exact Inference of Hidden Structure - From Sample Data

Related documents from co-citation:
11:   Markov games as a framework for multi-agent reinforcement learning - Littman - 1994
10:   Reinforcement learning: a survey - Kaelbling, Littman et al. - 1996
10:   Multiagent reinforcement learning: Theoretical framework and an algorithm - Hu, Wellman - 1998

BibTeX entry:

Kearns, M., and Singh, S. 1998. Near-optimal reinforcement learning in polynomial time. In ICML-98, 260--268. http://citeseer.ist.psu.edu/article/kearns98nearoptimal.html

@inproceedings{kearns98nearoptimal,
    author    = "Michael Kearns and Satinder Singh",
    title     = "Near-optimal reinforcement learning in polynomial time",
    booktitle = "Proc. 15th International Conf. on Machine Learning",
    publisher = "Morgan Kaufmann",
    address   = "San Francisco, CA",
    pages     = "260--268",
    year      = "1998",
    url       = "http://citeseer.ist.psu.edu/article/kearns98nearoptimal.html"
}

Citations (may not include all citations):
614   Reinforcement Learning: An Introduction - Sutton, Barto - 1998
413   Neuro-Dynamic Programming - Bertsekas, Tsitsiklis - 1996
281   Machine Learning - Watkins, Dayan - 1992
107   On the convergence of stochastic iterative dynamic programming .. - Jaakkola, Jordan et al. - 1994
80   The role of exploration in learning control - Thrun - 1992
71   Asynchronous stochastic approximation and Q-learning - Tsitsiklis - 1994
23   Sequential decision problems and neural networks - Barto, Sutton et al. - 1990
13   Convergence of indirect adaptive asynchronous value iteratio.. - Gullapalli, Barto - 1994
7   Analytical mean squared error curves for temporal difference .. - Singh, Dayan - 1988
3   Learning curve bounds for Markov decision processes with und.. - Saul, Singh - 1996
2   Prioritized sweeping: Reinforcement learning with less data .. - Moore, Atkeson - 1993
2   Markov decision processes: discrete stochastic dynamic prog.. - Puterman - 1994
2   Efficient reinforcement learning - Fiechter - 1994
2   Expected mistake bound model for online reinforcement learni.. - Fiechter - 1997
1   On the worst-case analysis of temporal-difference learning algori.. - Schapire, Warmuth - 1994



Documents on the same site (http://sneaker.mindmaker.kfkipark.hu/~szepes/research/Summary.htm):
Finding Structure in Reinforcement Learning - Thrun, Schwartz (1995)
Finite-time Regret Bounds for the Multiarmed Bandit Problem - Cesa-Bianchi, Fischer (1998)
Module Based Reinforcement Learning for a Real Robot - Kalmár
