Abstract: We present new algorithms for reinforcement learning, and prove that they have polynomial bounds on the resources required to achieve near-optimal return in general Markov decision processes. After observing that the number of actions required to approach the optimal return is lower bounded by the mixing time T of the optimal policy (in the undiscounted case) or by the horizon time T (in the discounted case), we then give algorithms requiring a number of actions and total computation time that...
Cited by:
Journal of Machine Learning Research 7 (2006) 1079--1105.. - Multi-Armed Bandit And
BL-WoLF: A Framework For Loss-Bounded Learnability In.. - Vincent Conitzer (2003)
P3VI: A Partitioned, Prioritized, Parallel Value Iterator - Wingate, Seppi (2004)
Similar documents (at the sentence level):
56.0%: Near-Optimal Reinforcement Learning in Polynomial Time - Kearns, Singh (1998)
Active bibliography (related documents):
0.5: Reinforcement Learning: A Survey - Kaelbling, Littman, Moore (1996)
0.3: Analytical Mean Squared Error Curves for Temporal Difference.. - Singh, Dayan (1988)
0.1: Open Theoretical Questions in Reinforcement Learning - Sutton (1999)
Similar documents based on text:
0.4: Polynomial-Time Reinforcement Learning of Near-Optimal Policies - Pivazyan, Shoham
0.3: Faster Near-Optimal Reinforcement Learning: Adding Adaptiveness.. - Domingo
0.2: Exact Inference of Hidden Structure - From Sample Data
Related documents from co-citation:
11: Markov games as a framework for multi-agent reinforcement learning - Littman - 1994
10: Reinforcement learning: a survey - Kaelbling, Littman et al. - 1996
10: Multiagent reinforcement learning: Theoretical framework and an algorithm - Hu, Wellman - 1998
BibTeX entry:
Kearns, M., and Singh, S. 1998. Near-optimal reinforcement learning in polynomial time. In ICML-98, 260--268. http://citeseer.ist.psu.edu/article/kearns98nearoptimal.html
@inproceedings{kearns98nearoptimal,
  author    = "Michael Kearns and Satinder Singh",
  title     = "Near-optimal reinforcement learning in polynomial time",
  booktitle = "Proc. 15th International Conf. on Machine Learning",
  publisher = "Morgan Kaufmann, San Francisco, CA",
  pages     = "260--268",
  year      = "1998",
  url       = "citeseer.ist.psu.edu/article/kearns98nearoptimal.html"
}
Citations (may not include all citations):
614: Reinforcement Learning: An Introduction - Sutton, Barto - 1998
413: Neuro-Dynamic Programming - Bertsekas, Tsitsiklis - 1996
281: Machine Learning - Watkins, Dayan - 1992
107: On the convergence of stochastic iterative dynamic programming .. - Jaakkola, Jordan et al. - 1994
80: The role of exploration in learning control - Thrun - 1992
71: Asynchronous stochastic approximation and Q-learning - Tsitsiklis - 1994
23: Sequential decision problems and neural networks - Barto, Sutton et al. - 1990
13: Convergence of indirect adaptive asynchronous value iteratio.. - Gullapalli, Barto - 1994
7: Analytical mean squared error curves for temporal difference .. - Singh, Dayan - 1988
3: Learning curve bounds for Markov decision processes with und.. - Saul, Singh - 1996
2: Prioritized sweeping: Reinforcement learning with less data .. - Computation, Moore et al. - 1993
2: Markov decision processes: discrete stochastic dynamic prog.. - Learning - 1994
2: Efficient reinforcement learning - Fiechter - 1994
2: Expected mistake bound model for online reinforcement learni.. - Fiechter - 1997
1: On the worst-case analysis of temporal-difference learning algori.. - Schapire, Warmuth - 1994
Documents on the same site (http://sneaker.mindmaker.kfkipark.hu/~szepes/research/Summary.htm):
Finding Structure in Reinforcement Learning - Thrun, Schwartz (1995)
Finite-time Regret Bounds for the Multiarmed Bandit Problem - Cesa-Bianchi, Fischer (1998)
Module Based Reinforcement Learning for a Real Robot - Kalmár
CiteSeer.IST - Copyright Penn State and NEC