Abstract: Markov decision processes (MDPs) with undiscounted rewards represent an important class of problems in decision and control. The goal of learning in these MDPs is to find a policy that yields the maximum expected return per unit time. In large state spaces, computing these averages directly is not feasible; instead, the agent must estimate them by stochastic exploration of the state space. In this case, longer exploration times enable more accurate estimates and more informed decision-making....
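The abstract's central quantity, expected return per unit time, and its point that longer exploration gives more accurate estimates can be illustrated with a small Monte Carlo sketch. It assumes a hypothetical 3-state Markov chain induced by some fixed policy. The transition matrix `P`, the rewards `R` and the helper names are invented for illustration and do not come from the paper. The sketch compares the exact average reward, obtained from the chain's stationary distribution, with estimates from single trajectories of increasing length. It is a generic illustration, not the statistical-mechanics learning-curve analysis of Saul and Singh.

```python
import random

# Hypothetical 3-state chain induced by a fixed policy (illustrative only):
# P[s][t] is the probability of moving from state s to state t,
# R[s] is the reward received on each visit to state s.
P = [
    [0.5, 0.5, 0.0],
    [0.1, 0.6, 0.3],
    [0.2, 0.0, 0.8],
]
R = [1.0, 0.0, 2.0]


def exact_average_reward(P, R, iters=1000):
    """Average reward per step under the stationary distribution, found by
    power iteration (assumes the chain is irreducible and aperiodic)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        # pi_new[t] = sum_s pi[s] * P[s][t]
        pi = [sum(pi[s] * P[s][t] for s in range(n)) for t in range(n)]
    return sum(p * r for p, r in zip(pi, R))


def estimated_average_reward(P, R, steps, start=0, seed=0):
    """Estimate the same quantity from one simulated trajectory of `steps` steps."""
    rng = random.Random(seed)
    s, total = start, 0.0
    for _ in range(steps):
        total += R[s]
        s = rng.choices(range(len(P)), weights=P[s])[0]
    return total / steps


if __name__ == "__main__":
    rho = exact_average_reward(P, R)
    for steps in (100, 10_000, 100_000):
        est = estimated_average_reward(P, R, steps)
        print(f"T={steps:>7}: estimate={est:.4f}  error={abs(est - rho):.4f}")
```

For this chain the stationary distribution is (8/33, 10/33, 15/33), so the exact average reward is 38/33 ≈ 1.1515. The estimation error typically shrinks as the trajectory length T grows, roughly as 1/sqrt(T).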
Context of citations to this paper:
... classes of MDPs, if the model of learning is modified from the standard one, or if one changes the criteria for success (Saul & Singh, 1996; Fiechter, 1994; Fiechter, 1997; Schapire & Warmuth, 1994; Singh & Dayan, in press). Fiechter (1994, 1997), whose results are...
... Notes 1. See Saul & Singh (1996) for learning curve bounds for an interesting Markov decision process that are derived using techniques from statistical mechanics. 2. There are other...
Cited by:
Near-Optimal Reinforcement Learning in Polynomial Time - Kearns, Singh (1998)
Analytical Mean Squared Error Curves for Temporal Difference.. - Singh, Dayan (1998)
Similar documents (at the sentence level):
6.8%: Markov Decision Processes in Large State Spaces - Saul, Singh (1995)
Active bibliography (related documents):
0.0: Learning to Take Actions - Khardon (1998)
0.0: Reinforcement Learning: A Survey - Leslie Pack Kaelbling, Michael L.. (1996)
0.0: Planning and Scheduling - Dean, Kambhampati (1996)
Similar documents based on text:
0.4: Polynomial-Time Reinforcement Learning of Near-Optimal Policies - Pivazyan, Shoham
0.2: Reinforcement Learning and Mistake Bounded Algorithms - Mansour (1999)
0.2: A Near-Optimal Polynomial Time Algorithm for Learning in.. - Brafman, Tennenholtz (1999)
Related documents from co-citation:
3: the convergence of stochastic iterative dynamic programming algorithms - Jaakkola, Jordan et al. - 1994
3: Asynchronous stochastic approximation and Q-learning - Tsitsiklis - 1994
2: Machine Learning - Watkins, Dayan - 1992
BibTeX entry:
Saul, L., & Singh, S. (1996). Learning curve bounds for a Markov decision process with undiscounted rewards. In COLT '96. http://citeseer.ist.psu.edu/saul96learning.html
@inproceedings{ saul96learning,
author = "Lawrence K. Saul and Satinder P. Singh",
title = "Learning Curve Bounds for a Markov Decision Process with Undiscounted Rewards",
booktitle = "Computational Learning Theory",
pages = "147--156",
year = "1996",
url = "citeseer.ist.psu.edu/saul96learning.html" }
Citations (may not include all citations):
563: Learning to predict by the methods of temporal differences - Sutton - 1988
278: Dynamic programming and optimal control - Bertsekas - 1995
257: Learning to act using real-time dynamic programming - Barto, Bradtke et al. - 1995
120: Large deviation techniques in decision - Bucklew - 1990
70: Statistical Mechanics - Huang - 1987
60: A survey of matrix theory and matrix inequalities - Marcus, Minc - 1992
58: Statistical mechanics of learning from examples - Seung, Sompolinsky et al. - 1992
47: Rigorous learning curve bounds from statistical mechanics - Haussler, Kearns et al. - 1994
40: The statistical mechanics of learning a rule - Watkin, Rau et al. - 1993
23: Reinforcement learning algorithms for average-payoff markovi.. - Singh - 1994
11: Efficient reinforcement learning - Fiechter - 1994
1: A Eigenvalues In this appendix we derive eq - Watkins, Dayan et al. - 1992
1: Markov decision processes in large state spaces - Saul, Singh - 1995
Documents on the same site (http://www.cs.colorado.edu/~baveja/papers.html):
An Upper Bound on the Loss from Approximate Optimal-Value.. - Singh, Yee (1994)
Theoretical Results on Reinforcement Learning with.. - Precup, Sutton, Singh (1998)
Soft Dynamic Programming Algorithms: Convergence Proofs - Singh (1993)
CiteSeer.IST - Copyright Penn State and NEC