Learning Curve Bounds for a Markov Decision Process with Undiscounted Rewards (1996)
Lawrence K. Saul, Satinder P. Singh
Computational Learning Theory



Abstract: Markov decision processes (MDPs) with undiscounted rewards represent an important class of problems in decision and control. The goal of learning in these MDPs is to find a policy that yields the maximum expected return per unit time. In large state spaces, computing these averages directly is not feasible; instead, the agent must estimate them by stochastic exploration of the state space. In this case, longer exploration times enable more accurate estimates and more informed decision-making...

Context of citations to this paper:

... classes of MDPs, if the model of learning is modified from the standard one, or if one changes the criteria for success (Saul & Singh, 1996; Fiechter, 1994; Fiechter, 1997; Schapire & Warmuth, 1994; Singh & Dayan, in press). Fiechter (1994, 1997) whose results are...

... Notes 1. See Saul & Singh (1996) for learning curve bounds for an interesting Markov decision process that are derived using techniques from statistical mechanics. 2. There are other...

Cited by:
Near-Optimal Reinforcement Learning in Polynomial Time - Kearns, Singh (1998)
Analytical Mean Squared Error Curves for Temporal Difference.. - Singh, Dayan (1998)

Similar documents (at the sentence level):
6.8%:   Markov Decision Processes in Large State Spaces - Saul, Singh (1995)

Active bibliography (related documents):
0.0:   Learning to Take Actions - Khardon (1998)
0.0:   Reinforcement Learning: A Survey - Leslie Pack Kaelbling, Michael L.. (1996)
0.0:   Planning and Scheduling - Dean, Kambhampati (1996)

Similar documents based on text:
0.4:   Polynomial-Time Reinforcement Learning of Near-Optimal Policies - Pivazyan, Shoham
0.2:   Reinforcement Learning and Mistake Bounded Algorithms - Mansour (1999)
0.2:   A Near-Optimal Polynomial Time Algorithm for Learning in.. - Brafman, Tennenholtz (1999)

Related documents from co-citation:
3:   On the convergence of stochastic iterative dynamic programming algorithms - Jaakkola, Jordan et al. - 1994
3:   Asynchronous stochastic approximation and Q-learning - Tsitsiklis - 1994
2:   Machine Learning - Watkins, Dayan - 1992

BibTeX entry:

Saul, L., Singh, S. (1996). Learning curve bounds for a Markov decision process with undiscounted rewards. In COLT96. http://citeseer.ist.psu.edu/saul96learning.html

@inproceedings{ saul96learning,
    author = "Lawrence K. Saul and Satinder P. Singh",
    title = "Learning Curve Bounds for a Markov Decision Process with Undiscounted Rewards",
    booktitle = "Computational Learning Theory",
    pages = "147-156",
    year = "1996",
    url = "citeseer.ist.psu.edu/saul96learning.html" }

Citations (may not include all citations):
563   Learning to predict by the methods of temporal differences - Sutton - 1988
278   Dynamic programming and optimal control - Bertsekas - 1995
257   Learning to act using real-time dynamic programming - Barto, Bradtke et al. - 1995
120   Large deviation techniques in decision - Bucklew - 1990
70   Statistical Mechanics - Huang - 1987
60   A survey of matrix theory and matrix inequalities - Marcus, Minc - 1992
58   Statistical mechanics of learning from examples - Seung, Sompolinsky et al. - 1992
47   Rigorous learning curve bounds from statistical mechanics - Haussler, Kearns et al. - 1994
40   The statistical mechanics of learning a rule - Watkin, Rau et al. - 1993
23   Reinforcement learning algorithms for average-payoff markovi.. - Singh - 1994
11   Efficient reinforcement learning - Fiechter - 1994
1   A Eigenvalues In this appendix we derive eq - Watkins, Dayan et al. - 1992
1   Markov decision processes in large state spaces - Saul, Singh - 1995

Documents on the same site (http://www.cs.colorado.edu/~baveja/papers.html):
An Upper Bound on the Loss from Approximate Optimal-Value.. - Singh, Yee (1994)
Theoretical Results on Reinforcement Learning with.. - Precup, Sutton, Singh (1998)
Soft Dynamic Programming Algorithms: Convergence Proofs - Singh (1993)

CiteSeer.IST - Copyright Penn State and NEC