Reinforcement Learning for Long-Run Average Cost
Abhijit Gosavi, Assistant Professor, 261 Technology Bldg, 2200 Bonforte Blvd...




 
View or download:
buffalo.edu/~agosavi/rsmart.pdf

From: buffalo.edu/~agosavi/papers


Abstract: A large class of sequential decision-making problems under uncertainty can be modeled as Markov and Semi-Markov Decision Problems, when their underlying probability structure has a Markov chain. They may be solved by using classical dynamic programming methods. However, dynamic programming methods suffer from the curse of dimensionality and break down rapidly in the face of large state spaces. In addition, dynamic programming methods require the exact computation of the so-called transition...

Active bibliography (related documents):
3.1:   Stochasticsand - Statistics Reinforcement Learning
1.1:   A Reinforcement Learning Algorithm based on Policy Iteration for.. - Gosavi (2004)
0.6:   The O.D.E. Method for Convergence of Stochastic Approximation.. - Borkar, Meyn (1999)

BibTeX entry:

@misc{ assistant-reinforcement,
  author = "Abhijit Gosavi",
  title = "Reinforcement Learning for Long-Run Average Cost",
  url = "citeseer.ist.psu.edu/757585.html" }
Citations (may not include all citations):
658   Learning from Delayed Rewards - Watkins - 1989
413   Neuro-Dynamic Programming - Bertsekas, Tsitsiklis - 1996
246   Markov Decision Processes - Puterman - 1994
189   Reinforcement Learning - Sutton, Barto - 1998
139   A stochastic approximation method - Robbins, Monro - 1951
115   Stochastic Approximation Methods for Constrained and Unconst.. - Kushner, Clark - 1978
92   Learning Automata: An Introduction - Narendra, Thatachar - 1989
71   Asynchronous stochastic approximation and q-learning - Tsitsiklis - 1994
64   Algorithms for sequential decision-making - Littman - 1996
61   Analysis of recursive stochastic algorithms - Ljung - 1977
52   A reinforcement learning method for maximizing undiscounted .. - Schwartz - 1993
48   Average reward reinforcement learning: Foundations - Mahadevan - 1996
26   Stochastic approximation with two-time scales - Borkar - 1997
24   Dynamic Programming and Optimal Control - Bertsekas - 1995
23   Reinforcement learning algorithms for average-payoff Markovia.. - Singh - 1994
20   Scaling up average reward reinforcement learning by approxim.. - Tadepalli, Ok - 1996
16   Asynchronous stochastic approximation - Borkar - 1998
14   Actor-critic type learning algorithms for markov decision pr.. - Konda, Borkar
12   Stochastic Dynamic Programming and the Control of Queueing S.. - Sennott - 1999
9   The theory of dynamic programming - Bellman - 1954
8   Solving semi-markov decision problems using average reward r.. - Das, Gosavi et al. - 1999
7   The ode method for convergence of stochastic approximation a.. - Borkar, Meyn
6   Introduction to Reliability Engineering - Lewis - 1994
3   Airline seat allocation among Multiple Fare Classes with Ove.. - Gosavi, Bandla et al.
3   Ode analysis for Q-learning algorithms - Abounadi, Bertsekas et al. - 1996
3   Optimal preventive maintenance in a production inventory sys.. - Das, Sarkar
2   Optimal control of a queuing network system with two types o.. - Shioyama - 1991
2   the Convergence of Some Reinforcement Learning Algorithms - Gosavi - 2000
2   An analog scheme for fixed point computation - Borkar, Soumyanath - 1997
2   A Simulation-Based Learning Automata Framework for Solving S.. - Gosavi, Das et al.
2   Part selection policy for a flexible manufacturing cell feed.. - Seidmann, Schweitzer - 1984
2   An algorithm for solving semi-markov decision problems using.. - Gosavi - 1999
2   Optimal and near-optimal control of a two-part stochastic ma.. - Elhafsi, Bai - 1997
1   Special Issue of Machine Learning Journal - Sutton - 1992
1   Optimal inspection policies for a manufacturing station - Cassandras, Han - 1992

Documents on the same site (http://www.eng.buffalo.edu/~agosavi/papers.html):
Global Supply Chain Management: A Reinforcement.. - Pontrandolfo, Gosavi, .. (2002)
Stochasticsand - Statistics Reinforcement Learning
Boundedness of Iterates in Q-Learning - Abhijit Gosavi
