Part 1: POMDPs
POMDPs... Markov decision process (MDP) = (S, A, P, R), states s ∈ S, actions a ∈ A,...



View or download:
marctoussaint.net...eminarPOMDP.ps.gz
From:  marctoussaint.net/public...index

Abstract:
use # as belief state (Kaelbling, Littman, & Cassandra 1998; Kaelbling, Cassandra, & Kurien 1996)
unknown, part. obs.
use observations O
reinforcement learning in POMDPs (doesn't really work :-) (Jaakkola, Singh, & Jordan 1995; Singh, Jaakkola, & Jordan 1994)
history window of obs. O
recurrent Q-learning, RNNs
no representation
find optimal policy graphs directly (Meuleau, Kim, Kaelbling, & Cassandra 1999)
symbolic AI, rule extraction
adaptive? later...
optimal behavior: DP + MC + RL...
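The abstract mentions using a belief state when the environment is only partially observable. As a minimal sketch of what that usually means, the Python below applies the standard Bayes-filter belief update to a tabular model. It is not taken from the slides: the function name `belief_update`, the observation model `Z` (which does not appear in the (S, A, P, R) tuple above), and the two-state example are all assumptions made for illustration.

```python
def belief_update(belief, action, observation, P, Z):
    """One Bayes-filter step over a finite state set.

    belief[s]      : current probability of being in state s
    P[a][s][s2]    : transition probability from s to s2 under action a
    Z[a][s2][o]    : probability of observing o after action a lands in s2
                     (Z is an assumed observation model, not from the source)
    """
    n = len(belief)
    unnormalized = []
    for s2 in range(n):
        # Predict: probability of reaching s2 from the current belief.
        predicted = sum(belief[s] * P[action][s][s2] for s in range(n))
        # Correct: weight by how likely the observation is in s2.
        unnormalized.append(Z[action][s2][observation] * predicted)
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalized]


# Hypothetical two-state example: the action leaves the state unchanged,
# and the observation matches the true state with probability 0.85.
P = {"listen": [[1.0, 0.0], [0.0, 1.0]]}
Z = {"listen": [[0.85, 0.15], [0.15, 0.85]]}
b0 = [0.5, 0.5]
b1 = belief_update(b0, "listen", 0, P, Z)
```

Starting from a uniform belief, one observation of `0` shifts most of the probability mass to state 0. The updated belief still sums to one, so it can be fed back into the same function on the next step.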

BibTeX entry:

@misc{ decision-part,
  author = "Pomdps Markov Decision",
  title = "Part 1: POMDPs",
  url = "citeseer.ist.psu.edu/741786.html" }
Citations (may not include all citations):
374   Reinforcement learning: A survey - Kaelbling, Littman et al. - 1996
187   Planning and acting in partially observable stochastic domai.. - Kaelbling, Littman et al. - 1998
136   Acting under uncertainty: Discrete bayesian models for mobil.. - Kaelbling, Cassandra et al. - 1996
81   Reinforcement learning algorithm for partially observable Ma.. - Jaakkola, Singh et al. - 1995
77   Between MDPs and semi-MDPs: A framework for temporal abstrac.. - Sutton, Precup et al. - 1999
77   Learning topological maps with weak local odometric informat.. - Shatkay, Kaelbling - 1997
56   Learning without state estimation in partially observable en.. - Singh, Jaakkola et al. - 1994
24   A self-organizing representation of sensor space for mobile .. - Krose, Eecen - 1994
15   Solving POMDPs by searching the space of finite policies - Meuleau, Kim et al. - 1999
12   Automatic discovery of subgoals in reinforcement learning us.. - McGovern, Barto - 2001
5   Self-segmentation of sequences: Automatic formation of hiera.. - Sun, Sessions - 2000
