Architectures for Policy Search: Proposal for PhD Thesis

by Leonid M. Peshkin
http://www.ai.mit.edu/~pesha/Public/proposal.ps

Abstract:

The key idea in this research is that challenging planning and control problems in stochastic domains can be solved using a general optimization technique combined with carefully constructed representations. An objective of artificial intelligence (AI) is to model the behavior of an intelligent agent's interaction with its environment. Traditionally this interaction is captured by Markov decision processes (MDPs). MDPs are typically solved by value-search methods, which yield memoryless controllers. However, an agent's interactions with the real world are better modeled by partially observable MDPs (POMDPs). In POMDPs, value is an ill-defined concept, and controllers with memory are necessary for optimal behavior. Methods that search directly in the space of policies provide a viable alternative to value-based methods. The performance of policy-search methods depends crucially on the architecture of the controller. We present and analyze various architectures for controllers in POMDPs. We propose future work that will provide useful insights into building controllers able to take
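
The claim that controllers with memory are necessary in POMDPs, and that the controller architecture determines what policy search can find, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the proposal: the task (a fully aliased two-step problem that pays reward 1 only for the action sequence 0 then 1), the names `episode_return` and `best_controller`, and the use of exhaustive enumeration in place of whatever optimization technique the proposal actually uses.

```python
from itertools import product

ACTIONS = (0, 1)   # hypothetical action set
HORIZON = 2        # hypothetical episode length


def episode_return(controller):
    """Run one deterministic episode of the toy task.

    Every state emits the same observation, so the controller can only
    condition on its own internal memory state. Reward is 1.0 only if the
    actions taken are exactly (0, 1).
    """
    action_of, next_memory = controller
    memory = 0
    taken = []
    for _ in range(HORIZON):
        taken.append(action_of[memory])   # act from current memory state
        memory = next_memory[memory]      # deterministic memory update
    return 1.0 if tuple(taken) == (0, 1) else 0.0


def best_controller(n_memory):
    """Search the whole space of deterministic controllers that have
    n_memory internal states and return (best value, best controller)."""
    best_value, best = -1.0, None
    for action_of in product(ACTIONS, repeat=n_memory):
        for next_memory in product(range(n_memory), repeat=n_memory):
            value = episode_return((action_of, next_memory))
            if value > best_value:
                best_value, best = value, (action_of, next_memory)
    return best_value, best


if __name__ == "__main__":
    for n in (1, 2):
        value, controller = best_controller(n)
        print(f"memory states={n}: best return={value}, controller={controller}")
```

With a single memory state the controller is memoryless and must repeat one action, so no search over that architecture can earn the reward; with two memory states the same search finds a controller that does. Enumeration is only feasible because the toy space is tiny.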
