  Programmable reinforcement learning agents (2001) [56 citations — 1 self]

by David Andre, Stuart J. Russell
http://www.cs.berkeley.edu/~dandre/papers/pham_nips_final.ps

Abstract:

We present an expressive agent design language for reinforcement learning that allows the user to constrain the policies considered by the learning process. The language includes standard features such as parameterized subroutines, temporary interrupts, aborts, and memory variables, but also allows for unspecified choices in the agent program. For learning that which isn't specified, we present provably convergent learning algorithms. We demonstrate by example that agent programs written in the language are concise as well as modular. This facilitates state abstraction and the transferability of learned skills.

Citations

887 Reinforcement learning: A survey – Kaelbling, Littman, et al. - 1996
562 The Esterel synchronous programming language: Design, semantics, implementation – Berry, Gonthier - 1992
214 Hierarchical reinforcement learning with the MAXQ value function decomposition – Dietterich
191 The dynamics of reinforcement learning cooperative multiagent systems – Claus, Boutilier - 1998
188 Between MDPs and semi-MDPs: A Framework for Temporal Abstraction – Sutton, Precup, et al.
175 Reward, motivation and reinforcement learning – Dayan, Balleine
159 Teleo-reactive programs for agent control – Nilsson - 1994
158 Reinforcement learning with hierarchies of machines – Parr, Russell - 1998
156 On-line Q-learning using connectionist systems – Rummery, Niranjan - 1994
105 Asynchronous stochastic approximation and Q-learning – Tsitsiklis - 1994
91 A multivalued logic approach to integrating planning and control – Saffiotti, Konolige, et al. - 1995
77 Policy invariance under reward transformations: Theory and applications to reward shaping – Ng, Harada, et al. - 1999
73 Achieving artificial intelligence through building robots – Brooks - 1986
71 Graphical models for preference and utility – Bacchus, Grove - 1995
64 Computing factored value functions for policies in structured MDPs – Koller, Parr - 1999
50 State abstraction for programmable reinforcement learning agents – Andre, Russell - 2002
48 Scaling Up Reinforcement Learning for Robot Control – Lin - 1993
47 Learning policies with external memory – Peshkin, Meuleau, et al. - 1999
46 Algorithms for inverse reinforcement learning – Ng, Russell - 2000
45 UCP-Networks: A Directed Graphical Representation of Conditional Utilities – Boutilier, Bacchus, et al. - 2001
43 Team-partitioned, opaque-transition reinforcement learning – Stone, Veloso - 1998
33 Coordinated reinforcement learning – Guestrin, Lagoudakis, et al. - 2002
28 Reacting, planning and learning in an autonomous agent – Benson, Nilsson - 1995
21 Modularity issues in reactive planning – Firby - 1996
18 Multiple objective behavior-based control – Pirjanian - 2000
17 State Abstraction in MAXQ Hierarchical Reinforcement Learning – Dietterich - 2000
12 Stock and recruitment – Ricker - 1954
9 Optimal selection of uncertain actions by maximizing expected utility – Rosenblatt - 2000
5 Temporal abstraction in reinforcement learning – Sutton - 1995
5 RALPH-MEA: A Real-Time, Decision-Theoretic Agent Architecture – Ogasawara - 1993
2 Hierarchical Control and Learning for MDPs – Parr - 1998
1 Programmable HAMs. www.cs.berkeley.edu/dandre/pham.ps – Andre - 2000
1 State abstraction in MAXQ hierarchical RL – Dietterich - 2000
1 Programmable hams. tech report: www.davidandre.com/pham.ps – Andre - 2000
1 Neuro-dynamic programming. Belmont,MA: Athena Scientific – Bertsekas, Tsitsiklis - 1996