We present an expressive agent design language for reinforcement learning that allows the user to constrain the policies considered by the learning process. The language includes standard features such as parameterized subroutines, temporary interrupts, aborts, and memory variables, but also allows for unspecified choices in the agent program. For learning the choices that are left unspecified, we present provably convergent learning algorithms. We demonstrate by example that agent programs written in the language are concise as well as modular, which facilitates state abstraction and the transferability of learned skills.
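
To make the idea of an agent program with unspecified choices concrete, the following is a minimal sketch in Python. It is not the language or the algorithms described above: the corridor task, the names (`move`, `run_episode`, `train`, `greedy_choice`), and the use of SMDP-style tabular Q-learning are all assumptions chosen for illustration. The program fixes part of the behavior (once a direction is chosen, the agent commits to it for two primitive steps) and leaves only the direction open; the learner fills in that choice.

```python
import random
from collections import defaultdict

# Hypothetical toy task: a corridor with cells 0..GOAL, reward -1 per primitive step.
GOAL, START = 6, 2
CHOICES = ("left", "right")


def move(pos, direction):
    """Primitive action: one step, clipped to the corridor. Returns (new_pos, reward)."""
    step = 1 if direction == "right" else -1
    return min(max(pos + step, 0), GOAL), -1.0


def run_episode(q, epsilon, alpha=0.2, gamma=0.95, rng=random, max_choices=200):
    """Run the partial program once, updating q at every choice point."""
    pos, total = START, 0.0
    for _ in range(max_choices):
        if pos == GOAL:
            break
        # Unspecified choice: the program does not say which direction to take.
        if rng.random() < epsilon:
            choice = rng.choice(CHOICES)
        else:
            choice = max(CHOICES, key=lambda c: q[(pos, c)])
        # Specified part: commit to the chosen direction for two primitive steps.
        start, ret, discount = pos, 0.0, 1.0
        for _ in range(2):
            pos, reward = move(pos, choice)
            ret += discount * reward
            discount *= gamma
            total += reward
            if pos == GOAL:
                break
        # SMDP-style Q-learning update between consecutive choice points.
        best_next = 0.0 if pos == GOAL else max(q[(pos, c)] for c in CHOICES)
        q[(start, choice)] += alpha * (ret + discount * best_next - q[(start, choice)])
    return total


def train(episodes=500, seed=0):
    """Learn values for the choice points only; everything else is fixed by the program."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        run_episode(q, epsilon=0.2, rng=rng)
    return q


def greedy_choice(q, pos):
    """Return the learned choice at a given cell."""
    return max(CHOICES, key=lambda c: q[(pos, c)])
```

Calling `q = train()` and then `greedy_choice(q, START)` reads off the learned decision at the starting cell. Because the program constrains the agent to two-step commitments, the learner only has to evaluate a handful of (cell, direction) pairs rather than every primitive-step policy, which is the kind of policy constraint the abstract refers to.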