by Relu Patrascu, Pascal Poupart
In Proceedings of the 18th National Conference on Artificial Intelligence
http://www.cs.uwaterloo.ca/~ppoupart/publications/basislp/paper.ps.gz
Abstract:
Significant recent work has focused on using linear representations to approximate value functions for factored Markov decision processes (MDPs). Current research has adopted linear programming as an effective means to calculate approximations for a given set of basis functions, tackling very large MDPs as a result. However, a number of issues remain unresolved: How accurate are the approximations produced by linear programs? How hard is it to produce better approximations? Where do the basis functions come from? To address these questions, we first investigate the complexity of minimizing the Bellman error of a linear value function approximation, showing that this is an inherently hard problem. Nevertheless, we provide a branch and bound method for calculating Bellman error and performing approximate policy iteration for general factored MDPs. These methods are more accurate than linear programming, but more expensive. We then consider linear programming itself and investigate methods for automatically constructing sets of basis functions that allow this approach to produce good approximations. The techniques we develop are guaranteed to reduce L1 error, but can also empirically reduce Bellman error.