Download:
by Nevin L. Zhang, Weihong Zhang
Journal of Artificial Intelligence Research
http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume14/zhang01a.pdf
Abstract:
Partially observable Markov decision processes (POMDPs) have recently become popular among many AI researchers because they serve as a natural model for planning under uncertainty. Value iteration is a well-known algorithm for finding optimal policies for POMDPs. It typically takes a large number of iterations to converge. This paper proposes a method for accelerating the convergence of value iteration. The method has been evaluated on an array of benchmark problems and was found to be very effective: it enabled value iteration to converge after only a few iterations on all the test problems.
Citations
361 | Markov Decision Processes | Puterman | 1994
221 | The optimal control of Partially Observable Markov Processes | Sondik | 1971
189 | The complexity of Markov decision processes | Papadimitriou, Tsitsiklis | 1987
175 | Learning Policies for Partially Observable Environments: Scaling Up | Littman, Cassandra, et al. | 1995
132 | A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms | Monahan | 1982
131 | Algorithms for Sequential Decision Making | Littman | 1996
118 | Approximating optimal policies for partially observable stochastic domains (unpublished manuscript) | Parr, Russell | 1995
112 | Incremental pruning: a simple, fast, exact method for partially observable Markov decision processes. UAI-97 | Cassandra, Littman, et al. | 1997
79 | Exact and Approximate Algorithms for Partially Observable Markov Decision Processes | Cassandra | 1998
78 | Optimal control of Markov decision processes with incomplete state estimation | Astrom | 1965
65 | Solving POMDPs by searching in policy space | Hansen | 1998
61 | Algorithms for partially observable Markov decision processes | Cheng | 1988
58 | Computationally feasible bounds for partially observed Markov decision processes | Lovejoy | 1989
35 | Planning and control in stochastic domains with imperfect information | Hauskrecht | 1997
32 | Incremental methods for computing bounds in partially observable Markov decision processes | Hauskrecht | 1997
26 | Efficient dynamic-programming updates in partially observable markov decision processes | Littman, Cassandra, et al. | 1995
21 | The optimal search for a moving target when the search path is constrained | Eagle | 1984
17 | Solution procedures for partially observed Markov decision processes | White, Scherer | 1989
12 | A survey of POMDP applications | Cassandra | 1998
12 | A method for speeding up value iteration in partially observable markov decision processes | Zhang, Lee, et al. | 1999
10 | Optimal control of partially observable processes over the finite horizon | Smallwood, Sondik | 1973
8 | Value function approximations for partially observable Markov decision processes | Hauskrecht | 2000
7 | Suboptimal policies with bounds for parameter adaptive decision processes | Lovejoy | 1993
6 | Efficient dynamic-programming updates in partially observable Markov decision processes | Littman, Cassandra, et al. | 1995
3 | A heuristic variable grid solution for POMDPs | Brafman | 1997
2 | Planning and Acting in Partially Observable Stochastic Domains | Cassandra | 1998
2 | Optimal infinite-horizon undiscounted control of finite probabilistic systems | Platzman | 1980
1 | Optimal infinite-horizon undiscounted control of finite probabilistic systems | Platzman |
1 | Speeding Up Value Iteration in POMDPS | Zhang, Liu | 1997