Planning under uncertainty is a central problem in the study of automated sequential decision making, and has been addressed by researchers in many different fields, including AI planning, decision analysis, operations research, control theory and economics. While the assumptions and perspectives adopted in these areas often differ in substantial ways, many planning problems of interest to researchers in these fields can be modeled as Markov decision processes (MDPs) and analyzed using the techniques of decision theory. This paper presents an overview and synthesis of MDP-related methods, showing how they provide a unifying framework for modeling many classes of planning problems studied in AI. It also describes structural properties of MDPs that, when exhibited by particular classes of problems, can be exploited in the construction of optimal or approximately optimal policies or plans. Planning problems commonly possess structure in the reward and value functions used to describe performance criteria, in the functions used to describe state transitions and observations, and in the relationships among features used to describe states, actions, rewards, and observations.
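As a concrete illustration of the framework the abstract refers to, the following is a minimal sketch of a fully observable MDP solved by value iteration, one standard dynamic-programming technique for constructing an optimal policy. The two-state model, the names `TRANSITIONS`, `q_value`, and `value_iteration`, and the discount factor `gamma = 0.9` are all assumptions made for this example; none of them are taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical two-state, two-action MDP, invented purely for illustration.
# mdp[state][action] is a list of (probability, next_state, reward) outcomes.
MDP = Dict[str, Dict[str, List[Tuple[float, str, float]]]]

TRANSITIONS: MDP = {
    "idle": {
        "wait": [(1.0, "idle", 0.0)],
        "work": [(0.8, "done", 1.0), (0.2, "idle", 0.0)],
    },
    "done": {
        "wait": [(1.0, "done", 0.0)],
        "work": [(1.0, "done", 0.0)],
    },
}


def q_value(mdp: MDP, values: Dict[str, float], state: str,
            action: str, gamma: float) -> float:
    """Expected discounted return of taking `action` in `state`."""
    return sum(p * (r + gamma * values[nxt]) for p, nxt, r in mdp[state][action])


def value_iteration(mdp: MDP, gamma: float = 0.9, tol: float = 1e-9):
    """Repeat Bellman backups until the value function changes by less than tol."""
    values = {s: 0.0 for s in mdp}
    while True:
        new_values = {
            s: max(q_value(mdp, values, s, a, gamma) for a in mdp[s])
            for s in mdp
        }
        delta = max(abs(new_values[s] - values[s]) for s in mdp)
        values = new_values
        if delta < tol:
            break
    # Extract a policy that is greedy with respect to the converged values.
    policy = {
        s: max(mdp[s], key=lambda a: q_value(mdp, values, s, a, gamma))
        for s in mdp
    }
    return values, policy


if __name__ == "__main__":
    values, policy = value_iteration(TRANSITIONS)
    print(values, policy)
```

The sketch enumerates every state explicitly, which is exactly what becomes impractical when states are described by many features; the structural properties mentioned in the abstract are aimed at avoiding that enumeration.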