I hereby declare that I am the sole author of the thesis.
|
1397
|
Dynamic Programming
– Bellman
- 1957
|
|
1397
|
STRIPS: A new approach to the application of theorem proving to problem solving
– Fikes, Nilsson
- 1971
|
|
1267
|
Data Networks
– Bertsekas, Gallager
- 1992
|
|
938
|
Learning from Delayed Rewards
– Watkins
- 1989
|
|
885
|
Learning to Predict by the Methods of Temporal Differences
– Sutton
- 1988
|
|
408
|
Planning and acting in partially observable stochastic domains
– Kaelbling, Littman, et al.
- 1998
|
|
400
|
Learning to act using Real-Time Dynamic Programming
– Barto, Bradtke, et al.
- 1995
|
|
389
|
UCPOP: A sound, complete, partial order planner for ADL
– Penberthy, Weld
- 1992
|
|
378
|
Systematic nonlinear planning
– McAllester, Rosenblitt
- 1991
|
|
374
|
Integrated Architecture for Learning, Planning, and Reacting Based on Approximating Dynamic Programming
– Sutton
- 1990
|
|
361
|
Markov Decision Processes
– Puterman
- 1994
|
|
314
|
Probabilistic logic
– Nilsson
|
|
293
|
Real-time heuristic search
– Korf
- 1990
|
|
246
|
Decision-theoretic planning: Structural assumptions and computational leverage
– Boutilier, Dean, et al.
- 1999
|
|
241
|
An algorithm for probabilistic planning
– Kushmerick, Hanks, et al.
- 1995
|
|
239
|
Prioritized sweeping: Reinforcement learning with less data and less real time
– Moore, Atkeson
- 1993
|
|
215
|
Improving elevator performance using reinforcement learning
– Crites, Barto
- 1996
|
|
212
|
Temporal Credit Assignment in Reinforcement Learning
– Sutton
- 1984
|
|
175
|
Learning Policies for Partially Observable Environments: Scaling Up
– Littman, Cassandra, et al.
- 1995
|
|
162
|
On the convergence of stochastic iterative dynamic programming algorithms
– Jaakkola, Jordan, et al.
- 1994
|
|
158
|
Reinforcement learning with hierarchies of machines
– Parr, Russell
- 1998
|
|
156
|
On-line Q-learning using connectionist systems
– Rummery, Niranjan
- 1994
|
|
152
|
The optimal control of partially observed Markov processes over the finite horizon
– Smallwood, Sondik
- 1973
|
|
145
|
Planning with Incomplete Information as Heuristic Search
– Bonet, Geffner
|
|
145
|
Planning under time constraints in stochastic domains
– Dean, Kaelbling, et al.
- 1995
|
|
133
|
Planning with deadlines in stochastic domains
– Dean, Kaelbling, et al.
- 1993
|
|
132
|
A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms
– Monahan
- 1982
|
|
124
|
Macro operators: A weak method for learning
– Korf
- 1985
|
|
122
|
Reinforcement Learning Algorithm for Partially Observable
– Jaakkola, Singh, et al.
- 1994
|
|
118
|
Approximating optimal policies for partially observable stochastic domains (unpublished manuscript)
– Parr, Russell
- 1995
|
|
109
|
Anytime synthetic projection: Maximizing the probability of goal satisfaction
– Drummond, Bresina
- 1990
|
|
94
|
Decomposition techniques for planning in stochastic domains
– Dean, Lin
- 1995
|
|
93
|
Learning without state-estimation in partially observable Markovian decision problems
– Singh, Jaakkola, et al.
- 1994
|
|
90
|
Optimization Theory for Large Systems
– Lasdon
- 1970
|
|
89
|
Hierarchical solution of Markov decision processes using macro-actions
– Hauskrecht, Meuleau, et al.
- 1998
|
|
88
|
The complexity of stochastic games
– Condon
- 1992
|
|
86
|
Utility models for goal-directed, decision-theoretic planners
– Haddawy, Hanks
- 1998
|
|
84
|
Model minimization in Markov decision processes
– Dean, Givan
- 1997
|
|
80
|
Finite State Markovian Decision Processes
– Derman
- 1970
|
|
80
|
Efficient Learning and Planning Within the Dyna Framework
– Peng, Williams
- 1993
|
|
80
|
Finding structure in reinforcement learning
– Thrun, Schwartz
- 1995
|
|
79
|
Exact and Approximate Algorithms for Partially Observable Markov Decision Processes
– Cassandra
- 1998
|
|
78
|
Optimal control of Markov decision processes with incomplete state estimation
– Astrom
- 1965
|
|
78
|
Computing optimal policies for partially observable decision processes using compact representations
– Boutilier, Poole
- 1996
|
|
78
|
Value-function approximations for Partially Observable Markov Decision Processes
– Hauskrecht
- 2000
|
|
72
|
Stochastic dynamic programming with factored representations
– Boutilier, Dearden, et al.
|
|
68
|
Approximate planning in large POMDPs via reusable trajectories
– Kearns, Mansour, et al.
- 1999
|
|
68
|
A robot navigation architecture based on partially observable Markov decision process models
– Koenig, Simmons
|
|
65
|
Decision Theory in Expert Systems and Artificial Intelligence
– Horvitz, Breese, et al.
- 1988
|
|
64
|
The convergence of TD(λ) for general λ
– Dayan
- 1992
|