Results 1-10 of 186
Planning and acting in partially observable stochastic domains
Artificial Intelligence, 1998
"... In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. We begin by introducing the theory of Markov decision processes (mdps) and partially observable mdps (pomdps). We then outline a novel algorithm ..."
Abstract

Cited by 1095 (38 self)
In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. We begin by introducing the theory of Markov decision processes (MDPs) and partially observable MDPs (POMDPs). We then outline a novel algorithm for solving POMDPs offline and show how, in some cases, a finite-memory controller can be extracted from the solution to a POMDP. We conclude with a discussion of how our approach relates to previous work, of the complexity of finding exact solutions to POMDPs, and of some possibilities for finding approximate solutions.
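The key construct behind such solvers is the belief state, a probability distribution over the hidden states that is updated by Bayes' rule after each action and observation. Below is a minimal sketch of that update, assuming tabular models stored as numpy arrays; the names T, O, and belief_update are illustrative, not the paper's notation.

```python
# A minimal sketch (not the paper's algorithm): the Bayes-filter belief
# update that underlies exact offline POMDP solvers. Array names and
# shapes are assumptions for illustration.
import numpy as np

def belief_update(b, a, o, T, O):
    """Update the belief state after taking action a and seeing observation o.

    b : (S,)       current belief over hidden states
    T : (A, S, S)  T[a, s, s2] = P(s2 | s, a)
    O : (A, S, Z)  O[a, s2, o] = P(o | s2, a)
    """
    pred = T[a].T @ b            # predicted next-state distribution
    new_b = O[a][:, o] * pred    # weight by observation likelihood
    return new_b / new_b.sum()   # normalize (assumes P(o) > 0)
```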
Value-function approximations for partially observable Markov decision processes
Journal of Artificial Intelligence Research, 2000
"... Partially observable Markov decision processes (POMDPs) provide an elegant mathematical framework for modeling complex decision and planning problems in stochastic domains in which states of the system are observable only indirectly, via a set of imperfect or noisy observations. The modeling advanta ..."
Abstract

Cited by 167 (1 self)
Partially observable Markov decision processes (POMDPs) provide an elegant mathematical framework for modeling complex decision and planning problems in stochastic domains in which states of the system are observable only indirectly, via a set of imperfect or noisy observations. The modeling advantage of POMDPs, however, comes at a price: exact methods for solving them are computationally very expensive and thus applicable in practice only to very simple problems. We focus on efficient approximation (heuristic) methods that attempt to alleviate the computational problem and trade off accuracy for speed. We have two objectives here. First, we survey various approximation methods, analyze their properties and relations, and provide some new insights into their differences. Second, we present a number of new approximation methods and novel refinements of existing techniques. The theoretical results are supported by experiments on a problem from the agent navigation domain.
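One widely cited heuristic of the kind surveyed here is QMDP: solve the underlying fully observable MDP, then score actions by belief-weighted Q-values. A sketch under the same illustrative array conventions as above (function names and the fixed iteration count are assumptions):

```python
# A sketch of the QMDP approximation: value iteration on the underlying
# MDP, then greedy action selection against the current belief.
import numpy as np

def qmdp_q_values(T, R, gamma, iters=500):
    """Q(s, a) for the underlying MDP.
    T : (A, S, S) transition model, R : (A, S) expected immediate reward."""
    A, S, _ = T.shape
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = Q.max(axis=1)              # (S,) greedy state values
        Q = (R + gamma * (T @ V)).T    # (A, S) backup, transposed to (S, A)
    return Q

def qmdp_action(b, Q):
    """Pick the action maximizing the belief-weighted Q-value."""
    return int(np.argmax(b @ Q))
```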
Learning finite-state controllers for partially observable environments
In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, 1999
"... Reactive (memoryless) policies are sufficient in completely observable Markov decision processes (MDPs), but some kind of memory is usually necessary for optimal control of a partially observable MDP. Policies with finite memory can be represented as finitestate automata. In this paper, we extend B ..."
Abstract

Cited by 93 (10 self)
Reactive (memoryless) policies are sufficient in completely observable Markov decision processes (MDPs), but some kind of memory is usually necessary for optimal control of a partially observable MDP. Policies with finite memory can be represented as finite-state automata. In this paper, we extend Baird and Moore's VAPS algorithm to the problem of learning general finite-state automata. Because it performs stochastic gradient descent, this algorithm can be shown to converge to a locally optimal finite-state controller. We provide the details of the algorithm and then consider the question of under what conditions stochastic gradient descent will outperform exact gradient descent. We conclude with empirical results comparing the performance of stochastic and exact gradient descent, and showing the ability of our algorithm to extract the useful information contained in the sequence of past observations to compensate for the lack of observability at each time step.
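For concreteness, here is a minimal sketch of the policy class being learned: a stochastic finite-state controller with a tabular action distribution per memory node and a node-transition distribution per (node, observation) pair. The class and field names are assumptions, and the VAPS gradient update itself is not shown.

```python
# A sketch of a stochastic finite-state controller (the policy class the
# paper optimizes), initialized to uniform distributions.
import numpy as np

rng = np.random.default_rng(0)

class StochasticFSC:
    def __init__(self, n_nodes, n_actions, n_obs):
        # pi[n, a]     = P(action a | memory node n)
        # eta[n, o, m] = P(next node m | node n, observation o)
        self.pi = np.full((n_nodes, n_actions), 1.0 / n_actions)
        self.eta = np.full((n_nodes, n_obs, n_nodes), 1.0 / n_nodes)
        self.node = 0

    def act(self, obs=None):
        if obs is not None:  # update internal memory on the last observation
            self.node = rng.choice(len(self.eta), p=self.eta[self.node, obs])
        return rng.choice(self.pi.shape[1], p=self.pi[self.node])
```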
On the Undecidability of Probabilistic Planning and Related Stochastic Optimization Problems
Artificial Intelligence, 2003
"... Automated planning, the problem of how an agent achieves a goal given a repertoire of actions, is one of the foundational and most widely studied problems in the AI literature. The original formulation of the problem makes strong assumptions regarding the agent's knowledge and control over the ..."
Abstract

Cited by 74 (0 self)
Automated planning, the problem of how an agent achieves a goal given a repertoire of actions, is one of the foundational and most widely studied problems in the AI literature. The original formulation of the problem makes strong assumptions regarding the agent's knowledge and control over the world, namely that its information is complete and correct, and that the results of its actions are deterministic and known.
Solving POMDPs by searching the space of finite policies
In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, 1999
"... Solving partially observable Markov decision processes (POMDPs) is highly intractable in general, at least in part because the optimal policy may be infinitely large. In this paper, we explore the problem of finding the optimal policy from a restricted set of policies, represented as finite state au ..."
Abstract

Cited by 73 (3 self)
Solving partially observable Markov decision processes (POMDPs) is highly intractable in general, at least in part because the optimal policy may be infinitely large. In this paper, we explore the problem of finding the optimal policy from a restricted set of policies, represented as finite-state automata of a given size. This problem is also intractable, but we show that the complexity can be greatly reduced when the POMDP and/or policy are further constrained. We demonstrate good empirical results with a branch-and-bound method for finding globally optimal deterministic policies, and a gradient-ascent method for finding locally optimal stochastic policies.
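Both search strategies need to score candidate controllers. For a fixed deterministic finite-state controller, the values of all (node, state) pairs satisfy a linear Bellman system that can be solved exactly; the sketch below shows that evaluation step (array names and shapes are assumptions, not the paper's notation):

```python
# Evaluating a fixed deterministic finite-state controller on a POMDP by
# solving V = r + gamma * M V over (node, state) pairs.
import numpy as np

def evaluate_fsc(act, succ, T, O, R, gamma):
    """act : (N,) action taken at each node; succ : (N, Z) next node per obs.
    T : (A, S, S), O : (A, S, Z), R : (A, S). Returns V of shape (N, S)."""
    N, S, Z = len(act), T.shape[1], O.shape[2]
    M = np.zeros((N * S, N * S))
    r = np.zeros(N * S)
    for n in range(N):
        a = act[n]
        for s in range(S):
            r[n * S + s] = R[a, s]
            for s2 in range(S):
                for o in range(Z):
                    m = succ[n, o]  # deterministic node transition
                    M[n * S + s, m * S + s2] += T[a, s, s2] * O[a, s2, o]
    V = np.linalg.solve(np.eye(N * S) - gamma * M, r)
    return V.reshape(N, S)
```

The controller's value under a starting belief b and start node n0 is then simply b @ V[n0], which is the quantity a branch-and-bound search would bound and compare.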
Speeding Up the Convergence of Value Iteration in Partially Observable Markov Decision Processes
Journal of Artificial Intelligence Research, 2001
"... Partially observable Markov decision processes (POMDPs) have recently become popular among many AI researchers because they serve as a natural model for planning under uncertainty. Value iteration is a wellknown algorithm for finding optimal policies for POMDPs. It typically takes a large number ..."
Abstract

Cited by 61 (4 self)
Partially observable Markov decision processes (POMDPs) have recently become popular among many AI researchers because they serve as a natural model for planning under uncertainty. Value iteration is a well-known algorithm for finding optimal policies for POMDPs. It typically takes a large number of iterations to converge. This paper proposes a method for accelerating the convergence of value iteration. The method has been evaluated on an array of benchmark problems and was found to be very effective: it enabled value iteration to converge after only a few iterations on all the test problems.
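Each iteration of POMDP value iteration applies a dynamic-programming backup to a set of alpha-vectors, the linear functions whose upper surface represents the value function. The paper's acceleration scheme is not reproduced here; the sketch below shows the basic backup at a single belief point, with illustrative names and array shapes:

```python
# A sketch of the DP backup that one iteration of POMDP value iteration
# performs, applied at a single belief point (point-based style).
import numpy as np

def point_backup(b, Gamma, T, O, R, gamma):
    """b : (S,) belief; Gamma : nonempty list of (S,) alpha-vectors;
    T : (A, S, S), O : (A, S, Z), R : (A, S). Returns the new alpha-vector at b."""
    A, S, Z = O.shape
    best_val, best_alpha = -np.inf, None
    for a in range(A):
        g_a = R[a].astype(float)
        for o in range(Z):
            # For each alpha: g[s] = sum_s2 T(s2|s,a) O(o|s2,a) alpha(s2)
            cands = [T[a] @ (O[a][:, o] * alpha) for alpha in Gamma]
            g_a = g_a + gamma * max(cands, key=lambda g: b @ g)
        if b @ g_a > best_val:
            best_val, best_alpha = b @ g_a, g_a
    return best_alpha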
Nonapproximability Results for Partially Observable Markov Decision Processes
2000
"... We show that for several variations of partially observable Markov decision processes, polynomialtime algorithms for nding control policies are unlikely to or simply don't have guarantees of nding policies within a constant factor or a constant summand of optimal. Here "unlikely" ..."
Abstract

Cited by 40 (0 self)
We show that for several variations of partially observable Markov decision processes, polynomial-time algorithms for finding control policies are unlikely to have, or provably do not have, guarantees of finding policies within a constant factor or a constant summand of optimal. Here "unlikely" means "unless some complexity classes collapse," where the collapses considered are P = NP, P = PSPACE, or P = EXP. Until or unless these collapses are shown to hold, any control-policy designer must choose between such performance guarantees and efficient computation.
A Survey of POMDP Applications
"... An increasing number of researchers in many areas are becoming interested in the application of the partially observable Markov decision process (POMDP) model to problems with hidden state. This model can account for both state transition and observation uncertainty. The majority of recent research ..."
Abstract

Cited by 40 (0 self)
An increasing number of researchers in many areas are becoming interested in the application of the partially observable Markov decision process (POMDP) model to problems with hidden state. This model can account for both state-transition and observation uncertainty. The majority of recent research interest in the POMDP model has come from the artificial intelligence community, and as such the model has been applied in only a limited range of domains. The main purpose of this paper is to show the wider applicability of the model by surveying potential application areas for POMDPs.
An Analysis of Reinforcement Learning with Function Approximation
"... We address the problem of computing the optimal Qfunction in Markov decision problems with infinite statespace. We analyze the convergence properties of several variations of Qlearning when combined with function approximation, extending the analysis of TDlearning in (Tsitsiklis & Van Roy, 1 ..."
Abstract

Cited by 38 (5 self)
We address the problem of computing the optimal Q-function in Markov decision problems with infinite state-space. We analyze the convergence properties of several variations of Q-learning when combined with function approximation, extending the analysis of TD-learning in (Tsitsiklis & Van Roy, 1996a) to stochastic control settings. We identify conditions under which such approximate methods converge with probability 1. We conclude with a brief discussion on the general applicability of our results and compare them with several related works.
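As a concrete instance of the setting analyzed, here is the standard Q-learning update with a linear function approximator. The feature map phi and step size are assumptions, and convergence is only guaranteed under conditions like those the paper identifies:

```python
# A minimal sketch of Q-learning with linear function approximation:
# theta parameterizes Q(s, a) = phi(s, a) . theta.
import numpy as np

def q_learning_step(theta, phi, s, a, r, s2, actions, alpha, gamma):
    """One stochastic update of the weight vector theta.
    phi(s, a) -> feature vector of shape (d,); actions is iterable."""
    q_next = max(phi(s2, a2) @ theta for a2 in actions)  # greedy bootstrap
    td_err = r + gamma * q_next - phi(s, a) @ theta      # temporal difference
    return theta + alpha * td_err * phi(s, a)            # gradient-style step
```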