Results 1  10
of
90
Perseus: Randomized pointbased value iteration for POMDPs
 Journal of Artificial Intelligence Research
, 2005
"... Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Pointbased approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent’s belief space. We present a ra ..."
Abstract

Cited by 202 (16 self)
 Add to MetaCart
(Show Context)
Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Pointbased approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent’s belief space. We present a randomized pointbased value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other pointbased methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to dealing with continuous action spaces. Experimental results show the potential of Perseus in large scale POMDP problems. 1.
Online planning algorithms for POMDPs
 Journal of Artificial Intelligence Research
, 2008
"... Partially Observable Markov Decision Processes (POMDPs) provide a rich framework for sequential decisionmaking under uncertainty in stochastic domains. However, solving a POMDP is often intractable except for small problems due to their complexity. Here, we focus on online approaches that alleviate ..."
Abstract

Cited by 109 (3 self)
 Add to MetaCart
(Show Context)
Partially Observable Markov Decision Processes (POMDPs) provide a rich framework for sequential decisionmaking under uncertainty in stochastic domains. However, solving a POMDP is often intractable except for small problems due to their complexity. Here, we focus on online approaches that alleviate the computational complexity by computing good local policies at each decision step during the execution. Online algorithms generally consist of a lookahead search to find the best action to execute at each time step in an environment. Our objectives here are to survey the various existing online POMDP methods, analyze their properties and discuss their advantages and disadvantages; and to thoroughly evaluate these online approaches in different environments under various metrics (return, error bound reduction, lower bound improvement). Our experimental results indicate that stateoftheart online heuristic search methods can handle large POMDP domains efficiently. 1.
Optimizing FixedSize Stochastic Controllers for POMDPs and Decentralized POMDPs
"... POMDPs and their decentralized multiagent counterparts, DECPOMDPs, offer a rich framework for sequential decision making under uncertainty. Their computational complexity, however, presents an important research challenge. One approach that effectively addresses the intractable memory requirements ..."
Abstract

Cited by 30 (15 self)
 Add to MetaCart
POMDPs and their decentralized multiagent counterparts, DECPOMDPs, offer a rich framework for sequential decision making under uncertainty. Their computational complexity, however, presents an important research challenge. One approach that effectively addresses the intractable memory requirements of current algorithms is based on representing agent policies as finitestate controllers. In this paper, we propose a new approach that uses this representation and formulates the problem as a nonlinear program (NLP). The NLP defines an optimal policy of a desired size for each agent. This new representation allows a wide range of powerful nonlinear programming algorithms to be used to solve POMDPs and DECPOMDPs. Although solving the NLP optimally is often intractable, the results we obtain using an offtheshelf optimization method are competitive with stateoftheart POMDP algorithms and outperform stateoftheart DECPOMDP algorithms. Our approach is easy to implement and it opens up promising research directions for solving POMDPs and DECPOMDPs using nonlinear programming methods. 1.
AEMS: an anytime online search algorithm for approximate policy refinement in large POMDPs
 In: Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI
, 2007
"... Solving large Partially Observable Markov Decision Processes (POMDPs) is a complex task which is often intractable. A lot of effort has been made to develop approximate offline algorithms to solve ever larger POMDPs. However, even stateoftheart approaches fail to solve large POMDPs in reasonable t ..."
Abstract

Cited by 28 (9 self)
 Add to MetaCart
Solving large Partially Observable Markov Decision Processes (POMDPs) is a complex task which is often intractable. A lot of effort has been made to develop approximate offline algorithms to solve ever larger POMDPs. However, even stateoftheart approaches fail to solve large POMDPs in reasonable time. Recent developments in online POMDP search suggest that combining offline computations with online computations is often more efficient and can also considerably reduce the error made by approximate policies computed offline. In the same vein, we propose a new anytime online search algorithm which seeks to minimize, as efficiently as possible, the error made by an approximate value function computed offline. In addition, we show how previous online computations can be reused in following time steps in order to prevent redundant computations. Our preliminary results indicate that our approach is able to tackle large state space and observation space efficiently and under realtime constraints. 1
Relational Dynamic Influence Diagram Language (RDDL): Language Description
"... The Relational Dynamic Influence Diagram Language (RDDL) is a uniform language where states, actions, and observations (whether discrete or continuous) are parameterized variables and the evolution of a fully or partially observed (stochastic) process is specified via (stochastic) functions over nex ..."
Abstract

Cited by 25 (1 self)
 Add to MetaCart
(Show Context)
The Relational Dynamic Influence Diagram Language (RDDL) is a uniform language where states, actions, and observations (whether discrete or continuous) are parameterized variables and the evolution of a fully or partially observed (stochastic) process is specified via (stochastic) functions over next state variables conditioned on current state and action variables (n.b., concurrency is allowed). Parameterized variables are simply templates for ground variables that can be obtained when given a particular problem instance defining possible domain objects. Semantically, RDDL is simply a dynamic Bayes net (DBN) [1] (with potentially many intermediate layers) extended with a simple influence diagram (ID) [2] utility node representing immediate reward. An objective function specifies how these immediate rewards should be optimized over time for optimal control. For a ground instance, RDDL is just a factored MDP (or POMDP, if partially observed).
ModelFree Reinforcement Learning as Mixture Learning
"... We cast modelfree reinforcement learning as the problem of maximizing the likelihood of a probabilistic mixture model via sampling, addressing both the infinite and finite horizon cases. We describe a Stochastic Approximation EM algorithm for likelihood maximization that, in the tabular case, is eq ..."
Abstract

Cited by 19 (2 self)
 Add to MetaCart
(Show Context)
We cast modelfree reinforcement learning as the problem of maximizing the likelihood of a probabilistic mixture model via sampling, addressing both the infinite and finite horizon cases. We describe a Stochastic Approximation EM algorithm for likelihood maximization that, in the tabular case, is equivalent to a nonbootstrapping optimistic policy iteration algorithm like Sarsa(1) that can be applied both in MDPs and POMDPs. On the theoretical side, by relating the proposed stochastic EM algorithm to the family of optimistic policy iteration algorithms, we provide new tools that permit the design and analysis of algorithms in that family. On the practical side, preliminary experiments on a POMDP problem demonstrated encouraging results. 1.
Efficient Planning under Uncertainty with Macroactions
, 2014
"... Terms of Use Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. Detailed Terms The MIT Faculty has made this article openly available. Please share how this access benefits you. ..."
Abstract

Cited by 18 (0 self)
 Add to MetaCart
(Show Context)
Terms of Use Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. Detailed Terms The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters.
POMDP planning for robust robot control
 in: The Twelveth International Symposium on Robotics Research
, 2005
"... POMDPs provide a rich framework for planning and control in partially observable domains. Recent new algorithms have greatly improved the scalability of POMDPs, to the point where they can be used in robot applications. In this paper, we describe how approximate POMDP solving can be further improved ..."
Abstract

Cited by 17 (1 self)
 Add to MetaCart
(Show Context)
POMDPs provide a rich framework for planning and control in partially observable domains. Recent new algorithms have greatly improved the scalability of POMDPs, to the point where they can be used in robot applications. In this paper, we describe how approximate POMDP solving can be further improved by the use of a new theoreticallymotivated algorithm for selecting salient information states. We present the algorithm, called PEMA, demonstrate competitive performance on a range of navigation tasks, and show how this approach is robust to mismatches between the robot’s physical environment and the model used for planning. 1
Symbolic heuristic search value iteration for factored POMDPs
 In Proceedings of the 23rd national conference on Artificial intelligence (AAAI08
, 2008
"... We propose Symbolic heuristic search value iteration (Symbolic HSVI) algorithm, which extends the heuristic search value iteration (HSVI) algorithm in order to handle factored partially observable Markov decision processes (factored POMDPs). The idea is to use algebraic decision diagrams (ADDs) fo ..."
Abstract

Cited by 16 (0 self)
 Add to MetaCart
We propose Symbolic heuristic search value iteration (Symbolic HSVI) algorithm, which extends the heuristic search value iteration (HSVI) algorithm in order to handle factored partially observable Markov decision processes (factored POMDPs). The idea is to use algebraic decision diagrams (ADDs) for compactly representing the problem itself and all the relevant intermediate computation results in the algorithm. We leverage Symbolic Perseus for computing the lower bound of the optimal value function using ADD operators, and provide a novel ADDbased procedure for computing the upper bound. Experiments on a number of standard factored POMDP problems show that we can achieve an order of magnitude improvement in performance over previously proposed algorithms. Partially observable Markov decision processes (POMDPs) are widely used for modeling stochastic sequential decision problems with noisy observations. However, when we model realworld problems, we often need a compact representation since the model may result in a very large number of states when using the flat, tablebased representation. Factored POMDPs (Boutilier & Poole 1996) are one type of such compact representation. Given a factored POMDP, we have to design an algorithm that does not explicitly enumerate all the states in the model. For this purpose, it has gained popularity over the years to use the algebraic decision diagram (ADD) representation (Bahar et al. 1993) of all the vectors and matrices used
Pointbased policy iteration
 in Proceedings of AAAI
, 2007
"... We describe a pointbased policy iteration (PBPI) algorithm for infinitehorizon POMDPs. PBPI replaces the exact policy improvement step of Hansen’s policy iteration with pointbased value iteration (PBVI). Despite being an approximate algorithm, PBPI is monotonic: At each iteration before converg ..."
Abstract

Cited by 15 (0 self)
 Add to MetaCart
We describe a pointbased policy iteration (PBPI) algorithm for infinitehorizon POMDPs. PBPI replaces the exact policy improvement step of Hansen’s policy iteration with pointbased value iteration (PBVI). Despite being an approximate algorithm, PBPI is monotonic: At each iteration before convergence, PBPI produces a policy for which the values increase for at least one of a finite set of initial belief states, and decrease for none of these states. In contrast, PBVI cannot guarantee monotonic improvement of the value function or the policy. In practice PBPI generally needs a lower density of point coverage in the simplex and tends to produce superior policies with less computation. Experiments on several benchmark problems (up to 12,545 states) demonstrate the scalability and robustness of the PBPI algorithm.