Exploiting structure to efficiently solve large scale partially observable Markov decision processes. Unpublished doctoral dissertation, Citeseer (2005)

by P Poupart

Results 1 - 10 of 39

Perseus: Randomized point-based value iteration for POMDPs

by Matthijs T. J. Spaan, Nikos Vlassis - Journal of Artificial Intelligence Research, 2005
Cited by 111 (8 self)
Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent’s belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to dealing with continuous action spaces. Experimental results show the potential of Perseus in large scale POMDP problems.
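
The backup stage described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the array layout (`T[a, s, s']`, `O[a, s', o]`, `R[a, s]`), the function names `backup` and `perseus_stage`, the Tiger-style toy model, and the numerical tolerance are all assumptions made for illustration. A stage samples belief points at random, backs each one up, and stops once no point in the set has a lower value than before.

```python
import numpy as np

def backup(b, V, T, O, R, gamma):
    """Point-based Bellman backup at belief b; returns one alpha vector."""
    n_actions, n_obs = T.shape[0], O.shape[2]
    best_val, best_alpha = -np.inf, None
    for a in range(n_actions):
        g_a = R[a].copy()
        for o in range(n_obs):
            # g[i, s] = sum_{s'} T[a, s, s'] * O[a, s', o] * V[i, s']
            g = (V * O[a, :, o]) @ T[a].T
            g_a += gamma * g[np.argmax(g @ b)]
        val = g_a @ b
        if val > best_val:
            best_val, best_alpha = val, g_a
    return best_alpha

def perseus_stage(B, V, T, O, R, gamma, rng, tol=1e-9):
    """One randomized backup stage over the belief set B (rows are beliefs)."""
    old = (B @ V.T).max(axis=1)          # value of every point before the stage
    new = np.full(len(B), -np.inf)       # value under the vectors added so far
    V_new = []
    todo = np.arange(len(B))             # points whose value is not yet restored
    while todo.size > 0:
        i = rng.choice(todo)
        alpha = backup(B[i], V, T, O, R, gamma)
        if alpha @ B[i] < old[i]:
            # The backup did not help this point: keep its best old vector.
            alpha = V[np.argmax(V @ B[i])]
        V_new.append(alpha)
        new = np.maximum(new, B @ alpha)
        todo = np.flatnonzero(new < old - tol)
    return np.array(V_new)

# Hypothetical Tiger-style toy model: 2 states, 3 actions, 2 observations.
T = np.array([np.eye(2), np.full((2, 2), 0.5), np.full((2, 2), 0.5)])
O = np.array([[[0.85, 0.15], [0.15, 0.85]],
              np.full((2, 2), 0.5),
              np.full((2, 2), 0.5)])
R = np.array([[-1.0, -1.0], [-100.0, 10.0], [10.0, -100.0]])
gamma = 0.95

rng = np.random.default_rng(0)
B = rng.dirichlet(np.ones(2), size=50)             # sampled belief points
V0 = np.full((1, 2), R.min() / (1.0 - gamma))      # pessimistic initial vector
V1 = perseus_stage(B, V0, T, O, R, gamma, rng)
print(len(V1), "vectors after one stage")
```

Because each new vector is checked against all of `B`, a stage typically ends after far fewer backups than there are belief points, which is the saving the abstract refers to.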

Online planning algorithms for POMDPs

by Stéphane Ross, Joelle Pineau, Sébastien Paquet, Brahim Chaib-draa - Journal of Artificial Intelligence Research, 2008
Cited by 42 (0 self)
Partially Observable Markov Decision Processes (POMDPs) provide a rich framework for sequential decision-making under uncertainty in stochastic domains. However, solving a POMDP is often intractable except for small problems due to their complexity. Here, we focus on online approaches that alleviate the computational complexity by computing good local policies at each decision step during the execution. Online algorithms generally consist of a lookahead search to find the best action to execute at each time step in an environment. Our objectives here are to survey the various existing online POMDP methods, analyze their properties and discuss their advantages and disadvantages; and to thoroughly evaluate these online approaches in different environments under various metrics (return, error bound reduction, lower bound improvement). Our experimental results indicate that state-of-the-art online heuristic search methods can handle large POMDP domains efficiently.
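
The lookahead search mentioned in this abstract can be sketched as a plain depth-limited expectimax over beliefs. This is a generic illustration, not one of the surveyed algorithms: it has no bounds, pruning, or heuristic expansion, and the names `belief_update`, `lookahead`, `leaf_value`, the array layout (`T[a, s, s']`, `O[a, s', o]`, `R[a, s]`), and the Tiger-style toy model are assumptions.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes filter step: returns (next belief, probability of observing o)."""
    unnorm = O[a, :, o] * (b @ T[a])
    p = unnorm.sum()
    return (unnorm / p if p > 0 else unnorm), p

def lookahead(b, depth, T, O, R, gamma, leaf_value):
    """Depth-limited search from belief b; returns (value, best action)."""
    if depth == 0:
        # leaf_value stands in for an offline value estimate (assumed here).
        return leaf_value(b), None
    best_q, best_a = -np.inf, None
    for a in range(T.shape[0]):
        q = b @ R[a]                                   # expected immediate reward
        for o in range(O.shape[2]):
            b_next, p = belief_update(b, a, o, T, O)
            if p > 0:                                  # skip impossible observations
                q += gamma * p * lookahead(b_next, depth - 1, T, O, R,
                                           gamma, leaf_value)[0]
        if q > best_q:
            best_q, best_a = q, a
    return best_q, best_a

# Hypothetical Tiger-style toy model: action 0 listens, actions 1 and 2 open a door.
T = np.array([np.eye(2), np.full((2, 2), 0.5), np.full((2, 2), 0.5)])
O = np.array([[[0.85, 0.15], [0.15, 0.85]],
              np.full((2, 2), 0.5),
              np.full((2, 2), 0.5)])
R = np.array([[-1.0, -1.0], [-100.0, 10.0], [10.0, -100.0]])

b0 = np.array([0.5, 0.5])
value, action = lookahead(b0, 2, T, O, R, 0.95, lambda b: 0.0)
print("chosen action:", action, "estimated value:", value)
```

At execution time the agent would run `lookahead` from its current belief, execute the returned action, and then call `belief_update` with the observation it actually receives before searching again.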

Optimizing Fixed-Size Stochastic Controllers for POMDPs and Decentralized POMDPs

by Christopher Amato, Daniel S. Bernstein, Shlomo Zilberstein
Cited by 13 (7 self)
POMDPs and their decentralized multiagent counterparts, DEC-POMDPs, offer a rich framework for sequential decision making under uncertainty. Their computational complexity, however, presents an important research challenge. One approach that effectively addresses the intractable memory requirements of current algorithms is based on representing agent policies as finite-state controllers. In this paper, we propose a new approach that uses this representation and formulates the problem as a nonlinear program (NLP). The NLP defines an optimal policy of a desired size for each agent. This new representation allows a wide range of powerful nonlinear programming algorithms to be used to solve POMDPs and DEC-POMDPs. Although solving the NLP optimally is often intractable, the results we obtain using an off-the-shelf optimization method are competitive with state-of-the-art POMDP algorithms and outperform state-of-the-art DEC-POMDP algorithms. Our approach is easy to implement and it opens up promising research directions for solving POMDPs and DEC-POMDPs using nonlinear programming methods.

AEMS: an anytime online search algorithm for approximate policy refinement in large POMDPs

by Stéphane Ross, Brahim Chaib-draa - In: Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), 2007
Cited by 12 (9 self)
Solving large Partially Observable Markov Decision Processes (POMDPs) is a complex task which is often intractable. A lot of effort has been made to develop approximate offline algorithms to solve ever larger POMDPs. However, even state-of-the-art approaches fail to solve large POMDPs in reasonable time. Recent developments in online POMDP search suggest that combining offline computations with online computations is often more efficient and can also considerably reduce the error made by approximate policies computed offline. In the same vein, we propose a new anytime online search algorithm which seeks to minimize, as efficiently as possible, the error made by an approximate value function computed offline. In addition, we show how previous online computations can be reused in following time steps in order to prevent redundant computations. Our preliminary results indicate that our approach is able to tackle large state space and observation space efficiently and under real-time constraints.

POMDP planning for robust robot control

by Joelle Pineau, Geoff Gordon - In: The Twelfth International Symposium on Robotics Research, 2005
Cited by 9 (0 self)
POMDPs provide a rich framework for planning and control in partially observable domains. Recent new algorithms have greatly improved the scalability of POMDPs, to the point where they can be used in robot applications. In this paper, we describe how approximate POMDP solving can be further improved by the use of a new theoretically-motivated algorithm for selecting salient information states. We present the algorithm, called PEMA, demonstrate competitive performance on a range of navigation tasks, and show how this approach is robust to mismatches between the robot’s physical environment and the model used for planning.

Point-based policy iteration

by Shihao Ji, Ronald Parr, Hui Li, Xuejun Liao, Lawrence Carin - In Proceedings of the Twenty-Second National Conference on Artificial Intelligence, 2007
Cited by 9 (0 self)
We describe a point-based policy iteration (PBPI) algorithm for infinite-horizon POMDPs. PBPI replaces the exact policy improvement step of Hansen’s policy iteration with point-based value iteration (PBVI). Despite being an approximate algorithm, PBPI is monotonic: At each iteration before convergence, PBPI produces a policy for which the values increase for at least one of a finite set of initial belief states, and decrease for none of these states. In contrast, PBVI cannot guarantee monotonic improvement of the value function or the policy. In practice PBPI generally needs a lower density of point coverage in the simplex and tends to produce superior policies with less computation. Experiments on several benchmark problems (up to 12,545 states) demonstrate the scalability and robustness of the PBPI algorithm.

Model-Free Reinforcement Learning as Mixture Learning

by Nikos Vlassis, Marc Toussaint
Cited by 8 (2 self)
We cast model-free reinforcement learning as the problem of maximizing the likelihood of a probabilistic mixture model via sampling, addressing both the infinite and finite horizon cases. We describe a Stochastic Approximation EM algorithm for likelihood maximization that, in the tabular case, is equivalent to a non-bootstrapping optimistic policy iteration algorithm like Sarsa(1) that can be applied both in MDPs and POMDPs. On the theoretical side, by relating the proposed stochastic EM algorithm to the family of optimistic policy iteration algorithms, we provide new tools that permit the design and analysis of algorithms in that family. On the practical side, preliminary experiments on a POMDP problem demonstrated encouraging results.

Real-Time Decision Making for Large POMDPs

by Sébastien Paquet, Ludovic Tobin, Brahim Chaib-draa - In 18th Canadian Conference on Artificial Intelligence, 2005
Cited by 5 (0 self)
In this paper, we introduce an approach called RTBSS (Real-Time Belief Space Search) for real-time decision making in large POMDPs. The approach is based on a look-ahead search that is applied online each time the agent has to make a decision. RTBSS is particularly interesting for large real-time environments where offline solutions are not applicable because of their complexity.

Automated hierarchy discovery for planning in partially observable domains

by Laurent Charlin - Advances in Neural Information Processing Systems 19, 2006
Cited by 5 (2 self)
author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners. I understand that my thesis may be made electronically available to the public.

Hybrid POMDP algorithms

by Sébastien Paquet, Brahim Chaib-draa, Stéphane Ross - In Proceedings of The Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM-2006), 2006
Cited by 3 (3 self)
When an agent evolves in a partially observable environment, it has to deal with uncertainties when choosing its actions. An efficient model for such environments is to use partially observable Markov decision processes (POMDPs). Many algorithms have been developed for POMDPs. Some use an offline approach, learning a complete policy before the execution. Others use an online approach, constructing the policy online for the current belief state. In this article, we present three hybrid algorithms that have been developed to combine the strengths of these two extreme approaches (offline and online). We present results showing that hybrid algorithms can often obtain better results than the online or the offline algorithms alone.