Results 1 - 10
of
39
Point-based value iteration: An anytime algorithm for POMDPs
, 2003
"... This paper introduces the Point-Based Value Iteration (PBVI) algorithm for POMDP planning. PBVI approximates an exact value iteration solution by selecting a small set of representative belief points, and planning for those only. By using stochastic trajectories to choose belief points, and by ..."
Abstract
-
Cited by 185 (20 self)
- Add to MetaCart
This paper introduces the Point-Based Value Iteration (PBVI) algorithm for POMDP planning. PBVI approximates an exact value iteration solution by selecting a small set of representative belief points, and planning for those only. By using stochastic trajectories to choose belief points, and by maintaining only one value hyperplane per point, it is able to successfully solve large problems, including the robotic Tag domain, a POMDP version of the popular game of lasertag.
Perseus: Randomized point-based value iteration for POMDPs
- Journal of Artificial Intelligence Research
, 2005
"... Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent’s belief space. We present a ra ..."
Abstract
-
Cited by 111 (8 self)
- Add to MetaCart
Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent’s belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to dealing with continuous action spaces. Experimental results show the potential of Perseus in large scale POMDP problems. 1.
Partially observable markov decision processes with continuous observations for dialogue management
- Computer Speech and Language
, 2005
"... This work shows how a dialogue model can be represented as a Partially Observable Markov Decision Process (POMDP) with observations composed of a discrete and continuous component. The continuous component enables the model to directly incorporate a confidence score for automated planning. Using a t ..."
Abstract
-
Cited by 79 (24 self)
- Add to MetaCart
This work shows how a dialogue model can be represented as a Partially Observable Markov Decision Process (POMDP) with observations composed of a discrete and continuous component. The continuous component enables the model to directly incorporate a confidence score for automated planning. Using a testbed simulated dialogue management problem, we show how recent optimization techniques are able to find a policy for this continuous POMDP which outperforms a traditional MDP approach. Further, we present a method for automatically improving handcrafted dialogue managers by incorporating POMDP belief state monitoring, including confidence score information. Experiments on the testbed system show significant improvements for several example handcrafted dialogue managers across a range of operating conditions. 1
Point-Based POMDP Algorithms: Improved Analysis And Implementation
"... Existing complexity bounds for point-based POMDP value iteration algorithms focus either on the curse of dimensionality or the curse of history. We derive ..."
Abstract
-
Cited by 69 (1 self)
- Add to MetaCart
Existing complexity bounds for point-based POMDP value iteration algorithms focus either on the curse of dimensionality or the curse of history. We derive
Perspectives on standardization in mobile robot programming: The carnegie mellon navigation (CARMEN) toolkit
- In Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS
, 2003
"... Abstract — In this paper we describe our open-source ..."
Abstract
-
Cited by 69 (5 self)
- Add to MetaCart
Abstract — In this paper we describe our open-source
Exploiting Structure to Efficiently Solve Large Scale Partially Observable Markov Decision Processes
, 2005
"... Partially observable Markov decision processes (POMDPs) provide a natural and principled framework to model a wide range of sequential decision making problems under uncertainty. To date, the use of POMDPs in real-world problems has been limited by the poor scalability of existing solution algorithm ..."
Abstract
-
Cited by 45 (4 self)
- Add to MetaCart
Partially observable Markov decision processes (POMDPs) provide a natural and principled framework to model a wide range of sequential decision making problems under uncertainty. To date, the use of POMDPs in real-world problems has been limited by the poor scalability of existing solution algorithms, which can only solve problems with up to ten thousand states. In fact, the complexity of finding an optimal policy for a finite-horizon discrete POMDP is PSPACE-complete. In practice, two important sources of intractability plague most solution algorithms: large policy spaces and large state spaces. On the other hand,
Visibility-based pursuit-evasion with limited field of view
- International Journal of Robotics Research
, 2004
"... We study a form of the pursuit-evasion problem, in which one or more searchers must move through a given environment so as to guarantee detection of any and all evaders, which can move arbitrarily fast. Our goal is to develop techniques for coordinating teams of robots to execute this task in ap ..."
Abstract
-
Cited by 37 (1 self)
- Add to MetaCart
We study a form of the pursuit-evasion problem, in which one or more searchers must move through a given environment so as to guarantee detection of any and all evaders, which can move arbitrarily fast. Our goal is to develop techniques for coordinating teams of robots to execute this task in application domains such as clearing a building, for reasons of security or safety. To this end, we introduce a new class of searcher, the #-searcher, which can be readily instantiated as a physical mobile robot. We present a detailed analysis of the pursuit-evasion problem using #-searchers. We show that computing the minimum number of #-searchers required to search a given environment is NP-hard, and present the first complete search algorithm for a single #-searcher. We show how this algorithm can be extended to handle multiple searchers, and give examples of computed trajectories.
VDCBPI: an Approximate Scalable Algorithm for Large POMDPs
"... Existing algorithms for discrete partially observable Markov decision processes can at best solve problems of a few thousand states due to two important sources of intractability: the curse of dimensionality and the policy space complexity. This paper describes a new algorithm (VDCBPI) that miti ..."
Abstract
-
Cited by 36 (5 self)
- Add to MetaCart
Existing algorithms for discrete partially observable Markov decision processes can at best solve problems of a few thousand states due to two important sources of intractability: the curse of dimensionality and the policy space complexity. This paper describes a new algorithm (VDCBPI) that mitigates both sources of intractability by combining the Value Directed Compression (VDC) technique [13] with Bounded Policy Iteration (BPI) [14]. The scalability of VDCBPI is demonstrated on synthetic network management problems with up to 33 million states.
Anytime point-based approximations for large pomdps
- Journal of Artificial Intelligence Research
, 2006
"... The Partially Observable Markov Decision Process has long been recognized as a rich framework for real-world planning and control problems, especially in robotics. However exact solutions in this framework are typically computationally intractable for all but the smallest problems. A well-known tech ..."
Abstract
-
Cited by 29 (1 self)
- Add to MetaCart
The Partially Observable Markov Decision Process has long been recognized as a rich framework for real-world planning and control problems, especially in robotics. However exact solutions in this framework are typically computationally intractable for all but the smallest problems. A well-known technique for speeding up POMDP solving involves performing value backups at specific belief points, rather than over the entire belief simplex. The efficiency of this approach, however, depends greatly on the selection of points. This paper presents a set of novel techniques for selecting informative belief points which work well in practice. The point selection procedure is combined with point-based value backups to form an effective anytime POMDP algorithm called Point-Based Value Iteration (PBVI). The first aim of this paper is to introduce this algorithm and present a theoretical analysis justifying the choice of belief selection technique. The second aim of this paper is to provide a thorough empirical comparison between PBVI and other state-of-the-art POMDP methods, in particular the Perseus algorithm, in an effort to highlight their similarities and differences. Evaluation is performed using both standard POMDP domains and realistic robotic tasks.
Global a-optimal robot exploration in slam
- In ICRA
, 2005
"... Abstract — It is well-known that the Kalman filter for simultaneous localization and mapping (SLAM) converges to a fully correlated map in the limit of infinite time and data [1]. However, the rate of convergence of the map has a strong dependence on the order of the observations. We show that conve ..."
Abstract
-
Cited by 28 (2 self)
- Add to MetaCart
Abstract — It is well-known that the Kalman filter for simultaneous localization and mapping (SLAM) converges to a fully correlated map in the limit of infinite time and data [1]. However, the rate of convergence of the map has a strong dependence on the order of the observations. We show that conventional exploration algorithms for collecting map data are sub-optimal in both the objective function and choice of optimization procedure. We show that optimizing the aoptimal information measure results in a more accurate map than existing approaches, using a greedy, closed-loop strategy. Secondly, we demonstrate that by restricting the planning to an appropriate policy class, we can tractably find non-greedy, global planning trajectories that produce more accurate maps, explicitly planning to close loops even in open-loop scenarios. I.

