Results 1-10 of 84
Perseus: Randomized point-based value iteration for POMDPs
Journal of Artificial Intelligence Research, 2005
Cited by 202 (16 self)
Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent’s belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to dealing with continuous action spaces. Experimental results show the potential of Perseus in large scale POMDP problems.
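The backup stage described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the helper names (`dot`, `value`, `backup`, `perseus_stage`), the nested-list model layout and the two-state toy model are all assumptions made for this example, and the point-based backup is the standard one for discrete POMDPs.

```python
import random

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def value(b, V):
    """Value of belief b under a set of alpha vectors V."""
    return max(dot(b, alpha) for alpha in V)

def backup(b, V, T, O, R, gamma):
    """Point-based Bellman backup at belief b; returns one alpha vector.

    Assumed layout: T[a][s][s2] transitions, O[a][s2][o] observations,
    R[a][s] rewards.
    """
    n_states, n_obs = len(b), len(O[0][0])
    best = None
    for a in range(len(T)):
        g_a = list(R[a])
        for o in range(n_obs):
            # Project every alpha vector back through (a, o) ...
            cands = [[sum(T[a][s][s2] * O[a][s2][o] * alpha[s2]
                          for s2 in range(n_states))
                      for s in range(n_states)] for alpha in V]
            # ... and keep the projection that is best at b.
            g = max(cands, key=lambda c: dot(b, c))
            g_a = [x + gamma * y for x, y in zip(g_a, g)]
        if best is None or dot(b, g_a) > dot(b, best):
            best = g_a
    return best

def perseus_stage(B, V, T, O, R, gamma, rng, tol=1e-12):
    """One backup stage: back up random points until no point in B got worse."""
    old = [value(b, V) for b in B]
    new_V = []
    todo = list(range(len(B)))          # points whose value is not yet improved
    while todo:
        i = rng.choice(todo)
        alpha = backup(B[i], V, T, O, R, gamma)
        if dot(B[i], alpha) < old[i]:
            # The backup did not help at this point: keep its old best vector.
            alpha = max(V, key=lambda v: dot(B[i], v))
        new_V.append(alpha)
        # One new vector may improve many points at once.
        todo = [j for j in todo if value(B[j], new_V) < old[j] - tol]
    return new_V

# Hypothetical two-state, two-action, two-observation model.
T = [[[0.9, 0.1], [0.1, 0.9]], [[0.5, 0.5], [0.5, 0.5]]]
O = [[[0.8, 0.2], [0.2, 0.8]], [[0.5, 0.5], [0.5, 0.5]]]
R = [[1.0, 0.0], [0.0, 0.5]]
gamma = 0.95
rng = random.Random(0)
B = [[p, 1.0 - p] for p in (rng.random() for _ in range(20))]
V0 = [[min(min(r) for r in R) / (1.0 - gamma)] * 2]   # pessimistic start
V1 = perseus_stage(B, V0, T, O, R, gamma, rng)
V2 = perseus_stage(B, V1, T, O, R, gamma, rng)
```

Each stage returns at most one vector per belief point, and usually far fewer, because every pass through the loop removes at least the sampled point from `todo`.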
Point-Based Value Iteration for Continuous POMDPs
Journal of Machine Learning Research, 2006
Cited by 65 (4 self)
We propose a novel approach to optimize Partially Observable Markov Decision Processes (POMDPs) defined on continuous spaces. To date, most algorithms for model-based POMDPs are restricted to discrete states, actions, and observations, but many real-world problems, such as robot navigation, are naturally defined on continuous spaces. In this work, we demonstrate that the value function for continuous POMDPs is convex in the beliefs over continuous state spaces, and piecewise-linear convex for the particular case of discrete observations and actions but still continuous states. We also demonstrate that continuous Bellman backups are contracting and isotonic, ensuring the monotonic convergence of value-iteration algorithms. Relying on those properties, we extend the PERSEUS algorithm, originally developed for discrete POMDPs, to work in continuous state spaces by representing the observation, transition, and reward models using Gaussian mixtures, and the beliefs using Gaussian mixtures or particle sets. With these representations, the integrals that appear in the Bellman backup can be computed in closed form and, therefore, the algorithm is computationally feasible. Finally, we further extend PERSEUS to deal with continuous action and observation sets by designing effective sampling approaches.
Planning under Uncertainty for Robotic Tasks with Mixed Observability
Cited by 45 (4 self)
Partially observable Markov decision processes (POMDPs) provide a principled, general framework for robot motion planning in uncertain and dynamic environments. They have been applied to various robotic tasks. However, solving POMDPs exactly is computationally intractable. A major challenge is to scale up POMDP algorithms for complex robotic tasks. Robotic systems often have mixed observability: even when a robot’s state is not fully observable, some components of the state may still be so. We use a factored model to represent separately the fully and partially observable components of a robot’s state and derive a compact lower-dimensional representation of its belief space. This factored representation can be combined with any point-based algorithm to compute approximate POMDP solutions. Experimental results show that on standard test problems, our approach improves the performance of a leading point-based POMDP algorithm by many times.
Motion Planning under Uncertainty for Robotic Tasks with Long Time Horizons
Cited by 39 (2 self)
Partially observable Markov decision processes (POMDPs) are a principled mathematical framework for planning under uncertainty, a crucial capability for reliable operation of autonomous robots. By using probabilistic sampling, point-based POMDP solvers have drastically improved the speed of POMDP planning, enabling POMDPs to handle moderately complex robotic tasks. However, robot motion planning tasks with long time horizons remain a severe obstacle for even the fastest point-based POMDP solvers today. This paper proposes Milestone Guided Sampling (MiGS), a new point-based POMDP solver, which exploits state space information to reduce the effective planning horizon. MiGS samples a set of points, called milestones, from a robot’s state space, uses them to construct a compact, sampled representation of the state space, and then uses this representation of the state space to guide sampling in the belief space. This strategy reduces the effective planning horizon, while still capturing the essential features of the belief space with a small number of sampled points. Preliminary results are very promising. We tested MiGS in simulation on several difficult POMDPs modeling distinct robotic tasks with long time horizons; they are impossible with the fastest point-based POMDP solvers today. MiGS solved them in a few minutes.
Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees
In ICML’05, 2005
Cited by 39 (1 self)
MDPs are an attractive formalization for planning, but realistic problems often have intractably large state spaces. When we only need a partial policy to get from a fixed start state to a goal, restricting computation to states relevant to this task can make much larger problems tractable. We introduce a new algorithm, Bounded RTDP, which can produce partial policies with strong performance guarantees while only touching a fraction of the state space, even on problems where other algorithms would have to visit the full state space. To do so, Bounded RTDP maintains both upper and lower bounds on the optimal value function. The performance of Bounded RTDP is greatly aided by the introduction of a new technique to efficiently find suitable upper bounds; this technique can also be used to provide informed initialization to a wide range of other planning algorithms.
Efficient multi-robot search for a moving target
Int. J. Robotics Research, 2009
Cited by 29 (15 self)
This paper examines the problem of locating a mobile, non-adversarial target in an indoor environment using multiple robotic searchers. One way to formulate this problem is to assume a known environment and choose searcher paths most likely to intersect with the path taken by the target. We refer to this as the multi-robot efficient search path planning (MESPP) problem. Such path planning problems are NP-hard, and optimal solutions typically scale exponentially in the number of searchers. We present an approximation algorithm that utilizes finite-horizon planning and implicit coordination to achieve linear scalability in the number of searchers. We prove that solving the MESPP problem requires maximizing a nondecreasing, submodular objective function, which leads to theoretical bounds on the performance of our approximation algorithm. We extend our analysis by considering the scenario where searchers are given noisy non-line-of-sight ranging measurements to the target. For this scenario, we derive and integrate online Bayesian measurement updating into our framework. We demonstrate the performance of our framework in two large-scale simulated environments, and we further validate our results using data from a novel ultra-wideband ranging sensor. Finally, we provide an analysis that demonstrates the rela
Search and pursuit-evasion in mobile robotics
Autonomous Robots
Proofs and Experiments in Scalable, Near-Optimal Search by Multiple Robots
Cited by 21 (7 self)
In this paper, we examine the problem of locating a non-adversarial target using multiple robotic searchers. This problem is relevant to many applications in robotics including emergency response and aerial surveillance. Assuming a known environment, this problem becomes one of choosing searcher paths that are most likely to intersect with the path taken by the target. We refer to this as the Multi-robot Efficient Search Path Planning (MESPP) problem. Such path planning problems are NP-hard, and optimal solutions typically scale exponentially in the number of searchers. We present a finite-horizon path enumeration algorithm for solving the MESPP problem that utilizes sequential allocation to achieve linear scalability in the number of searchers. We show that solving the MESPP problem requires the maximization of a nondecreasing, submodular objective function, which directly leads to theoretical guarantees on paths generated by sequential allocation. We also demonstrate how our algorithm can run online to incorporate noisy measurements of the target’s position during search. We verify the performance of our algorithm both in simulation and in experiments with a novel radio sensor capable of providing range through walls. Our results show that our linearly scalable MESPP algorithm generates searcher paths competitive with those generated by exponential algorithms.
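The sequential allocation idea in this abstract can be illustrated with a small sketch. It is an assumption-laden toy, not the paper's algorithm: the objective here is plain node coverage (a nondecreasing, submodular stand-in for the probability of intersecting the target's path), target motion is ignored, and the function names and the five-node corridor graph are hypothetical.

```python
def enumerate_paths(graph, start, horizon):
    """All walks of `horizon` steps that begin at `start`."""
    paths = [[start]]
    for _ in range(horizon):
        paths = [p + [n] for p in paths for n in graph[p[-1]]]
    return paths

def covered(paths):
    """Toy objective: number of distinct nodes visited by any searcher."""
    return len({node for path in paths for node in path})

def sequential_allocation(graph, starts, horizon, objective=covered):
    """Plan one searcher at a time, holding earlier searchers' paths fixed."""
    chosen = []
    for start in starts:
        # Only this searcher's own paths are enumerated, so the work grows
        # linearly with the number of searchers instead of exponentially.
        best = max(enumerate_paths(graph, start, horizon),
                   key=lambda p: objective(chosen + [p]))
        chosen.append(best)
    return chosen

# Hypothetical corridor 0 - 1 - 2 - 3 - 4 with both searchers starting at node 2.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
paths = sequential_allocation(graph, starts=[2, 2], horizon=2)
```

In this toy the second searcher heads away from the first one, because revisiting already covered nodes adds nothing to the objective.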
Robot planning in partially observable continuous domains
In Robotics: Science and Systems I, 2005
Cited by 21 (4 self)
We present a value iteration algorithm for learning to act in Partially Observable Markov Decision Processes (POMDPs) with continuous state spaces. Mainstream POMDP research focuses on the discrete case and this complicates its application to, e.g., robotic problems that are naturally modeled using continuous state spaces. The main difficulty in defining a (belief-based) POMDP in a continuous state space is that expected values over states must be defined using integrals that, in general, cannot be computed in closed form. In this paper, we first show that the optimal finite-horizon value function over the continuous infinite-dimensional POMDP belief space is piecewise linear and convex, and is defined by a finite set of supporting α-functions that are analogous to the α-vectors (hyperplanes) defining the value function of a discrete-state POMDP. Second, we show that, for a fairly general class of POMDP models in which all functions of interest are modeled by Gaussian mixtures, all belief updates and value iteration backups can be carried out analytically and exactly. A crucial difference with respect to the α-vectors of the discrete case is that, in the continuous case, the α-functions will typically grow in complexity (e.g., in the number of components) in each value iteration. Finally, we demonstrate PERSEUS, our previously proposed randomized point-based value iteration algorithm, in a simple robot planning problem with a continuous domain, where encouraging results are observed.
MULTI-ARMED BANDIT PROBLEMS
Cited by 17 (0 self)
Multi-armed bandit (MAB) problems are a class of sequential resource allocation problems concerned with allocating one or more resources among several alternative (competing) projects. Such problems are paradigms of a fundamental conflict between making decisions (allocating resources) that yield