Partially observable Markov decision processes with continuous observations for dialogue management
 Computer Speech and Language
, 2005
Cited by 210 (50 self)
This work shows how a dialogue model can be represented as a Partially Observable Markov Decision Process (POMDP) with observations composed of a discrete and continuous component. The continuous component enables the model to directly incorporate a confidence score for automated planning. Using a testbed simulated dialogue management problem, we show how recent optimization techniques are able to find a policy for this continuous POMDP which outperforms a traditional MDP approach. Further, we present a method for automatically improving handcrafted dialogue managers by incorporating POMDP belief state monitoring, including confidence score information. Experiments on the testbed system show significant improvements for several example handcrafted dialogue managers across a range of operating conditions.
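The belief state monitoring mentioned in this abstract can be illustrated with a standard discrete Bayes filter. The sketch below is not the paper's method: it covers only a discrete observation (the paper also uses a continuous confidence score), and the two-state model, the 80% recognizer accuracy, and all names (belief_update, T, O) are hypothetical choices made for illustration.

```python
def belief_update(belief, action, observation, T, O):
    """Discrete Bayes filter: b'(s') is proportional to
    O[a][s'][o] * sum_s T[a][s][s'] * b(s)."""
    n = len(belief)
    unnormalized = []
    for s_next in range(n):
        # Prediction step: probability of reaching s_next under the action.
        predicted = sum(T[action][s][s_next] * belief[s] for s in range(n))
        # Correction step: weight by the likelihood of the observation.
        unnormalized.append(O[action][s_next][observation] * predicted)
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / total for p in unnormalized]


# Hypothetical model: the user's goal is state 0 or state 1 and never changes;
# the recognizer reports the correct goal 80% of the time.
T = [[[1.0, 0.0], [0.0, 1.0]]]   # T[a][s][s']
O = [[[0.8, 0.2], [0.2, 0.8]]]   # O[a][s'][o]
b0 = [0.5, 0.5]
b1 = belief_update(b0, 0, 0, T, O)   # hear "goal 0" once
b2 = belief_update(b1, 0, 0, T, O)   # hear "goal 0" again
```

Repeated consistent observations sharpen the belief, which is the information a belief-tracking dialogue manager can act on instead of a single best guess.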
Perseus: Randomized point-based value iteration for POMDPs
 Journal of Artificial Intelligence Research
, 2005
Cited by 202 (16 self)
Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent’s belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to dealing with continuous action spaces. Experimental results show the potential of Perseus in large scale POMDP problems.
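The backup stage described in this abstract can be sketched as follows. This is a minimal reading of the abstract for a small discrete POMDP, not the authors' implementation: the tiger-like model, the array layouts T[a][s][s'], O[a][s'][o], R[a][s], and every function name are assumptions made for illustration.

```python
import random


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def value(b, alphas):
    """Value of belief b under a set of alpha vectors."""
    return max(dot(b, alpha) for alpha in alphas)


def backup(b, alphas, T, O, R, gamma):
    """Point-based Bellman backup at belief b; returns one new alpha vector."""
    n = len(b)
    best = None
    for a in range(len(T)):
        g_a = list(R[a])
        for o in range(len(O[a][0])):
            # Project every alpha vector back through action a, observation o.
            projected = [
                [sum(T[a][s][s2] * O[a][s2][o] * alpha[s2] for s2 in range(n))
                 for s in range(n)]
                for alpha in alphas
            ]
            g_ao = max(projected, key=lambda g: dot(b, g))
            g_a = [x + gamma * y for x, y in zip(g_a, g_ao)]
        if best is None or dot(b, g_a) > dot(b, best):
            best = g_a
    return best


def perseus_stage(beliefs, alphas, T, O, R, gamma, rng):
    """One backup stage: back up randomly chosen beliefs until no belief
    in the set has a lower value than it had before the stage."""
    old_values = [value(b, alphas) for b in beliefs]
    new_alphas = []
    todo = list(range(len(beliefs)))
    while todo:
        i = rng.choice(todo)
        alpha = backup(beliefs[i], alphas, T, O, R, gamma)
        if dot(beliefs[i], alpha) < old_values[i]:
            # The backup did not help here: keep the best old vector instead.
            alpha = max(alphas, key=lambda v: dot(beliefs[i], v))
        new_alphas.append(alpha)
        # Keep only the beliefs whose value has not yet been restored.
        todo = [j for j in todo
                if value(beliefs[j], new_alphas) < old_values[j]]
    return new_alphas


# Hypothetical tiger-like model: action 0 listens, actions 1 and 2 open a door.
uniform = [[0.5, 0.5], [0.5, 0.5]]
T = [[[1.0, 0.0], [0.0, 1.0]], uniform, uniform]
O = [[[0.85, 0.15], [0.15, 0.85]], uniform, uniform]
R = [[-1.0, -1.0], [-100.0, 10.0], [10.0, -100.0]]
gamma = 0.95
beliefs = [[p, 1.0 - p] for p in (0.0, 0.25, 0.5, 0.75, 1.0)]

# Start from a single pessimistic vector, a lower bound on the value function.
history = [[[min(min(r) for r in R) / (1.0 - gamma)] * 2]]
rng = random.Random(0)
for _ in range(20):
    history.append(perseus_stage(beliefs, history[-1], T, O, R, gamma, rng))
```

Because one new vector can raise the value of several beliefs at once, a stage typically needs fewer backups than there are beliefs, and it never produces more vectors than beliefs.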
SARSOP: Efficient Point-Based POMDP Planning by Approximating Optimally Reachable Belief Spaces
Cited by 191 (16 self)
Motion planning in uncertain and dynamic environments is an essential capability for autonomous robots. Partially observable Markov decision processes (POMDPs) provide a principled mathematical framework for solving such problems, but they are often avoided in robotics due to high computational complexity. Our goal is to create practical POMDP algorithms and software for common robotic tasks. To this end, we have developed a new point-based POMDP algorithm that exploits the notion of optimally reachable belief spaces to improve computational efficiency. In simulation, we successfully applied the algorithm to a set of common robotic tasks, including instances of coastal navigation, grasping, mobile robot exploration, and target tracking, all modeled as POMDPs with a large number of states. In most of the instances studied, our algorithm substantially outperformed one of the fastest existing point-based algorithms. A software package implementing our algorithm is available for download at
Point-based POMDP algorithms: Improved analysis and implementation
 in Proceedings of Uncertainty in Artificial Intelligence
Cited by 154 (3 self)
Existing complexity bounds for point-based POMDP value iteration algorithms focus either on the curse of dimensionality or the curse of history. We derive a new bound that relies on both and uses the concept of discounted reachability; our conclusions may help guide future algorithm design. We also discuss recent improvements to our (point-based) heuristic search value iteration algorithm. Our new implementation calculates tighter initial bounds, avoids solving linear programs, and makes more effective use of sparsity. Empirical results show speedups of more than two orders of magnitude.
Heuristic search value iteration for POMDPs
 In UAI
, 2004
Cited by 139 (4 self)
We present a novel POMDP planning algorithm called heuristic search value iteration (HSVI). HSVI is an anytime algorithm that returns a policy and a provable bound on its regret with respect to the optimal policy. HSVI gets its power by combining two well-known techniques: attention-focusing search heuristics and piecewise linear convex representations of the value function. HSVI’s soundness and convergence have been proven. On some benchmark problems from the literature, HSVI displays speedups of greater than 100 with respect to other state-of-the-art POMDP value iteration algorithms. We also apply HSVI to a new rover exploration problem 10 times larger than most POMDP problems in the literature.
Online planning algorithms for POMDPs
 Journal of Artificial Intelligence Research
, 2008
Cited by 109 (3 self)
Partially Observable Markov Decision Processes (POMDPs) provide a rich framework for sequential decision-making under uncertainty in stochastic domains. However, solving a POMDP is often intractable except for small problems due to their complexity. Here, we focus on online approaches that alleviate the computational complexity by computing good local policies at each decision step during the execution. Online algorithms generally consist of a lookahead search to find the best action to execute at each time step in an environment. Our objectives here are to survey the various existing online POMDP methods, analyze their properties and discuss their advantages and disadvantages; and to thoroughly evaluate these online approaches in different environments under various metrics (return, error bound reduction, lower bound improvement). Our experimental results indicate that state-of-the-art online heuristic search methods can handle large POMDP domains efficiently.
Information gain-based exploration using Rao-Blackwellized particle filters
 In RSS
, 2005
Cited by 96 (4 self)
This paper presents an integrated approach to exploration, mapping, and localization. Our algorithm uses a highly efficient Rao-Blackwellized particle filter to represent the posterior about maps and poses. It applies a decision-theoretic framework which simultaneously considers the uncertainty in the map and in the pose of the vehicle to evaluate potential actions. Thereby, it trades off the cost of executing an action with the expected information gain and takes into account possible sensor measurements gathered along the path taken by the robot. We furthermore describe how to utilize the properties of the Rao-Blackwellization to efficiently compute the expected information gain. We present experimental results obtained in the real world and in simulation to demonstrate the effectiveness of our approach.
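The trade-off between action cost and expected information gain can be illustrated on a toy discrete distribution. The sketch below is not the paper's particle-filter computation over maps and poses: it assumes a small discrete state, a known sensor model likelihood[s][o], and a hypothetical linear utility (weighted gain minus cost); all names are illustrative.

```python
import math


def entropy(p):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def expected_information_gain(prior, likelihood):
    """Prior entropy minus the expected posterior entropy, averaged over
    the observations the sensor could return (likelihood[s][o])."""
    gain = entropy(prior)
    for o in range(len(likelihood[0])):
        joint = [prior[s] * likelihood[s][o] for s in range(len(prior))]
        p_o = sum(joint)
        if p_o > 0.0:
            posterior = [j / p_o for j in joint]
            gain -= p_o * entropy(posterior)
    return gain


def utility(prior, likelihood, cost, weight=1.0):
    """Hypothetical action score: weighted information gain minus cost."""
    return weight * expected_information_gain(prior, likelihood) - cost


prior = [0.5, 0.5]
perfect_sensor = [[1.0, 0.0], [0.0, 1.0]]   # always reveals the state
blind_sensor = [[0.5, 0.5], [0.5, 0.5]]     # observation carries no information
gain_perfect = expected_information_gain(prior, perfect_sensor)
gain_blind = expected_information_gain(prior, blind_sensor)
```

Under this scoring, an expensive action is worth taking only when the measurements it enables are informative enough to offset its cost.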
Improved memory-bounded dynamic programming for decentralized POMDPs
 In Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence
, 2007
Cited by 94 (22 self)
Decentralized decision making under uncertainty has been shown to be intractable when each agent has different partial information about the domain. Thus, improving the applicability and scalability of planning algorithms is an important challenge. We present the first memory-bounded dynamic programming algorithm for finite-horizon decentralized POMDPs. A set of heuristics is used to identify relevant points of the infinitely large belief space. Using these belief points, the algorithm successively selects the best joint policies for each horizon. The algorithm is extremely efficient, having linear time and space complexity with respect to the horizon length. Experimental results show that it can handle horizons that are multiple orders of magnitude larger than what was previously possible, while achieving the same or better solution quality. These results significantly increase the applicability of decentralized decision-making techniques.
Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs
 In Proc. of Int. Joint Conference on Autonomous Agents and Multi Agent Systems
, 2004
Cited by 92 (2 self)
Partially observable decentralized decision making in robot teams is fundamentally different from decision making in fully observable problems. Team members cannot simply apply single-agent solution techniques in parallel. Instead, we must turn to game theoretic frameworks to correctly model the problem. While partially observable stochastic games (POSGs) provide a solution model for decentralized robot teams, this model quickly becomes intractable. We propose an algorithm that approximates POSGs as a series of smaller, related Bayesian games, using heuristics such as QMDP to provide the future discounted value of actions. This algorithm trades off limited lookahead in uncertainty for computational feasibility, and results in policies that are locally optimal with respect to the selected heuristic. Empirical results are provided for both a simple problem for which the full POSG can also be constructed, as well as more complex, robot-inspired, problems.
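The QMDP heuristic named in this abstract scores an action by averaging fully observable MDP action values over the current belief. The sketch below shows only that single-agent heuristic, not the Bayesian-game approximation the paper builds on top of it; the Q table and the function name qmdp_action are hypothetical.

```python
def qmdp_action(belief, Q):
    """Return the action maximizing sum_s b(s) * Q[s][a], plus all scores.

    Q[s][a] is assumed to hold action values of the underlying MDP,
    computed as if the state were fully observable.
    """
    n_actions = len(Q[0])
    scores = [sum(b * Q[s][a] for s, b in enumerate(belief))
              for a in range(n_actions)]
    best = max(range(n_actions), key=lambda a: scores[a])
    return best, scores


# Hypothetical values: actions 0 and 1 each pay off in one state only,
# action 2 is a safe choice with a small payoff in both states.
Q = [[10.0, -5.0, 1.0],
     [-5.0, 10.0, 1.0]]
confident_left, _ = qmdp_action([0.9, 0.1], Q)
confident_right, _ = qmdp_action([0.1, 0.9], Q)
```

Because the heuristic assumes all uncertainty vanishes after one step, it assigns no value to information-gathering actions, which is consistent with the limited lookahead in uncertainty that the abstract describes.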
Bounded finite state controllers
 In NIPS
, 2004
Cited by 92 (12 self)
We describe a new approximation algorithm for solving partially observable MDPs. Our bounded policy iteration approach searches through the space of bounded-size, stochastic finite state controllers, combining several advantages of gradient ascent (efficiency, search through restricted controller space) and policy iteration (less vulnerability to local optima).