Results 11–20 of 82
VDCBPI: an Approximate Scalable Algorithm for Large POMDPs
Abstract

Cited by 53 (5 self)
Existing algorithms for discrete partially observable Markov decision processes can at best solve problems of a few thousand states due to two important sources of intractability: the curse of dimensionality and the policy space complexity. This paper describes a new algorithm (VDCBPI) that mitigates both sources of intractability by combining the Value Directed Compression (VDC) technique [13] with Bounded Policy Iteration (BPI) [14]. The scalability of VDCBPI is demonstrated on synthetic network management problems with up to 33 million states.
Solving POMDPs with continuous or large discrete observation spaces
In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2005
Abstract

Cited by 50 (10 self)
We describe methods to solve partially observable Markov decision processes (POMDPs) with continuous or large discrete observation spaces. Realistic problems often have rich observation spaces, posing significant problems for standard POMDP algorithms that require explicit enumeration of the observations. This problem is usually approached by imposing an a priori discretisation on the observation space, which can be suboptimal for the decision making task. However, since only those observations that would change the policy need to be distinguished, the decision problem itself induces a lossless partitioning of the observation space. This paper demonstrates how to find this partition while computing a policy, and how the resulting discretisation of the observation space reveals the relevant features of the application domain. The algorithms are demonstrated on a toy example and on a realistic assisted living task.
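The lossless-partition idea above can be sketched concretely: under a fixed set of alpha-vectors, two observations may be merged whenever the updated beliefs they induce select the same maximizing vector, so distinguishing them can never change the policy. A minimal sketch, with an invented two-state model, six raw observations, and made-up alpha-vectors (none taken from the paper):

```python
import numpy as np
from collections import defaultdict

# Invented two-state model for one fixed action; not the paper's domain.
T = np.array([[0.9, 0.1],               # T[s, s']: transition matrix
              [0.2, 0.8]])
# Z[s', o]: six raw observations, each state favouring one end.
Z = np.array([[0.4, 0.3, 0.15, 0.1, 0.04, 0.01],
              [0.01, 0.04, 0.1, 0.15, 0.3, 0.4]])
# Two hypothetical alpha-vectors (conditional plans).
alphas = [np.array([10.0, -5.0]),       # good when state 0 is likely
          np.array([-5.0, 10.0])]       # good when state 1 is likely

def partition(b):
    """Group observations by the alpha-vector their updated belief selects."""
    groups = defaultdict(list)
    for o in range(Z.shape[1]):
        b_next = (T.T @ b) * Z[:, o]    # unnormalised belief update
        if b_next.sum() == 0:
            continue                    # observation impossible at b
        b_next /= b_next.sum()
        best = int(np.argmax([alpha @ b_next for alpha in alphas]))
        groups[best].append(o)
    return dict(groups)

print(partition(np.array([0.5, 0.5])))
```

Here the six raw observations collapse into two decision-relevant classes; in the paper this partition is discovered while computing the policy rather than fixed a priori.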
A point-based POMDP algorithm for robot planning
, 2004
Abstract

Cited by 41 (2 self)
We present an approximate POMDP solution method for robot planning in partially observable environments. Our algorithm belongs to the family of point-based value iteration solution techniques for POMDPs, in which planning is performed only on a sampled set of reachable belief points. We describe a simple, randomized procedure that performs value update steps that strictly improve the value of all belief points in each step. We demonstrate our algorithm on a robotic delivery task in an office environment and on several benchmark problems, for which we compute solutions that are highly competitive with those of state-of-the-art methods in terms of speed and solution quality.
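The randomized point-based procedure the abstract describes (in the style of Perseus by Spaan and Vlassis) can be sketched as follows; the tiny tiger-like model, the belief-set size, and the stage count are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, O = 2, 3, 2                       # states, actions, observations
gamma = 0.95

# Invented tiger-like POMDP: listen (informative), open-left, open-right.
T = np.array([[[1.0, 0.0], [0.0, 1.0]],      # listen: state unchanged
              [[0.5, 0.5], [0.5, 0.5]],      # open-left: reset
              [[0.5, 0.5], [0.5, 0.5]]])     # open-right: reset
Z = np.array([[[0.85, 0.15], [0.15, 0.85]],  # Z[a, s', o]
              [[0.5, 0.5], [0.5, 0.5]],
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[-1.0, -1.0],                  # R[a, s]
              [-100.0, 10.0],
              [10.0, -100.0]])

def backup(b, V):
    """Point-based Bellman backup of belief b against alpha-vector set V."""
    best, best_val = None, -np.inf
    for a in range(A):
        g_a = R[a].copy()
        for o in range(O):
            # g_{a,o}(s) = sum_{s'} T[a,s,s'] Z[a,s',o] alpha(s'),
            # taking the alpha that is maximal for this belief.
            cands = [T[a] @ (Z[a][:, o] * alpha) for alpha in V]
            g_a = g_a + gamma * cands[int(np.argmax([b @ g for g in cands]))]
        if b @ g_a > best_val:
            best, best_val = g_a, b @ g_a
    return best

# Sampled belief set and a pessimistic initial value function.
B = rng.dirichlet(np.ones(S), size=30)
V = [np.min(R) / (1 - gamma) * np.ones(S)]

for _ in range(30):                     # randomized value-update stages
    todo, V_new = list(range(len(B))), []
    while todo:
        i = todo[rng.integers(len(todo))]
        alpha = backup(B[i], V)
        old = max(v @ B[i] for v in V)
        V_new.append(alpha if alpha @ B[i] >= old
                     else max(V, key=lambda v: v @ B[i]))
        # Drop every belief already improved by the new vector set.
        todo = [j for j in todo
                if max(v @ B[j] for v in V_new) < max(v @ B[j] for v in V)]
    V = V_new
```

Each stage backs up only a random subset of the sampled beliefs, yet the value of every sampled belief is guaranteed not to decrease, which is what keeps the update steps cheap.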
Real-time hierarchical POMDPs for autonomous robot navigation
, 2007
Abstract

Cited by 33 (0 self)
This paper proposes a new hierarchical formulation of POMDPs for autonomous robot navigation that can be solved in real time, and is memory efficient. It will be referred to in this paper as the Robot Navigation–Hierarchical POMDP (RN-HPOMDP). The RN-HPOMDP is utilized as a unified framework for autonomous robot navigation in dynamic environments. As such, it is used for localization, planning and local obstacle avoidance. Hence, the RN-HPOMDP decides at each time step the actions the robot should execute, without the intervention of any other external module for obstacle avoidance or localization. Our approach employs state space and action space hierarchy, and can effectively model large environments at a fine resolution. Finally, the notion of the reference POMDP is introduced. The latter holds all the information regarding motion and sensor uncertainty, which makes the proposed hierarchical structure memory efficient and enables fast learning. The RN-HPOMDP has been experimentally validated in real dynamic environments.
Reinforcement learning for sensing strategies
In Proceedings of the International Conference on Intelligent Robots and Systems (IROS), 2004
Abstract

Cited by 30 (0 self)
Mobile robots often have to make decisions on where to point their sensors, which have limited range and coverage. A good sensing strategy allows the robot to collect useful information for its tasks. Most existing solutions to this active sensing problem choose the direction that maximally reduces the uncertainty in a single state variable. In more complex problem domains, however, uncertainties exist in multiple state variables, and they affect the performance of the robot in different ways. The robot thus needs to have more sophisticated sensing strategies in order to decide which uncertainties to reduce, and to make the correct trade-offs. In this work, we apply least-squares reinforcement learning methods to solve this problem. We implemented and tested the learning approach in the RoboCup domain, where the robot attempts to reach a ball and accurately kick it into the goal. We present experimental results that suggest our approach is able to learn highly effective sensing strategies.
Scaling POMDPs for spoken dialog management
 IEEE Transactions on Audio, Speech, and Language Processing 15(7):2116–2129
Abstract

Cited by 27 (13 self)
Control in spoken dialog systems is challenging largely because automatic speech recognition is unreliable and hence the state of the conversation can never be known with certainty. Partially observable Markov decision processes (POMDPs) provide a principled mathematical framework for planning and control in this context; however, POMDPs face severe scalability challenges and past work has been limited to trivially small dialog tasks. This paper presents a novel POMDP optimization technique – composite summary point-based value iteration (CSPBVI) – which enables optimization to be performed on slot-filling POMDP-based dialog managers of a realistic size. Using dialog models trained on data from a tourist information domain, simulation results show that CSPBVI scales effectively, outperforms non-POMDP baselines, and is robust to estimation errors. Index Terms – Decision theory, dialogue management, partially observable Markov decision process, planning under uncertainty, spoken dialogue system.
Exploiting Belief Bounds: Practical POMDPs for Personal Assistant Agents
In AAMAS, 2005
Abstract

Cited by 26 (11 self)
Agents or agent teams deployed to assist humans often face the challenges of monitoring the state of key processes in their environment (including the state of their human users themselves) and making periodic decisions based on such monitoring. POMDPs appear well suited to enable agents to address these challenges, given the uncertain environment and cost of actions, but optimal policy generation for POMDPs is computationally expensive. This paper introduces three key techniques to speed up POMDP policy generation that exploit the notion of progress or dynamics in personal assistant domains. Policy computation is restricted to the belief space polytope that remains reachable given the progress structure of a domain. We introduce new algorithms, particularly one based on applying Lagrangian methods to compute a bounded belief space support in polynomial time. Our techniques are complementary to many existing exact and approximate POMDP policy generation algorithms. Indeed, we illustrate this by enhancing two of the fastest existing algorithms for exact POMDP policy generation. The order-of-magnitude speedups demonstrate the utility of our techniques in facilitating the deployment of POMDPs within agents assisting human users.
Online control policy optimization for minimizing map uncertainty during exploration
In IEEE International Conference on Robotics and Automation, 2004
Abstract

Cited by 22 (3 self)
Tremendous progress has been made recently in simultaneous localization and mapping of unknown environments. Using sensor and odometry data from an exploring mobile robot, it has become much easier to build high-quality globally consistent maps of many large, real-world environments. To date, however, relatively little attention has been paid to the controllers used to build these maps. Existing exploration strategies usually attempt to cover the largest amount of unknown space as quickly as possible. Few strategies exist for building the most reliable map possible, but the particular control strategy can have a substantial impact on the quality of the resulting map. In this paper, we devise a control algorithm for exploring unknown space that explicitly tries to build as large a map as possible while maintaining as accurate a map as possible. We make use of a parameterized class of spiral trajectory policies, choosing a new parameter setting at every time step to maximize the expected reward of the policy. We do this in the context of building a visual map of an unknown environment, and show that our strategy leads to a higher accuracy map faster than other candidate controllers, including any single choice in our policy class.
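The online parameter-selection loop described above can be sketched generically: at each step, score every candidate policy parameter by a short rollout and commit to the best one. The "spiral gain" candidates and the toy coverage/accuracy dynamics below are invented stand-ins, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(1)
PARAMS = [0.1, 0.3, 0.5]                # hypothetical spiral-gain candidates

def step(state, gain):
    """Toy dynamics: coverage grows with gain, accuracy decays with it."""
    coverage, accuracy = state
    coverage = min(1.0, coverage + 0.05 * gain + 0.01 * rng.random())
    accuracy = max(0.0, accuracy - 0.02 * gain ** 2)
    return (coverage, accuracy)

def rollout_value(state, gain, horizon=10):
    """Estimate the expected reward of committing to `gain` for a while."""
    total = 0.0
    for _ in range(horizon):
        state = step(state, gain)
        total += state[0] * state[1]    # reward a map that is large AND accurate
    return total

state = (0.0, 1.0)                      # empty but perfectly accurate map
for t in range(20):
    # Re-optimise the policy parameter online at every time step.
    gain = max(PARAMS, key=lambda g: rollout_value(state, g))
    state = step(state, gain)
print(state)
```

The design choice mirrors the paper's trade-off: a larger gain covers space faster but degrades map accuracy, so the rollout score balances the two rather than greedily maximising coverage.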
ReTrASE: Integrating Paradigms for Approximate Probabilistic Planning
Abstract

Cited by 19 (11 self)
Past approaches for solving MDPs have several weaknesses: 1) Decision-theoretic computation over the state space can yield optimal results but scales poorly. 2) Value-function approximation typically requires human-specified basis functions and has not been shown successful on nominal (“discrete”) domains such as those in the ICAPS planning competitions. 3) Replanning by applying a classical planner to a determinized domain model can generate approximate policies for very large problems but has trouble handling probabilistic subtlety [Little and Thiebaux, 2007]. This paper presents ReTrASE, a novel MDP solver, which combines decision theory, function approximation and classical planning in a new way. ReTrASE uses classical planning to create basis functions for value-function approximation and applies expected-utility analysis to this compact space. Our algorithm is memory-efficient and fast (due to its compact, approximate representation), returns high-quality solutions (due to the decision-theoretic framework) and does not require additional knowledge from domain engineers (since we apply classical planning to automatically construct the basis functions). Experiments demonstrate that ReTrASE outperforms winners from the past three probabilistic-planning competitions on many hard problems.
A fast point-based algorithm for POMDPs
In Proc. Belgian-Dutch Conference on Machine Learning, 2004
Abstract

Cited by 14 (4 self)
We describe a point-based approximate value iteration algorithm for partially observable Markov decision processes. The algorithm performs value function updates ensuring that in each iteration the new value function is an upper bound to the previous value function, as estimated on a sampled set of belief points. A randomized belief-point selection scheme allows for fast update steps. Results indicate that the proposed algorithm achieves competitive performance in terms of both solution quality and speed.