Results 1–10 of 34
Partially observable Markov decision processes with continuous observations for dialogue management
Computer Speech and Language, 2005
"... This work shows how a dialogue model can be represented as a Partially Observable Markov Decision Process (POMDP) with observations composed of a discrete and continuous component. The continuous component enables the model to directly incorporate a confidence score for automated planning. Using a t ..."
Abstract

Cited by 217 (52 self)
 Add to MetaCart
(Show Context)
This work shows how a dialogue model can be represented as a Partially Observable Markov Decision Process (POMDP) with observations composed of a discrete and continuous component. The continuous component enables the model to directly incorporate a confidence score for automated planning. Using a testbed simulated dialogue management problem, we show how recent optimization techniques are able to find a policy for this continuous POMDP which outperforms a traditional MDP approach. Further, we present a method for automatically improving handcrafted dialogue managers by incorporating POMDP belief state monitoring, including confidence score information. Experiments on the testbed system show significant improvements for several example handcrafted dialogue managers across a range of operating conditions.
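The mixed discrete/continuous observation idea in this abstract can be sketched as a one-step belief update. Everything below is a toy illustration: the states, the transition and observation tables, and the triangular density used for the confidence score are invented stand-ins, not the paper's actual models.

```python
# Toy sketch (all numbers invented): a Bayesian belief update whose
# observation has a discrete component (the recognized user goal) and a
# continuous component (a confidence score in [0, 1]).

def belief_update(belief, trans, obs, concept, confidence):
    """One step of belief monitoring with a fixed action.

    belief: dict state -> probability
    trans:  dict (s, s2) -> P(s2 | s)
    obs:    dict (s, c)  -> P(concept c | state s)
    The confidence score is scored with a simple triangular density:
    high scores are more likely when the recognized concept matches
    the hypothesized state (a stand-in for a real confidence model).
    """
    new = {}
    for s2 in belief:
        # Predict: propagate the belief through the transition model.
        pred = sum(p * trans[(s, s2)] for s, p in belief.items())
        # Correct: discrete likelihood times continuous confidence density.
        density = 2 * confidence if s2 == concept else 2 * (1 - confidence)
        new[s2] = pred * obs[(s2, concept)] * density
    z = sum(new.values())
    return {s: p / z for s, p in new.items()}
```

With two states and a high-confidence matching observation, the posterior concentrates far more sharply than the discrete likelihood alone would allow, which is the extra information the continuous component contributes.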
Value-directed Compression of POMDPs
In NIPS 15, 2002
"... We examine the problem of generating statespace compressions of POMDPs in a way that minimally impacts decision quality. We analyze the impact of compressions on decision quality, observing that compressions that allow accurate policy evaluation (prediction of expected future reward) will not af ..."
Abstract

Cited by 72 (4 self)
 Add to MetaCart
(Show Context)
We examine the problem of generating state-space compressions of POMDPs in a way that minimally impacts decision quality. We analyze the impact of compressions on decision quality, observing that compressions that allow accurate policy evaluation (prediction of expected future reward) will not affect decision quality.
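The claim that a compression permitting accurate reward prediction cannot hurt policy evaluation can be seen in a toy case (all numbers invented, not the paper's construction): if two states carry the same reward, a compression that merges them predicts expected reward exactly.

```python
# Toy illustration of value-directed compression: states s1 and s2 share a
# reward, so merging them into one aggregate state loses nothing for
# evaluating expected immediate reward from the (compressed) belief.

def compress(belief):
    """Merge the last two of three states: (s0, s1, s2) -> (s0, s12)."""
    return [belief[0], belief[1] + belief[2]]

def expected_reward(belief, rewards):
    """Expected reward under a belief (dot product)."""
    return sum(b * r for b, r in zip(belief, rewards))

R_FULL = [1.0, 0.5, 0.5]   # s1 and s2 share the reward 0.5
R_COMP = [1.0, 0.5]        # reward vector in the compressed space
```

The same argument, applied recursively through the transition model, is what lets such compressions preserve prediction of expected *future* reward as well.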
Planning and acting in uncertain environments using probabilistic inference
In IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2006
"... Abstract — An important problem in robotics is planning and selecting actions for goaldirected behavior in noisy uncertain environments. The problem is typically addressed within the framework of partially observable Markov decision processes (POMDPs). Although efficient algorithms exist for learni ..."
Abstract

Cited by 11 (1 self)
 Add to MetaCart
(Show Context)
An important problem in robotics is planning and selecting actions for goal-directed behavior in noisy, uncertain environments. The problem is typically addressed within the framework of partially observable Markov decision processes (POMDPs). Although efficient algorithms exist for learning policies for MDPs, these algorithms do not generalize easily to POMDPs. In this paper, we propose a framework for planning and action selection based on probabilistic inference in graphical models. Unlike previous approaches based on MAP inference, our approach utilizes the most probable explanation (MPE) of variables in a graphical model, allowing tractable and efficient inference of actions. It generalizes easily to complex partially observable environments. Furthermore, it allows rewards and costs to be incorporated in a straightforward manner as part of the inference process. We investigate the application of our approach to the problem of robot navigation by testing it on a suite of well-known POMDP benchmarks. Our results demonstrate that the proposed method can beat or match the performance of recently proposed specialized POMDP solvers.
Transition Entropy in Partially Observable Markov Decision Processes
Proceedings of the 9th International Conference on Intelligent Autonomous Systems (IAS-9), 2006
"... This paper proposes a new heuristic algorithm suitable for realtime applications using partially observable Markov decision processes (POMDP). The algorithm is based in a reward shaping strategy which includes entropy information in the reward structure of a fully observable Markov decision process ..."
Abstract

Cited by 7 (3 self)
 Add to MetaCart
This paper proposes a new heuristic algorithm suitable for real-time applications using partially observable Markov decision processes (POMDPs). The algorithm is based on a reward shaping strategy which includes entropy information in the reward structure of a fully observable Markov decision process (MDP). This strategy, as illustrated by the presented results, exhibits near-optimal performance in all examples tested.
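A minimal sketch of what entropy-based reward shaping can look like: penalize the underlying MDP reward by the Shannon entropy of the current belief, so acting under high state uncertainty becomes less attractive. The penalty form reward − w·H(belief) and the weight w are our assumptions here, not necessarily the paper's exact shaping.

```python
import math

# Hypothetical shaping form: subtract a weighted belief entropy from the
# reward of the underlying fully observable MDP.

def belief_entropy(belief):
    """Shannon entropy (in bits) of a belief vector."""
    return -sum(p * math.log2(p) for p in belief if p > 0)

def shaped_reward(mdp_reward, belief, w=0.5):
    """MDP reward minus an uncertainty penalty; a heuristic policy can
    then be computed on the shaped, fully observable model."""
    return mdp_reward - w * belief_entropy(belief)
```

A fully certain belief leaves the reward untouched, while a uniform belief over four states incurs the maximum penalty of 2·w bits.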
A survey of model-based and model-free methods for resolving perceptual aliasing
Department of Computer, 2004
"... We focus our attention on agents learning to act in an unknown domain using noisy sensors. Such domains may be modeled by a Partially Observable Markov Decision Process (POMDP) that can be solved optimally. However, when the model of the environment is unknown, most research in the area studies mode ..."
Abstract

Cited by 6 (0 self)
 Add to MetaCart
(Show Context)
We focus our attention on agents learning to act in an unknown domain using noisy sensors. Such domains may be modeled by a Partially Observable Markov Decision Process (POMDP) that can be solved optimally. However, when the model of the environment is unknown, most research in the area studies model-free methods: methods that learn to act without learning a model. When the agents' sensors provide deterministic output, model-free methods produce close to optimal results. However, as sensor noise increases, the accuracy of such methods decreases. Another, less explored, option is the model-based approach: learning a POMDP model of the world, and computing an optimal solution using the learned model. In this survey we explore model-based and model-free techniques for handling perceptual aliasing.
Scan strategies for adaptive meteorological radars
In Advances in Neural Information Processing Systems 21, 2007
"... We address the problem of adaptive sensor control in dynamic resourceconstrained sensor networks. We focus on a meteorological sensing network comprising radars that can perform sector scanning rather than always scanning 360 ◦. We compare three sector scanning strategies. The sitandspin strategy ..."
Abstract

Cited by 5 (4 self)
 Add to MetaCart
(Show Context)
We address the problem of adaptive sensor control in dynamic resource-constrained sensor networks. We focus on a meteorological sensing network comprising radars that can perform sector scanning rather than always scanning 360°. We compare three sector scanning strategies. The sit-and-spin strategy always scans 360°. The limited lookahead strategy additionally uses the expected environmental state K decision epochs in the future, as predicted from Kalman filters, in its decision-making. The full lookahead strategy uses all expected future states by casting the problem as a Markov decision process and using reinforcement learning to estimate the optimal scan strategy. We show that the main benefits of using a lookahead strategy arise when there are multiple meteorological phenomena in the environment, and when the maximum radius of any phenomenon is sufficiently smaller than the radius of the radars. We also show that there is a tradeoff between the average quality with which a phenomenon is scanned and the number of decision epochs before which a phenomenon is rescanned.
Health aware stochastic planning for persistent package delivery missions using quadrotors
In International Conference on Intelligent Robots and Systems (IROS), 2014
"... In persistent missions, taking system’s health and capability degradation into account is an essential factor to predict and avoid failures. The state space in healthaware planning problems is often a mixture of continuous vehiclelevel and discrete missionlevel states. This in particular poses a ..."
Abstract

Cited by 2 (2 self)
 Add to MetaCart
(Show Context)
In persistent missions, taking the system's health and capability degradation into account is essential to predict and avoid failures. The state space in health-aware planning problems is often a mixture of continuous vehicle-level and discrete mission-level states. This in particular poses a challenge when the mission domain is partially observable and restricts the use of computationally expensive forward search methods. This paper presents a method that exploits a structure present in many health-aware planning problems and performs a two-layer planning scheme. The lower layer exploits a local linearization and Gaussian distribution assumption over vehicle-level states, while the higher layer maintains a non-Gaussian distribution over discrete mission-level variables. This two-layer planning scheme allows us to limit the expensive online forward search to the mission-level states, and thus predict the system's behavior over longer horizons in the future. We demonstrate the performance of the method on a long-duration package delivery mission using a quadrotor in a partially observable domain in the presence of constraints and health/capability degradation.
Linear Dynamic Programming and the Training of Sequence Estimators
"... Abstract We consider the problem of nding an optimal path through a trellis graph when the arc costs are linear functions of an unknown parameter vector. In this context we develop an algorithm, Linear Dynamic Programming (LDP), that simultaneously computes the optimal path for all values of the par ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
We consider the problem of finding an optimal path through a trellis graph when the arc costs are linear functions of an unknown parameter vector. In this context we develop an algorithm, Linear Dynamic Programming (LDP), that simultaneously computes the optimal path for all values of the parameter. We show how the LDP algorithm can be used for supervised learning of the arc costs for a dynamic-programming-based sequence estimator by minimizing empirical risk. We present an application to musical harmonic analysis in which we optimize the performance of our estimator by seeking the parameter value generating the sequence that best agrees with hand-labeled data.
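For a *fixed* parameter vector θ, the setting in this abstract reduces to ordinary dynamic programming over the trellis, since each arc cost θ·φ(arc) becomes a number. The sketch below (trellis, features, and names all invented) shows only that fixed-θ case; LDP's contribution, solving for all θ simultaneously, is not attempted here.

```python
# Min-cost path through a trellis whose arc costs are linear in theta:
# cost(t, s -> s2) = theta . phi(t, s, s2).

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def best_path(num_stages, states, phi, theta):
    """Viterbi-style recursion; phi(t, s, s2) returns a feature vector."""
    cost = {s: 0.0 for s in states}   # best cost to reach each state
    back = []                         # backpointers per stage
    for t in range(1, num_stages):
        new_cost, choice = {}, {}
        for s2 in states:
            s = min(states, key=lambda s: cost[s] + dot(theta, phi(t, s, s2)))
            new_cost[s2] = cost[s] + dot(theta, phi(t, s, s2))
            choice[s2] = s
        back.append(choice)
        cost = new_cost
    end = min(states, key=lambda s: cost[s])
    path = [end]
    for choice in reversed(back):     # walk backpointers to recover the path
        path.append(choice[path[-1]])
    path.reverse()
    return path, cost[end]
```

Training then amounts to searching for a θ whose minimum-cost path agrees with labeled sequences, which is where computing the solution for all θ at once pays off.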
Dynamics Based Control with PSRs, 2008
"... We present an extension of the Dynamics Based Control (DBC) paradigm to environment models based on Predictive State Representations (PSRs). We show an approximate greedy version of the DBC for PSR model, EMTPSR, and demonstrate how this algorithm can be applied to solve several control problems. W ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
We present an extension of the Dynamics Based Control (DBC) paradigm to environment models based on Predictive State Representations (PSRs). We describe an approximate greedy version of DBC for PSR models, EMT-PSR, and demonstrate how this algorithm can be applied to solve several control problems. We then provide classifications and requirements of PSR environment models that are necessary for the EMT-PSR algorithm to operate.