Towards a Unified Theory of State Abstraction for MDPs
In Proceedings of the Ninth International Symposium on Artificial Intelligence and Mathematics, 2006
Cited by 61 (5 self)
extensively studied in the fields of artificial intelligence and operations research. Instead of working in the ground state space, the decision maker usually finds solutions in the abstract state space much faster by treating groups of states as a unit by ignoring irrelevant state information. A number of abstractions have been proposed and studied in the reinforcement-learning and planning literatures, and positive and negative results are known. We provide a unified treatment of state abstraction for Markov decision processes. We study five particular abstraction schemes, some of which have been proposed in the past in different forms, and analyze their usability for planning and learning.
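Illustrative note (not taken from the paper): the idea of treating groups of ground states as a unit can be sketched in a few lines of Python. The toy MDP, the mapping `phi`, and the helper names `value_iteration` and `abstract_mdp` below are all hypothetical. The sketch assumes an abstraction under which grouped states share rewards and share transition probabilities into each group; it is one simple scheme, not necessarily any of the five the abstract refers to.

```python
def value_iteration(P, R, gamma=0.9, sweeps=500):
    """P[s][a] maps next state -> probability; R[s][a] is the immediate reward."""
    V = [0.0] * len(P)
    for _ in range(sweeps):
        V = [max(R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a].items())
                 for a in range(len(P[s])))
             for s in range(len(P))]
    return V


def abstract_mdp(P, R, phi, n_abstract):
    """Aggregate ground states into abstract states via phi (uniform weights)."""
    groups = [[s for s in range(len(P)) if phi[s] == c] for c in range(n_abstract)]
    P_abs, R_abs = [], []
    for group in groups:
        w = 1.0 / len(group)
        p_rows, r_row = [], []
        for a in range(len(P[group[0]])):
            row = {}
            for s in group:
                for t, p in P[s][a].items():
                    row[phi[t]] = row.get(phi[t], 0.0) + w * p
            p_rows.append(row)
            r_row.append(sum(w * R[s][a] for s in group))
        P_abs.append(p_rows)
        R_abs.append(r_row)
    return P_abs, R_abs


# Hypothetical ground MDP: 4 states, 2 actions. States 0 and 1 differ in how
# they move inside their own group, which the abstraction ignores.
P = [
    [{2: 1.0}, {1: 0.5, 3: 0.5}],          # state 0
    [{2: 1.0}, {0: 0.3, 1: 0.2, 3: 0.5}],  # state 1
    [{3: 1.0}, {0: 1.0}],                  # state 2
    [{3: 1.0}, {3: 1.0}],                  # state 3 (absorbing)
]
R = [[0.0, 1.0], [0.0, 1.0], [5.0, 0.0], [0.0, 0.0]]
phi = [0, 0, 1, 2]  # ground state -> abstract state

P_abs, R_abs = abstract_mdp(P, R, phi, n_abstract=3)
V_ground = value_iteration(P, R)
V_abs = value_iteration(P_abs, R_abs)
```

In this toy case the abstract MDP has three states instead of four, and lifting `V_abs` back through `phi` reproduces `V_ground`. With a coarser or less careful grouping that agreement is not guaranteed.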
Adaptive Multi-Robot Wide-Area Exploration and Mapping
Cited by 34 (22 self)
The exploration problem is a central issue in mobile robotics. A complete terrain coverage is not practical if the environment is large with only a few small hotspots. This paper presents an adaptive multi-robot exploration strategy that is novel in performing both wide-area coverage and hotspot sampling using non-myopic path planning. As a result, the environmental phenomena can be accurately mapped. It is based on a dynamic programming formulation, which we call the Multi-robot Adaptive Sampling Problem (MASP). A key feature of MASP is in covering the entire adaptivity spectrum, thus allowing strategies of varying adaptivity to be formed and theoretically analyzed in their performance; a more adaptive strategy improves mapping accuracy. We apply MASP to sampling the Gaussian and log
A Unifying Framework for Computational Reinforcement Learning Theory
2009
Cited by 23 (7 self)
Computational learning theory studies mathematical models that allow one to formally analyze and compare the performance of supervised-learning algorithms such as their sample complexity. While existing models such as PAC (Probably Approximately Correct) have played an influential role in understanding the nature of supervised learning, they have not been as successful in reinforcement learning (RL). Here, the fundamental barrier is the need for active exploration in sequential decision problems. An RL agent tries to maximize long-term utility by exploiting its knowledge about the problem, but this knowledge has to be acquired by the agent itself through exploring the problem that may reduce short-term utility. The need for active exploration is common in many problems in daily life, engineering, and sciences. For example, a Backgammon program strives to take good moves to maximize the probability of winning a game, but sometimes it may try novel and possibly harmful moves to discover how the opponent reacts in the hope of discovering a better game-playing strategy. It has been known since the early days of RL that a good tradeoff between exploration and exploitation is critical for the agent to learn fast (i.e., to reach near-optimal strategies
A Fast Analytical Algorithm for Solving Markov Decision Processes with Real-Valued Resources
In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI-07), 2007
Cited by 22 (7 self)
Agents often have to construct plans that obey deadlines or, more generally, resource limits for real-valued resources whose consumption can only be characterized by probability distributions, such as execution time or battery power. These planning problems can be modeled with continuous state Markov decision processes (MDPs) but existing solution methods are either inefficient or provide no guarantee on the quality of the resulting policy. We therefore present CPH, a novel solution method that solves the planning problems by first approximating with any desired accuracy the probability distributions over the resource consumptions with phase-type distributions, which use exponential distributions as building blocks. It then uses value iteration to solve the resulting MDPs by exploiting properties of exponential distributions to calculate the necessary convolutions accurately and efficiently while providing strong guarantees on the quality of the resulting policy. Our experimental feasibility study in a Mars rover domain demonstrates a substantial speedup over Lazy Approximation, which is currently the leading algorithm for solving continuous state MDPs with quality guarantees.
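Illustrative note (not taken from the paper): the simplest phase-type distribution built from exponential building blocks is the Erlang distribution, the sum of k independent exponential phases with a common rate. The Python sketch below is not CPH itself. It only shows, under that assumption, how adding phases tightens the approximation of a fixed resource consumption; the function names `erlang_cdf` and `erlang_matching_mean` are hypothetical.

```python
import math


def erlang_cdf(t, k, rate):
    """CDF of the sum of k i.i.d. exponential phases with the given rate."""
    if t <= 0:
        return 0.0
    x = rate * t
    term, total = 1.0, 1.0  # n = 0 term of sum_{n<k} x^n / n!
    for n in range(1, k):
        term *= x / n
        total += term
    return 1.0 - math.exp(-x) * total


def erlang_matching_mean(mean, k):
    """Rate and variance of a k-phase Erlang distribution with the requested mean."""
    rate = k / mean
    variance = k / rate ** 2  # equals mean**2 / k, so it shrinks as k grows
    return rate, variance


# Approximating a consumption of about 10 units with 1, 4, and 100 phases.
for k in (1, 4, 100):
    rate, variance = erlang_matching_mean(10.0, k)
    print(k, rate, variance, erlang_cdf(10.0, k, rate))
```

With one phase this is a plain exponential distribution. As k grows, the variance falls as mean squared over k and the probability mass concentrates around the mean.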
A Heuristic Search Approach to Planning with Continuous Resources in Stochastic Domains
Cited by 13 (0 self)
We consider the problem of optimal planning in stochastic domains with resource constraints, where the resources are continuous and the choice of action at each step depends on resource availability. We introduce the HAO* algorithm, a generalization of the AO* algorithm that performs search in a hybrid state space that is modeled using both discrete and continuous state variables, where the continuous variables represent monotonic resources. Like other heuristic search algorithms, HAO* leverages knowledge of the start state and an admissible heuristic to focus computational effort on those parts of the state space that could be reached from the start state by following an optimal policy. We show that this approach is especially effective when resource constraints limit how much of the state space is reachable. Experimental results demonstrate its effectiveness in the domain that motivates our research: automated planning for planetary exploration rovers.
Functional Value Iteration for Decision-Theoretic Planning with General Utility Functions
Cited by 12 (1 self)
We study how to find plans that maximize the expected total utility for a given MDP, a planning objective that is important for decision making in high-stakes domains. The optimal actions can now depend on the total reward that has been accumulated so far in addition to the current state. We extend our previous work on functional value iteration from one-switch utility functions to all utility functions that can be approximated with piecewise linear utility functions (with and without exponential tails) by using functional value iteration to find a plan that maximizes the expected total utility for the approximate utility function. Functional value iteration does not maintain a value for every state but a value function that maps the total reward that has been accumulated so far into a value. We describe how functional value iteration represents these value functions in finite form, how it performs dynamic programming by manipulating these representations, and what kinds of approximation guarantees it is able to make. We also apply it to a probabilistic blocks-world problem, a standard test domain for decision-theoretic planners.
Symbolic Dynamic Programming for Continuous State and Action MDPs
Cited by 8 (3 self)
Many real-world decision-theoretic planning problems are naturally modeled using both continuous state and action (CSA) spaces, yet little work has provided exact solutions for the case of continuous actions. In this work, we propose a symbolic dynamic programming (SDP) solution to obtain the optimal closed-form value function and policy for CSA-MDPs with multivariate continuous state and actions, discrete noise, piecewise linear dynamics, and piecewise linear (or restricted piecewise quadratic) reward. Our key contribution over previous SDP work is to show how the continuous action maximization step in the dynamic programming backup can be evaluated optimally and symbolically, a task which amounts to symbolic constrained optimization subject to unknown state parameters; we further integrate this technique to work with an efficient and compact data structure for SDP, the extended algebraic decision diagram (XADD). We demonstrate empirical results on a didactic nonlinear planning example and two domains from operations research to show the first automated exact solution to these problems.
Improving Adjustable Autonomy Strategies for Time-Critical Domains
Cited by 5 (3 self)
As agents begin to perform complex tasks alongside humans as collaborative teammates, it becomes crucial that the resulting human-multiagent teams adapt to time-critical domains. In such domains, adjustable autonomy has proven useful by allowing for a dynamic transfer of control of decision making between human and agents. However, existing adjustable autonomy algorithms commonly discretize time, which not only results in high algorithm runtimes but also translates into inaccurate transfer-of-control policies. In addition, existing techniques fail to address decision-making inconsistencies often encountered in human-multiagent decision making. To address these limitations, we present a novel approach for Resolving Inconsistencies in Adjustable Autonomy in Continuous Time (RIAACT) that makes three contributions: First, we apply a continuous-time planning paradigm to adjustable autonomy, resulting in high-accuracy transfer-of-control policies. Second, our new adjustable autonomy framework both models and plans for the resolving of inconsistencies between human and agent decisions. Third, we introduce a new model, Interruptible Action Time-dependent Markov Decision Problem (IA-TMDP), which allows for actions to be interrupted at any point in continuous time. We show how to solve IA-TMDPs efficiently and leverage them to plan for the resolving of inconsistencies in RIAACT. Furthermore, these contributions have been realized and evaluated in a complex disaster response simulation system.
Symbolic Dynamic Programming for Discrete and Continuous State MDPs
In UAI-2011, 2011
Cited by 5 (4 self)
Many real-world decision-theoretic planning problems can be naturally modeled with discrete and continuous state Markov decision processes (DC-MDPs). While previous work has addressed automated decision-theoretic planning for DC-MDPs, optimal solutions have only been defined so far for limited settings, e.g., DC-MDPs having hyper-rectangular piecewise linear value functions. In this work, we extend symbolic dynamic programming (SDP) techniques to provide optimal solutions for a vastly expanded class of DC-MDPs. To address the inherent combinatorial aspects of SDP, we introduce the XADD, a continuous variable extension of the algebraic decision diagram (ADD), that maintains compact representations of the exact value function. Empirically, we demonstrate an implementation of SDP with XADDs on various DC-MDPs, showing the first optimal automated solutions to DC-MDPs with linear and nonlinear piecewise partitioned value functions and showing the advantages of constraint-based pruning for XADDs.
Towards Faster Planning with Continuous Resources in Stochastic Domains
In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2008
Cited by 4 (2 self)
Agents often have to construct plans that obey resource limits for continuous resources whose consumption can only be characterized by probability distributions. While Markov Decision Processes (MDPs) with a state space of continuous and discrete variables are popular for modeling these domains, current algorithms for such MDPs can exhibit poor performance with a scale-up in their state space. To remedy that, we propose an algorithm called DPFP. DPFP’s key contribution is its exploitation of the dual space cumulative distribution functions. This dual formulation is key to DPFP’s novel combination of three features. First, it enables DPFP’s membership in a class of algorithms that perform forward search in a large (possibly infinite) policy space. Second, it provides a new and efficient approach for varying the policy generation effort based on the likelihood of reaching different regions of the MDP state space. Third, it yields a bound on the error produced by such approximations. These three features conspire to allow DPFP’s superior performance and systematic tradeoff of optimality for speed. Our experimental evaluation shows that, when run stand-alone, DPFP outperforms other algorithms in terms of its anytime performance, whereas when run as a hybrid, it allows for a significant speedup of a leading continuous resource MDP solver.