Results 1 - 10 of 47
Approximate Policy Iteration with a Policy Language Bias
Journal of Artificial Intelligence Research, 2003
Cited by 140 (18 self)
We explore approximate policy iteration (API), replacing the usual cost-function learning step with a learning step in policy space. We give policy-language biases that enable solution of very large relational Markov decision processes (MDPs) that no previous technique can solve.
Approximate linear programming for first-order MDPs
In Proc. UAI-05, 509–517, 2005
Cited by 37 (9 self)
We introduce a new approximate solution technique for first-order Markov decision processes (FOMDPs). Representing the value function linearly w.r.t. a set of first-order basis functions, we compute suitable weights by casting the corresponding optimization as a first-order linear program and show how off-the-shelf theorem prover and LP software can be effectively used. This technique allows one to solve FOMDPs independent of a specific domain instantiation; furthermore, it allows one to determine bounds on approximation error that apply equally to all domain instantiations. We apply this solution technique to the task of elevator scheduling with a rich feature space and multi-criteria additive reward, and demonstrate that it outperforms a number of intuitive, heuristically-guided policies.
First order decision diagrams for relational MDPs
In Proceedings of the International Joint Conference on Artificial Intelligence, 2007
Cited by 26 (9 self)
Markov decision processes capture sequential decision making under uncertainty, where an agent must choose actions so as to optimize long-term reward. The paper studies efficient reasoning mechanisms for Relational Markov Decision Processes (RMDP) where world states have an internal relational structure that can be naturally described in terms of objects and relations among them. Two contributions are presented. First, the paper develops First Order Decision Diagrams (FODD), a new compact representation for functions over relational structures, together with a set of operators to combine FODDs, and novel reduction techniques to keep the representation small. Second, the paper shows how FODDs can be used to develop solutions for RMDPs, where reasoning is performed at the abstract level and the resulting optimal policy is independent of domain size (number of objects) or instantiation. In particular, a variant of the value iteration algorithm is developed by using special operations over FODDs, and the algorithm is shown to converge to the optimal policy.
Practical solution techniques for first-order MDPs
Artificial Intelligence
Cited by 25 (1 self)
Many traditional solution approaches to relationally specified decision-theoretic planning problems (e.g., those stated in the probabilistic planning domain description language, or PPDDL) ground the specification with respect to a specific instantiation of domain objects and apply a solution approach directly to the resulting ground Markov decision process (MDP). Unfortunately, the space and time complexity of these grounded solution approaches are polynomial in the number of domain objects and exponential in the predicate arity and the number of nested quantifiers in the relational problem specification. An alternative to grounding a relational planning problem is to tackle the problem directly at the relational level. In this article, we propose one such approach that translates an expressive subset of the PPDDL representation to a first-order MDP (FOMDP) specification and then derives a domain-independent policy without grounding at any intermediate step. However, such generality does not come without its own set of challenges; the purpose of this article is to explore practical solution techniques for solving FOMDPs. To demonstrate the applicability of our techniques, we present proof-of-concept results of our first-order approximate linear programming (FOALP) planner on problems from the probabilistic track
FLUCAP: a heuristic search planner for first-order MDPs
Journal of Artificial Intelligence Research (JAIR), 2006
ReTrASE: Integrating Paradigms for Approximate Probabilistic Planning
Cited by 19 (11 self)
Past approaches for solving MDPs have several weaknesses: 1) Decision-theoretic computation over the state space can yield optimal results but scales poorly. 2) Value-function approximation typically requires human-specified basis functions and has not been shown successful on nominal (“discrete”) domains such as those in the ICAPS planning competitions. 3) Replanning by applying a classical planner to a determinized domain model can generate approximate policies for very large problems but has trouble handling probabilistic subtlety [Little and Thiebaux, 2007]. This paper presents RETRASE, a novel MDP solver, which combines decision theory, function approximation and classical planning in a new way. RETRASE uses classical planning to create basis functions for value-function approximation and applies expected-utility analysis to this compact space. Our algorithm is memory-efficient and fast (due to its compact, approximate representation), returns high-quality solutions (due to the decision-theoretic framework) and does not require additional knowledge from domain engineers (since we apply classical planning to automatically construct the basis functions). Experiments demonstrate that RETRASE outperforms winners from the past three probabilistic-planning competitions on many hard problems.
Imitation Learning in Relational Domains: A Functional-Gradient Boosting Approach
Cited by 16 (11 self)
Imitation learning refers to the problem of learning how to behave by observing a teacher in action. We consider imitation learning in relational domains, in which there is a varying number of objects and relations among them. In prior work, simple relational policies are learned by viewing imitation learning as supervised learning of a function from states to actions. For propositional worlds, functional gradient methods have been proved to be beneficial. They are simpler to implement than most existing methods, more efficient, more naturally satisfy common constraints on the cost function, and better represent our prior beliefs about the form of the function. Building on recent generalizations of functional gradient boosting to relational representations, we implement a functional gradient boosting approach to imitation learning in relational domains. In particular, given a set of traces from the human teacher, our system learns a policy in the form of a set of relational regression trees that additively approximate the functional gradients. The use of multiple additive trees combined with relational representation allows for learning more expressive policies than what has been done before. We demonstrate the usefulness of our approach in several different domains.
SixthSense: Fast and reliable recognition of dead ends in MDPs
In submission, 2010
Cited by 15 (9 self)
The results of the latest International Probabilistic Planning Competition (IPPC-2008) indicate that the presence of dead ends, states with no trajectory to the goal, makes MDPs hard for modern probabilistic planners. Implicit dead ends, states with executable actions but no path to the goal, are particularly challenging; existing MDP solvers spend much time and memory identifying these states. As a first attempt to address this issue, we propose a machine learning algorithm called SIXTHSENSE. SIXTHSENSE helps existing MDP solvers by finding nogoods, conjunctions of literals whose truth in a state implies that the state is a dead end. Importantly, our learned nogoods are sound, and hence the states they identify are true dead ends. SIXTHSENSE is very fast, needs little training data, and takes only a small fraction of total planning time. While IPPC problems may have millions of dead ends, they may typically be represented with only a dozen or two nogoods. Thus, nogood learning efficiently produces a quick and reliable means for dead-end recognition. Our experiments show that the nogoods found by SIXTHSENSE routinely reduce planning space and time on IPPC domains, enabling some planners to solve problems they could not previously handle.
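The abstract's definition of a nogood (a conjunction of literals whose truth in a state implies a dead end) can be illustrated with a minimal sketch. This is not the SIXTHSENSE learning algorithm, which the abstract does not detail; it only shows how an already-learned nogood set might be applied at recognition time. The state encoding (a mapping from variable name to truth value) and the names `nogood_holds` and `is_dead_end` are assumptions made for illustration.

```python
from typing import Dict, Iterable

# Hypothetical encoding: a state maps each propositional variable to True/False.
State = Dict[str, bool]
# A nogood is a conjunction of literals, stored as a partial assignment:
# {"tire_flat": True, "has_spare": False} means "tire_flat AND NOT has_spare".
Nogood = Dict[str, bool]


def nogood_holds(nogood: Nogood, state: State) -> bool:
    """Return True if every literal of the nogood is true in the state.

    A variable missing from the state is treated as not matching either polarity.
    """
    return all(state.get(var) == val for var, val in nogood.items())


def is_dead_end(state: State, nogoods: Iterable[Nogood]) -> bool:
    """Flag the state as a dead end if any nogood holds in it."""
    return any(nogood_holds(ng, state) for ng in nogoods)


if __name__ == "__main__":
    # Illustrative data only, not taken from any IPPC domain file.
    nogoods = [{"tire_flat": True, "has_spare": False}]
    stuck = {"tire_flat": True, "has_spare": False, "at_goal": False}
    print(is_dead_end(stuck, nogoods))
```

Because each check is a scan over a handful of short conjunctions, a dozen or two nogoods can cover a very large set of dead-end states at low cost, which is the compactness the abstract points to.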
Self-taught decision theoretic planning with first order decision diagrams
In Proceedings of ICAPS-10, 2010
Cited by 13 (11 self)
We present a new paradigm for planning by learning, where the planner is given a model of the world and a small set of states of interest, but no indication of optimal actions in these states. The additional information can help focus the planner on regions of the state space that are of interest and lead to improved performance. We demonstrate this idea by introducing novel model-checking reduction operations for First Order Decision Diagrams (FODD), a representation that has been used to implement decision-theoretic planning with Relational Markov Decision Processes (RMDP). Intuitively, these reductions modify the construction of the value function by removing any complex specifications that are irrelevant to the set of training examples, thereby focusing on the region of interest. We show that such training examples can be constructed on the fly from a description of the planning problem; thus we can bootstrap to get a self-taught planning system. Additionally, we provide a new heuristic to embed universal and conjunctive goals within the framework of RMDP planners, expanding the scope and applicability of such systems. We show that these ideas lead to significant improvements in performance in terms of both speed and coverage of the planner, yielding state-of-the-art planning performance on problems from the International Planning Competition.
A Heuristic Search Algorithm for Solving First-Order MDPs
In Proc. Conference on Uncertainty in Artificial Intelligence (UAI), 2005
Cited by 12 (1 self)
We present a heuristic search algorithm for solving first-order MDPs (FOMDPs). Our approach combines first-order state abstraction that avoids evaluating states individually, and heuristic search that avoids evaluating all states. Firstly, we apply state abstraction directly on the FOMDP, avoiding propositionalization. This kind of abstraction is referred to as first-order state abstraction. Secondly, guided by an admissible heuristic, the search is restricted only to those states that are reachable from the initial state. We demonstrate the usefulness of the above techniques for solving FOMDPs on a system, referred to as FCPlanner, that entered the probabilistic track of the International Planning Competition (IPC’2004).