Results 1 - 10 of 23
First order decision diagrams for relational MDPs
In Proceedings of the International Joint Conference on Artificial Intelligence, 2007
Cited by 26 (10 self)
Markov decision processes capture sequential decision making under uncertainty, where an agent must choose actions so as to optimize long term reward. The paper studies efficient reasoning mechanisms for Relational Markov Decision Processes (RMDP) where world states have an internal relational structure that can be naturally described in terms of objects and relations among them. Two contributions are presented. First, the paper develops First Order Decision Diagrams (FODD), a new compact representation for functions over relational structures, together with a set of operators to combine FODDs, and novel reduction techniques to keep the representation small. Second, the paper shows how FODDs can be used to develop solutions for RMDPs, where reasoning is performed at the abstract level and the resulting optimal policy is independent of domain size (number of objects) or instantiation. In particular, a variant of the value iteration algorithm is developed by using special operations over FODDs, and the algorithm is shown to converge to the optimal policy.
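To make the FODD idea concrete, the following is a minimal sketch, not the paper's implementation: internal nodes test relational atoms under a variable binding, leaves hold numeric values, and evaluation aggregates over all bindings by maximization, mirroring FODD max-aggregation semantics. All class and function names here are illustrative.

```python
# Hypothetical sketch of a first order decision diagram (FODD).
# Internal nodes test relational atoms; leaves carry numeric values.
from itertools import product

class Leaf:
    def __init__(self, value):
        self.value = value
    def evaluate(self, interpretation, binding):
        return self.value

class Node:
    def __init__(self, predicate, args, true_branch, false_branch):
        self.predicate, self.args = predicate, args
        self.true_branch, self.false_branch = true_branch, false_branch
    def evaluate(self, interpretation, binding):
        # ground the node's atom under the current variable binding
        atom = (self.predicate, tuple(binding[a] for a in self.args))
        branch = self.true_branch if atom in interpretation else self.false_branch
        return branch.evaluate(interpretation, binding)

def fodd_value(root, variables, objects, interpretation):
    # FODD semantics: maximize the path value over all valuations
    return max(
        root.evaluate(interpretation, dict(zip(variables, combo)))
        for combo in product(objects, repeat=len(variables))
    )

# Example diagram: value 10 if some clear block exists, else 0.
diagram = Node("clear", ("x",), Leaf(10.0), Leaf(0.0))
state = {("clear", ("b2",)), ("on", ("b2", "b1"))}
print(fodd_value(diagram, ("x",), ["b1", "b2"], state))  # → 10.0
```

Note how the value is independent of which object satisfies the test, which is the abstraction the papers below exploit.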
Practical solution techniques for first-order MDPs
Artificial Intelligence
Cited by 25 (1 self)
Many traditional solution approaches to relationally specified decision-theoretic planning problems (e.g., those stated in the probabilistic planning domain description language, or PPDDL) ground the specification with respect to a specific instantiation of domain objects and apply a solution approach directly to the resulting ground Markov decision process (MDP). Unfortunately, the space and time complexity of these grounded solution approaches are polynomial in the number of domain objects and exponential in the predicate arity and the number of nested quantifiers in the relational problem specification. An alternative to grounding a relational planning problem is to tackle the problem directly at the relational level. In this article, we propose one such approach that translates an expressive subset of the PPDDL representation to a first-order MDP (FOMDP) specification and then derives a domain-independent policy without grounding at any intermediate step. However, such generality does not come without its own set of challenges—the purpose of this article is to explore practical solution techniques for solving FOMDPs. To demonstrate the applicability of our techniques, we present proof-of-concept results of our first-order approximate linear programming (FOALP) planner on problems from the probabilistic track
Self-taught decision theoretic planning with first order decision diagrams
In Proceedings of ICAPS-10, 2010
Cited by 14 (12 self)
We present a new paradigm for planning by learning, where the planner is given a model of the world and a small set of states of interest, but no indication of optimal actions in these states. The additional information can help focus the planner on regions of the state space that are of interest and lead to improved performance. We demonstrate this idea by introducing novel model-checking reduction operations for First Order Decision Diagrams (FODD), a representation that has been used to implement decision-theoretic planning with Relational Markov Decision Processes (RMDP). Intuitively, these reductions modify the construction of the value function by removing any complex specifications that are irrelevant to the set of training examples, thereby focusing on the region of interest. We show that such training examples can be constructed on the fly from a description of the planning problem, and thus we can bootstrap to get a self-taught planning system. Additionally, we provide a new heuristic to embed universal and conjunctive goals within the framework of RMDP planners, expanding the scope and applicability of such systems. We show that these ideas lead to significant improvements in performance in terms of both speed and coverage of the planner, yielding state-of-the-art planning performance on problems from the International Planning Competition.
Stochastic planning with first order decision diagrams
In ICAPS, 2008
Cited by 10 (10 self)
Dynamic programming algorithms have been successfully applied to propositional stochastic planning problems by using compact representations, in particular algebraic decision diagrams, to capture domain dynamics and value functions. Work on symbolic dynamic programming lifted these ideas to first order logic using several representation schemes. Recent work introduced a first order variant of decision diagrams (FODD) and developed a value iteration algorithm for this representation. This paper develops several improvements to the FODD algorithm that make the approach practical. These include new reduction operators that decrease the size of the representation, several speedup techniques, and techniques for value approximation. Incorporating these, the paper presents a planning system, FODD-PLANNER, for solving relational stochastic planning problems. The system is evaluated on several domains, including problems from the recent international planning competition, and shows competitive performance with top ranking systems. This is the first demonstration of feasibility of this approach and it shows that abstraction through compact representation is a promising approach to stochastic planning.
Exploration in Relational Domains for Model-based Reinforcement Learning
Cited by 10 (1 self)
A fundamental problem in reinforcement learning is balancing exploration and exploitation. We address this problem in the context of model-based reinforcement learning in large stochastic relational domains by developing relational extensions of the concepts of the E^3 and R-MAX algorithms. Efficient exploration in exponentially large state spaces needs to exploit the generalization of the learned model: what in a propositional setting would be considered a novel situation and worth exploration may in the relational setting be a well-known context in which exploitation is promising. To address this we introduce relational count functions, which generalize the classical notion of state and action visitation counts. We provide guarantees on the exploration efficiency of our framework using count functions, under the assumption of a relational KWIK learner and a near-optimal planner. We propose a concrete exploration algorithm which integrates a practically efficient probabilistic rule learner and a relational planner (for which there are no guarantees, however) and employs the contexts of learned relational rules as features to model the novelty of states and actions. Our results in noisy 3D simulated robot manipulation problems and in domains of the international planning competition demonstrate that our approach is more effective than existing propositional and factored exploration techniques.
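The key idea of relational count functions can be sketched roughly as follows: instead of counting visits to ground state-action pairs, count occurrences of an abstract context. The abstraction used here (the set of predicate names that mention the acted-on objects) is a hypothetical simplification chosen for illustration, not the paper's definition.

```python
# Illustrative relational count function: two ground actions taken in
# isomorphic local contexts share a single visitation count.
from collections import Counter

def context(state, action):
    # abstract a ground action by the predicates mentioning its arguments
    objs = set(action[1:])
    return (action[0], frozenset(pred for pred, args in state
                                 if objs & set(args)))

class RelationalCounts:
    def __init__(self):
        self.counts = Counter()

    def observe(self, state, action):
        self.counts[context(state, action)] += 1

    def novelty(self, state, action, threshold=3):
        # "unknown" in the R-MAX sense until the abstract context has been
        # seen threshold times; unknown contexts get exploration bonuses
        return self.counts[context(state, action)] < threshold
```

For example, pickup(a) in a state where a is clear and on the table, and pickup(b) in the analogous state for b, map to the same context and therefore increment the same counter.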
Generalized First Order Decision Diagrams for First Order Markov Decision Processes
Cited by 7 (6 self)
First order decision diagrams (FODD) were recently introduced as a compact knowledge representation expressing functions over relational structures. FODDs represent numerical functions that, when constrained to the Boolean range, use only existential quantification. Previous work developed a set of operations over FODDs, showed how they can be used to solve relational Markov decision processes (RMDP) using dynamic programming algorithms, and demonstrated their success in solving stochastic planning problems from the International Planning Competition in the system FODD-Planner. A crucial ingredient of this scheme is a set of operations to remove redundancy in decision diagrams, thus keeping them compact. This paper makes three contributions. First, we introduce Generalized FODDs (GFODD) and combination algorithms for them, generalizing FODDs to arbitrary quantification. Second, we show how GFODDs can be used in principle to solve RMDPs with arbitrary quantification, and develop a particularly promising case where an arbitrary number of existential quantifiers is followed by an arbitrary number of universal quantifiers. Third, we develop a new approach to reduce FODDs and GFODDs using model checking. This yields a reduction that is complete for FODDs and provides a sound reduction procedure for GFODDs.
Generating optimal plans in highly-dynamic domains
In Proceedings of the Conference on Uncertainty in Artificial Intelligence
Cited by 6 (0 self)
Generating optimal plans in highly dynamic environments is challenging. Plans are predicated on an assumed initial state, but this state can change unexpectedly during plan generation, potentially invalidating the planning effort. In this paper we make three contributions: (1) We propose a novel algorithm for generating optimal plans in settings where frequent, unexpected events interfere with planning. It is able to quickly distinguish relevant from irrelevant state changes, and to update the existing planning search tree if necessary. (2) We argue for a new criterion for evaluating plan adaptation techniques: the relative running time compared to the "size" of changes. This is significant since during recovery more changes may occur that need to be recovered from subsequently, and in order for this process of repeated recovery to terminate, recovery time has to converge. (3) We show empirically that our approach can converge and find optimal plans in environments that would ordinarily defy planning due to their high dynamics.
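The relevance test the abstract describes, distinguishing state changes that matter to the current planning effort from those that do not, can be sketched as follows. The plan representation (actions as dicts of precondition and effect atoms) and the function name are assumptions for illustration, not the paper's algorithm.

```python
# Hypothetical sketch: a state change is relevant only if it touches an
# atom that some planned action's preconditions or effects mention;
# irrelevant changes can be ignored without updating the search tree.
def relevant(change_atoms, plan):
    mentioned = set()
    for action in plan:
        mentioned |= action["pre"] | action["eff"]
    return bool(change_atoms & mentioned)

# Example: a one-step plan to pick up block a.
plan = [{"pre": {("clear", "a")}, "eff": {("holding", "a")}}]
print(relevant({("clear", "a")}, plan))   # a's status matters
print(relevant({("on", "b", "c")}, plan)) # unrelated blocks do not
```

A real implementation would also have to propagate relevant changes into the partially built search tree rather than replan from scratch, which is where the paper's contribution lies.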
Abstract Planning for Reactive Robots
Cited by 5 (3 self)
Hybrid reactive-deliberative architectures in robotics combine reactive sub-policies for fast action execution with goal sequencing and deliberation. The need for replanning, however, presents a challenge for reactivity and hinders the potential for guarantees about the plan quality. In this paper, we argue that one can integrate abstract planning provided by symbolic dynamic programming in first order logic into a reactive robotic architecture, and that such an integration is in fact natural and has advantages over traditional approaches. In particular, it allows the integrated system to spend off-line time planning for a policy, and then use the policy reactively in open worlds, in situations with unexpected outcomes, and even in new environments, all by simply reacting to a state change and executing a new action proposed by the policy. We demonstrate the viability of the approach by integrating the FODD-Planner with the robotic DIARC architecture, showing how an appropriate interface can be defined and that this integration can yield robust goal-based action execution on robots in open worlds.
Probabilistic relational planning with first order decision diagrams
JAIR, 2011
Cited by 5 (4 self)
Dynamic programming algorithms have been successfully applied to propositional stochastic planning problems by using compact representations, in particular algebraic decision diagrams, to capture domain dynamics and value functions. Work on symbolic dynamic programming lifted these ideas to first order logic using several representation schemes. Recent work introduced a first order variant of decision diagrams (FODD) and developed a value iteration algorithm for this representation. This paper develops several improvements to the FODD algorithm that make the approach practical. These include new reduction operators that decrease the size of the representation, several speedup techniques, and techniques for value approximation. Incorporating these, the paper presents a planning system, FODD-Planner, for solving relational stochastic planning problems. The system is evaluated on several domains, including problems from the recent international planning competition, and shows competitive performance with top ranking systems. This is the first demonstration of feasibility of this approach and it shows that abstraction through compact representation is a promising approach to stochastic planning.
Learning Models of Relational MDPs using Graph Kernels
Cited by 5 (0 self)
Relational reinforcement learning is the application of reinforcement learning to structured state descriptions. Model-based methods learn a policy based on a known model that comprises a description of the actions and their effects as well as the reward function. If the model is initially unknown, one might learn the model first and then apply the model-based method (indirect reinforcement learning). In this paper, we propose a method for model-learning that is based on a combination of several SVMs using graph kernels. Indeterministic processes can be dealt with by combining the kernel approach with a clustering technique. We demonstrate the validity of the approach by a range of experiments on various Blocksworld scenarios.
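A graph kernel over relational states can be sketched very simply as an inner product of substructure-count feature maps; this is a generic label-counting kernel chosen for illustration, not the specific kernel used in the paper, and the resulting Gram matrix is the kind of input a kernel SVM for model learning would consume.

```python
# Minimal label-counting graph kernel over relational states. Each state
# is a set of (predicate, args) atoms; features count predicate labels
# and predicate-labelled edges, and the kernel is their dot product,
# which makes it a valid positive semidefinite kernel.
from collections import Counter

def state_graph_features(state):
    feats = Counter()
    for pred, args in state:
        feats[(pred, len(args))] += 1
        for a, b in zip(args, args[1:]):   # consecutive-argument edges
            feats[("edge", pred)] += 1
    return feats

def graph_kernel(s1, s2):
    # dot product of feature multisets = count of matching substructures
    f1, f2 = state_graph_features(s1), state_graph_features(s2)
    return sum(f1[k] * f2[k] for k in f1 if k in f2)

s1 = {("on", ("a", "b")), ("clear", ("a",))}
s2 = {("on", ("c", "d")), ("clear", ("c",)), ("clear", ("d",))}
print(graph_kernel(s1, s2))  # → 4
```

Because the feature map ignores object identity, isomorphic Blocksworld states get identical kernel values, which is exactly the generalization a relational model learner needs.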