Results 1–10 of 37
Solving factored MDPs with hybrid state and action variables
 J. Artif. Intell. Res. (JAIR)
Abstract

Cited by 29 (4 self)
Efficient representations and solutions for large decision problems with continuous and discrete variables are among the most important challenges faced by the designers of automated decision support systems. In this paper, we describe a novel hybrid factored Markov decision process (MDP) model that allows for a compact representation of these problems, and a new hybrid approximate linear programming (HALP) framework that permits their efficient solutions. The central idea of HALP is to approximate the optimal value function by a linear combination of basis functions and optimize its weights by linear programming. We analyze both theoretical and computational aspects of this approach, and demonstrate its scale-up potential on several hybrid optimization problems.
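The LP idea at the core of this abstract can be made concrete on a toy discrete case. The sketch below is not the paper's hybrid model: the two-state MDP, the indicator basis functions, and the feasibility check are all illustrative assumptions. With one indicator basis function per state, the ALP constraints reduce to the exact LP formulation of an MDP, so the optimal value function must satisfy every constraint:

```python
# Toy illustration of the approximate-linear-programming (ALP) idea:
# approximate V*(s) ~ sum_i w_i * phi_i(s), with the weights constrained by
# the Bellman inequalities  V(s) >= R(s,a) + gamma * sum_s' P(s'|s,a) V(s').
# The MDP and basis functions below are made up for the example.

GAMMA = 0.9
S, A = [0, 1], [0, 1]
R = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 0.0}   # reward R(s, a)
P = {(0, 0): [1.0, 0.0], (0, 1): [0.0, 1.0],               # P(s' | s, a)
     (1, 0): [1.0, 0.0], (1, 1): [0.0, 1.0]}

def bellman_backup(V):
    return [max(R[s, a] + GAMMA * sum(p * v for p, v in zip(P[s, a], V))
                for a in A) for s in S]

# Compute V* by value iteration (the target the LP is meant to approximate).
V = [0.0, 0.0]
for _ in range(2000):
    V = bellman_backup(V)

# With indicator basis functions phi_i(s) = [s == i], the weight vector w
# plays the role of V itself, so w = V* must satisfy all ALP constraints.
w = V
feasible = all(
    w[s] + 1e-6 >= R[s, a] + GAMMA * sum(p * wv for p, wv in zip(P[s, a], w))
    for s in S for a in A)
print("V* ~", [round(v, 3) for v in V], "all ALP constraints hold:", feasible)
```

In the hybrid setting of the paper the constraint set is infinite (continuous variables), which is exactly what the HALP machinery addresses; a real implementation would hand these constraints to an LP solver rather than check feasibility after the fact.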
Non-Parametric Policy Gradients: A Unified Treatment of Propositional and Relational Domains
Abstract

Cited by 28 (10 self)
Policy gradient approaches are a powerful instrument for learning how to interact with the environment. Existing approaches have focused on propositional and continuous domains only. Without extensive feature engineering, it is difficult – if not impossible – to apply them within structured domains, in which e.g. there is a varying number of objects and relations among them. In this paper, we describe a non-parametric policy gradient approach – called NPPG – that overcomes this limitation. The key idea is to apply Friedman's gradient boosting: policies are represented as a weighted sum of regression models grown in a stagewise optimization. Employing off-the-shelf regression learners, NPPG can deal with propositional, continuous, and relational domains in a unified way. Our experimental results show that it can even improve on established results.
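The gradient-boosting view of policy search can be sketched in a degenerate propositional setting. Everything below is an illustrative assumption, not the paper's method: the problem is a two-armed bandit rather than a relational MDP, and the "regression model" fitted each round is a lookup table (the trivial special case of a learner such as a regression tree):

```python
import math

# Sketch of policy learning via functional gradient boosting: the policy is
# pi(a) = softmax(psi(a)), where the potential psi is a SUM of models, one
# added per boosting round, each fitted to the functional policy gradient.
# Toy problem (an assumption): a 2-armed bandit with expected rewards
# r = [1.0, 0.0]; a lookup table stands in for an off-the-shelf regressor.

r = [1.0, 0.0]
models = []                          # the boosted ensemble defining psi

def psi(a):
    return sum(m[a] for m in models)

def policy():
    z = [math.exp(psi(a)) for a in range(2)]
    s = sum(z)
    return [p / s for p in z]

ETA = 0.5                            # boosting step size
for _ in range(50):                  # boosting rounds
    pi = policy()
    J = sum(p * ri for p, ri in zip(pi, r))          # expected reward
    # Functional gradient of J w.r.t. psi(a) for a softmax policy:
    grad = [pi[a] * (r[a] - J) for a in range(2)]
    # "Fit" a model to the gradient (a table fits it exactly) and add it.
    models.append([ETA * g for g in grad])

print("final policy favors the rewarding arm:", round(policy()[0], 3))
```

The point of NPPG is that the table can be replaced by any regression learner over structured (e.g. relational) state-action descriptions, so the same stagewise loop applies unchanged across propositional, continuous, and relational domains.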
Practical linear value-approximation techniques for first-order MDPs
 Proc. of the Conference on Uncertainty in Artificial Intelligence
, 2006
Abstract

Cited by 27 (0 self)
Recent work on approximate linear programming (ALP) techniques for first-order Markov Decision Processes (FOMDPs) represents the value function linearly w.r.t. a set of first-order basis functions and uses linear programming techniques to determine suitable weights. This approach offers the advantage that it does not require simplification of the first-order value function, and allows one to solve FOMDPs independent of a specific domain instantiation. In this paper, we address several questions to enhance the applicability of this work: (1) Can we extend the first-order ALP framework to approximate policy iteration and if so, how do these two algorithms compare? (2) Can we automatically generate basis functions and evaluate their impact on value function quality? (3) How can we decompose intractable problems with universally quantified rewards into tractable subproblems? We propose answers to these questions along with a number of novel optimizations and provide a comparative empirical evaluation on problems from the ICAPS 2004 Probabilistic Planning Competition.
First order decision diagrams for relational MDPs
 In Proceedings of the International Joint Conference on Artificial Intelligence
, 2007
Abstract

Cited by 26 (10 self)
Markov decision processes capture sequential decision making under uncertainty, where an agent must choose actions so as to optimize long term reward. The paper studies efficient reasoning mechanisms for Relational Markov Decision Processes (RMDP) where world states have an internal relational structure that can be naturally described in terms of objects and relations among them. Two contributions are presented. First, the paper develops First Order Decision Diagrams (FODD), a new compact representation for functions over relational structures, together with a set of operators to combine FODDs, and novel reduction techniques to keep the representation small. Second, the paper shows how FODDs can be used to develop solutions for RMDPs, where reasoning is performed at the abstract level and the resulting optimal policy is independent of domain size (number of objects) or instantiation. In particular, a variant of the value iteration algorithm is developed by using special operations over FODDs, and the algorithm is shown to converge to the optimal policy.
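FODDs are first-order objects and cannot be reproduced in a few lines, but their propositional ancestor, the algebraic decision diagram (ADD), illustrates the two mechanisms the abstract names: an "apply" operator that combines diagrams node-by-node, and a reduction rule that keeps the result small. The representation below (leaves as floats, internal nodes as `(var, low, high)` tuples with variables tested in increasing order) is an assumption made for the sketch:

```python
# Minimal ADD-style diagrams: a diagram is either a float leaf or a tuple
# (var, low, high). mk() applies the reduction rule (merge equal children);
# apply_op() combines two diagrams under a binary operator, recursing on
# the smaller variable index, as in standard decision-diagram packages.

def mk(var, lo, hi):
    return lo if lo == hi else (var, lo, hi)       # reduction rule

def apply_op(op, f, g):
    if not isinstance(f, tuple) and not isinstance(g, tuple):
        return op(f, g)                            # two leaves: just combine
    fv = f[0] if isinstance(f, tuple) else float("inf")
    gv = g[0] if isinstance(g, tuple) else float("inf")
    v = min(fv, gv)                                # test the earlier variable
    f_lo, f_hi = (f[1], f[2]) if fv == v else (f, f)
    g_lo, g_hi = (g[1], g[2]) if gv == v else (g, g)
    return mk(v, apply_op(op, f_lo, g_lo), apply_op(op, f_hi, g_hi))

# Reward diagram: 10 if x1 else 0; value diagram: 5 if x2 else 1.
reward = (1, 0.0, 10.0)
value = (2, 1.0, 5.0)
total = apply_op(lambda a, b: a + b, reward, value)
print(total)   # (1, (2, 1.0, 5.0), (2, 11.0, 15.0))
```

The first-order case adds variable bindings, aggregation over objects, and much subtler reduction conditions on top of this apply/reduce skeleton, which is where the paper's contribution lies.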
Practical solution techniques for first-order MDPs
 Artificial Intelligence
Abstract

Cited by 25 (1 self)
Many traditional solution approaches to relationally specified decision-theoretic planning problems (e.g., those stated in the probabilistic planning domain description language, or PPDDL) ground the specification with respect to a specific instantiation of domain objects and apply a solution approach directly to the resulting ground Markov decision process (MDP). Unfortunately, the space and time complexity of these grounded solution approaches are polynomial in the number of domain objects and exponential in the predicate arity and the number of nested quantifiers in the relational problem specification. An alternative to grounding a relational planning problem is to tackle the problem directly at the relational level. In this article, we propose one such approach that translates an expressive subset of the PPDDL representation to a first-order MDP (FOMDP) specification and then derives a domain-independent policy without grounding at any intermediate step. However, such generality does not come without its own set of challenges—the purpose of this article is to explore practical solution techniques for solving FOMDPs. To demonstrate the applicability of our techniques, we present proof-of-concept results of our first-order approximate linear programming (FOALP) planner on problems from the probabilistic track
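The grounding blowup the abstract describes is a simple counting fact worth seeing once: with n domain objects and a predicate of arity k there are n**k ground atoms, hence 2**(n**k) boolean ground states. The numbers below are illustrative, not from the paper:

```python
# Back-of-envelope illustration of why grounding blows up: a relational
# domain with n objects and one predicate of arity k grounds to n**k atoms,
# so the ground MDP has 2**(n**k) boolean states.
def ground_atoms(n_objects, arity):
    return n_objects ** arity

for n in (3, 5, 10):
    atoms = ground_atoms(n, 2)            # a single binary predicate
    print(f"n={n:2d}: {atoms:4d} ground atoms, 2**{atoms} ground states")
```

Lifted FOMDP solvers avoid paying this cost precisely because they never enumerate those atoms.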
Approximate solution techniques for factored first-order MDPs
 In Proceedings of ICAPS-07, 288
, 2007
Abstract

Cited by 12 (4 self)
Most traditional approaches to probabilistic planning in relationally specified MDPs rely on grounding the problem w.r.t. specific domain instantiations, thereby incurring a combinatorial blowup in the representation. An alternative approach is to lift a relational MDP to a first-order MDP (FOMDP) specification and develop solution approaches that avoid grounding. Unfortunately, state-of-the-art FOMDPs are inadequate for specifying factored transition models or additive rewards that scale with the domain size—structure that is very natural in probabilistic planning problems. To remedy these deficiencies, we propose an extension of the FOMDP formalism known as a factored FOMDP and present generalizations of symbolic dynamic programming and linear-value approximation solutions to exploit its structure. Along the way, we also make contributions to the field of first-order probabilistic inference (FOPI) by demonstrating novel first-order structures that can be exploited without domain grounding. We present empirical results to demonstrate that we can obtain solutions whose complexity scales polynomially in the logarithm of the domain size—results that are impossible to obtain with any previously proposed solution method.
Factored value iteration converges
 Acta Cyb
Abstract

Cited by 12 (2 self)
In this paper we propose a novel algorithm, factored value iteration (FVI), for the approximate solution of factored Markov decision processes (fMDPs). The traditional approximate value iteration algorithm is modified in two ways. For one, the least-squares projection operator is modified so that it does not increase the max-norm, and thus preserves convergence. The other modification is that we uniformly sample polynomially many samples from the (exponentially large) state space. This way, the complexity of our algorithm becomes polynomial in the size of the fMDP description length. We prove that the algorithm is convergent. We also derive an upper bound on the difference between our approximate solution and the optimal one, and also on the error introduced by sampling. We analyze various projection operators with respect to their computational complexity and their convergence when combined with approximate value iteration. Keywords: factored Markov decision process, value iteration, reinforcement learning
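The max-norm modification mentioned in this abstract can be sketched in isolation: ordinary least-squares projection is non-expansive in the L2 norm but can *increase* the max-norm, which breaks the contraction argument behind value iteration; rescaling the projected vector restores the property. The one-basis-function, two-state example below is an illustrative assumption, not the paper's operator:

```python
# Least-squares projection can expand the max-norm; a max-norm-safe variant
# rescales the projection so it never does. Toy setup (an assumption):
# one basis function phi over two states, target vector v with ||v||_inf = 1.

Phi = [2.0, 1.0]   # phi(s) for s = 0, 1
v = [1.0, 1.0]     # target (e.g. the result of a Bellman backup)

def project(phi, v, safe=True):
    # 1-D least squares: w = (phi . v) / (phi . phi)
    w = sum(p * x for p, x in zip(phi, v)) / sum(p * p for p in phi)
    pv = [w * p for p in phi]
    if safe:
        scale = max(abs(x) for x in v) / max(abs(x) for x in pv)
        if scale < 1.0:            # only ever shrink, never inflate
            pv = [scale * x for x in pv]
    return pv

plain = project(Phi, v, safe=False)
safe_pv = project(Phi, v, safe=True)
print("plain projection:", plain, "max-norm:", max(map(abs, plain)))   # 1.2 > 1
print("safe projection :", safe_pv, "max-norm:", max(map(abs, safe_pv)))
```

Here the plain projection has max-norm 1.2 even though the target's is 1.0; composing such a projection with the Bellman operator need not be a max-norm contraction, which is exactly the failure mode the FVI modification rules out.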
First-order decision-theoretic planning in structured relational environments
, 2008
Abstract

Cited by 10 (2 self)
We consider the general framework of first-order decision-theoretic planning in structured relational environments. Most traditional solution approaches to these planning problems ground the relational specification w.r.t. a specific domain instantiation and apply a solution approach directly to the resulting ground Markov decision process (MDP). Unfortunately, the space and time complexity of these solution algorithms scale linearly with the domain size in the best case and exponentially in the worst case. An alternative to grounding a relational planning problem is to lift it to a first-order MDP (FOMDP) specification. This FOMDP can then be solved directly, resulting in a domain-independent solution whose space and time complexity either do not scale with domain size or can scale sublinearly in the domain size. However, such generality does not come without its own set of challenges and the first purpose of this thesis is to explore exact and approximate solution techniques for practically solving FOMDPs. The second purpose of this thesis is to extend the FOMDP specification to succinctly capture factored actions and additive rewards while extending the exact and approximate solution techniques to directly exploit this structure. In addition, we provide a proof of correctness of the first-order symbolic dynamic programming approach w.r.t. its well-studied ground MDP
Efficient learning of relational models for sequential decision making
, 2010
Abstract

Cited by 8 (1 self)
The exploration-exploitation tradeoff is crucial to reinforcement-learning (RL) agents, and a significant number of sample complexity results have been derived for agents in propositional domains. These results guarantee, with high probability, near-optimal behavior in all but a polynomial number of timesteps in the agent’s lifetime. In this work, we prove similar results for certain relational representations, primarily a class we call “relational action schemas”. These generalized models allow us to specify state transitions in a compact form, for instance describing the effect of picking up a generic block instead of picking up 10 different specific blocks. We present theoretical results on crucial subproblems in action-schema learning using the KWIK framework, which allows us to characterize the sample efficiency of an agent learning these models in a reinforcement-learning setting. These results are extended in an apprenticeship learning paradigm where an agent has access not only to its environment, but also to a teacher that can demonstrate traces of state/action/state sequences. We show that the class of action schemas that are efficiently learnable in this paradigm is strictly larger than those learnable in the online setting. We link
Policy Iteration for Relational MDPs
, 2007
"... Relational Markov Decision Processes are a useful ..."