Results 1  10
of
18
Integrating Samplebased Planning and Modelbased Reinforcement Learning
"... Recent advancements in modelbased reinforcement learning have shown that the dynamics of many structured domains (e.g. DBNs) can be learned with tractable sample complexity, despite their exponentially large state spaces. Unfortunately, these algorithms all require access to a planner that computes ..."
Abstract

Cited by 36 (5 self)
 Add to MetaCart
(Show Context)
Recent advancements in modelbased reinforcement learning have shown that the dynamics of many structured domains (e.g. DBNs) can be learned with tractable sample complexity, despite their exponentially large state spaces. Unfortunately, these algorithms all require access to a planner that computes a near optimal policy, and while many traditional MDP algorithms make this guarantee, their computation time grows with the number of states. We show how to replace these overmatched planners with a class of samplebased planners—whose computation time is independent of the number of states—without sacrificing the sampleefficiency guarantees of the overall learning algorithms. To do so, we define sufficient criteria for a samplebased planner to be used in such a learning system and analyze two popular samplebased approaches from the literature. We also introduce our own samplebased planner, which combines the strategies from these algorithms and still meets the criteria for integration into our learning system. In doing so, we define the first complete RL solution for compactly represented (exponentially sized) state spaces with efficiently learnable dynamics that is both sample efficient and whose computation time does not grow rapidly with the number of states.
Planning with Noisy Probabilistic Relational Rules
"... Noisy probabilistic relational rules are a promising world model representation for several reasons. They are compact and generalize over world instantiations. They are usually interpretable and they can be learned effectively from the action experiences in complex worlds. We investigate reasoning w ..."
Abstract

Cited by 26 (6 self)
 Add to MetaCart
(Show Context)
Noisy probabilistic relational rules are a promising world model representation for several reasons. They are compact and generalize over world instantiations. They are usually interpretable and they can be learned effectively from the action experiences in complex worlds. We investigate reasoning with such rules in grounded relational domains. Our algorithms exploit the compactness of rules for efficient and flexible decisiontheoretic planning. As a first approach, we combine these rules with the Upper Confidence Bounds applied to Trees (UCT) algorithm based on lookahead trees. Our second approach converts these rules into a structured dynamic Bayesian network representation and predicts the effects of action sequences using approximate inference and beliefs over world states. We evaluate the effectiveness of our approaches for planning in a simulated complex 3D robot manipulation scenario with an articulated manipulator and realistic physics and in domains of the probabilistic planning competition. Empirical results show that our methods can solve problems where existing methods fail. 1.
Approximate Inference for Planning in Stochastic Relational Worlds
"... Relational world models that can be learned from experience in stochastic domains have received significant attention recently. However, efficient planning using these models remains a major issue. We propose to convert learned noisy probabilistic relational rules into a structured dynamic Bayesian ..."
Abstract

Cited by 14 (5 self)
 Add to MetaCart
Relational world models that can be learned from experience in stochastic domains have received significant attention recently. However, efficient planning using these models remains a major issue. We propose to convert learned noisy probabilistic relational rules into a structured dynamic Bayesian network representation. Predicting the effects of action sequences using approximate inference allows for planning in complex worlds. We evaluate the effectiveness of our approach for online planning in a 3D simulated blocksworld with an articulated manipulator and realistic physics. Empirical results show that our method can solve problems where existing methods fail. 1.
Learning STRIPS Operators from Noisy and Incomplete Observations
"... Agents learning to act autonomously in realworld domains must acquire a model of the dynamics of the domain in which they operate. Learning domain dynamics can be challenging, especially where an agent only has partial access to the world state, and/or noisy external sensors. Even in standard STRIPS ..."
Abstract

Cited by 12 (2 self)
 Add to MetaCart
(Show Context)
Agents learning to act autonomously in realworld domains must acquire a model of the dynamics of the domain in which they operate. Learning domain dynamics can be challenging, especially where an agent only has partial access to the world state, and/or noisy external sensors. Even in standard STRIPS domains, existing approaches cannot learn from noisy, incomplete observations typical of realworld domains. We propose a method which learns STRIPS action models in such domains, by decomposing the problem into first learning a transition function between states in the form of a set of classifiers, and then deriving explicit STRIPS rules from the classifiers ’ parameters. We evaluate our approach on simulated standard planning domains from the International Planning Competition, and show that it learns useful domain descriptions from noisy, incomplete observations. 1
Exploration in Relational Domains for Modelbased Reinforcement Learning
"... A fundamental problem in reinforcement learning is balancing exploration and exploitation. We address this problem in the context of modelbased reinforcement learning in large stochastic relational domains by developing relational extensions of the concepts of the E 3 and RMAX algorithms. Efficien ..."
Abstract

Cited by 10 (1 self)
 Add to MetaCart
A fundamental problem in reinforcement learning is balancing exploration and exploitation. We address this problem in the context of modelbased reinforcement learning in large stochastic relational domains by developing relational extensions of the concepts of the E 3 and RMAX algorithms. Efficient exploration in exponentially large state spaces needs to exploit the generalization of the learned model: what in a propositional setting would be considered a novel situation and worth exploration may in the relational setting be a wellknown context in which exploitation is promising. To address this we introduce relational count functions which generalize the classical notion of state and action visitation counts. We provide guarantees on the exploration efficiency of our framework using count functions under the assumption that we had a relational KWIK learner and a nearoptimal planner. We propose a concrete exploration algorithm which integrates a practically efficient probabilistic rule learner and a relational planner (for which there are no guarantees, however) and employs the contexts of learned relational rules as features to model the novelty of states and actions. Our results in noisy 3D simulated robot manipulation problems and in domains of the international planning competition demonstrate that our approach is more effective than existing propositional and factored exploration techniques.
Efficient learning of relational models for sequential decision making
, 2010
"... The explorationexploitation tradeoff is crucial to reinforcementlearning (RL) agents, and a significant number of sample complexity results have been derived for agents in propositional domains. These results guarantee, with high probability, nearoptimal behavior in all but a polynomial number of ..."
Abstract

Cited by 8 (1 self)
 Add to MetaCart
The explorationexploitation tradeoff is crucial to reinforcementlearning (RL) agents, and a significant number of sample complexity results have been derived for agents in propositional domains. These results guarantee, with high probability, nearoptimal behavior in all but a polynomial number of timesteps in the agent’s lifetime. In this work, we prove similar results for certain relational representations, primarily a class we call “relational action schemas”. These generalized models allow us to specify state transitions in a compact form, for instance describing the effect of picking up a generic block instead of picking up 10 different specific blocks. We present theoretical results on crucial subproblems in actionschema learning using the KWIK framework, which allows us to characterize the sample efficiency of an agent learning these models in a reinforcementlearning setting. These results are extended in an apprenticeship learning paradigm where and agent has access not only to its environment, but also to a teacher that can demonstrate traces of state/action/state sequences. We show that the class of action schemas that are efficiently learnable in this paradigm is strictly larger than those learnable in the online setting. We link
Generalized orderingsearch for learning directed probabilistic logical models
 Machine Learning
, 2007
"... Abstract. Recently, there has been an increasing interest in directed probabilistic logical models and a variety of languages for describing such models has been proposed. Although many authors provide highlevel arguments to show that in principle models in their language can be learned from data, ..."
Abstract

Cited by 7 (3 self)
 Add to MetaCart
(Show Context)
Abstract. Recently, there has been an increasing interest in directed probabilistic logical models and a variety of languages for describing such models has been proposed. Although many authors provide highlevel arguments to show that in principle models in their language can be learned from data, most of the proposed learning algorithms have not yet been studied in detail. We introduce an algorithm, generalized orderingsearch, to learn both structure and conditional probability distributions (CPDs) of directed probabilistic logical models. The algorithm upgrades the orderingsearch algorithm for Bayesian networks. We use relational probability trees as a representation for the CPDs. We present experiments on blocks world domains, a gene domain and the Cora dataset. 1
Relevance Grounding for Planning in Relational Domains
"... Abstract. Probabilistic relational models are an efficient way to learn and represent the dynamics in realistic environments consisting of many objects. Autonomous intelligent agents that ground this representation for all objects need to plan in exponentially large state spaces and large sets of st ..."
Abstract

Cited by 5 (3 self)
 Add to MetaCart
(Show Context)
Abstract. Probabilistic relational models are an efficient way to learn and represent the dynamics in realistic environments consisting of many objects. Autonomous intelligent agents that ground this representation for all objects need to plan in exponentially large state spaces and large sets of stochastic actions. A key insight for computational efficiency is that successful planning typically involves only a small subset of relevant objects. In this paper, we introduce a probabilistic model to represent planning with subsets of objects and provide a definition of object relevance. Our definition is sufficient to prove consistency between repeated planning in partially grounded models restricted to relevant objects and planning in the fully grounded model. We propose an algorithm that exploits object relevance to plan efficiently in complex domains. Empirical results in a simulated 3D blocksworld with an articulated manipulator and realistic physics prove the effectiveness of our approach. 1
Probabilistic relational planning with first order decision diagrams
 JAIR
, 2011
"... Dynamic programming algorithms have been successfully applied to propositional stochastic planning problems by using compact representations, in particular algebraic decision diagrams, to capture domain dynamics and value functions. Work on symbolic dynamic programming lifted these ideas to first or ..."
Abstract

Cited by 5 (4 self)
 Add to MetaCart
(Show Context)
Dynamic programming algorithms have been successfully applied to propositional stochastic planning problems by using compact representations, in particular algebraic decision diagrams, to capture domain dynamics and value functions. Work on symbolic dynamic programming lifted these ideas to first order logic using several representation schemes. Recent work introduced a first order variant of decision diagrams (FODD) and developed a value iteration algorithm for this representation. This paper develops several improvements to the FODD algorithm that make the approach practical. These include, new reduction operators that decrease the size of the representation, several speedup techniques, and techniques for value approximation. Incorporating these, the paper presents a planning system, FODDPlanner, for solving relational stochastic planning problems. The system is evaluated on several domains, including problems from the recent international planning competition, and shows competitive performance with top ranking systems. This is the first demonstration of feasibility of this approach and it shows that abstraction through compact representation is a promising approach to stochastic planning. 1.
Learning Models of Relational MDPs using Graph Kernels
"... Abstract. Relational reinforcement learning is the application of reinforcement learning to structured state descriptions. Modelbased methods learn a policy based on a known model that comprises a description of the actions and their effects as well as the reward function. If the model is initially ..."
Abstract

Cited by 5 (0 self)
 Add to MetaCart
(Show Context)
Abstract. Relational reinforcement learning is the application of reinforcement learning to structured state descriptions. Modelbased methods learn a policy based on a known model that comprises a description of the actions and their effects as well as the reward function. If the model is initially unknown, one might learn the model first and then apply the modelbased method (indirect reinforcement learning). In this paper, we propose a method for modellearning that is based on a combination of several SVMs using graph kernels. Indeterministic processes can be dealt with by combining the kernel approach with a clustering technique. We demonstrate the validity of the approach by a range of experiments on various Blocksworld scenarios. 1