Results 1 – 10 of 28
Gradient-based boosting for Statistical Relational Learning: The Relational Dependency Network Case
, 2011
Cited by 39 (17 self)
Abstract. Dependency networks approximate a joint probability distribution over multiple random variables as a product of conditional distributions. Relational Dependency Networks (RDNs) are graphical models that extend dependency networks to relational domains. This higher expressivity, however, comes at the expense of a more complex model-selection problem: an unbounded number of relational abstraction levels might need to be explored. Whereas current learning approaches for RDNs learn a single probability tree per random variable, we propose to turn the problem into a series of relational function-approximation problems using gradient-based boosting. In doing so, one can easily induce highly complex features over several iterations and in turn quickly estimate a very expressive model. Our experimental results on several different data sets show that this boosting method results in efficient learning of RDNs when compared to state-of-the-art statistical relational learning approaches.
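The abstract's core loop — fit a regression model to the pointwise functional gradient and add it to a running score — can be sketched in miniature. This is not the authors' relational implementation: the "trees" below are one-dimensional regression stumps over a single numeric feature, standing in for relational regression trees, and all names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(xs, residuals):
    # Depth-1 regression tree on one numeric feature: pick the
    # threshold minimising squared error of the pseudo-residuals.
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, rounds=20, lr=0.5):
    # Each round fits a stump to the functional gradient y - P(y=1|x)
    # and adds it (shrunk by lr) to the running score F(x).
    trees = []
    for _ in range(rounds):
        F = lambda x: sum(tr(x) for tr in trees)
        residuals = [y - sigmoid(F(x)) for x, y in zip(xs, ys)]
        stump = fit_stump(xs, residuals)
        trees.append(lambda x, s=stump: lr * s(x))
    return lambda x: sigmoid(sum(tr(x) for tr in trees))
```

In the paper each stump would instead be a relational regression tree over first-order features, but the additive structure of the model is the same.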
Planning with Noisy Probabilistic Relational Rules
Cited by 25 (5 self)
Noisy probabilistic relational rules are a promising world-model representation for several reasons. They are compact and generalize over world instantiations. They are usually interpretable and they can be learned effectively from action experiences in complex worlds. We investigate reasoning with such rules in grounded relational domains. Our algorithms exploit the compactness of rules for efficient and flexible decision-theoretic planning. As a first approach, we combine these rules with the Upper Confidence Bounds applied to Trees (UCT) algorithm based on look-ahead trees. Our second approach converts these rules into a structured dynamic Bayesian network representation and predicts the effects of action sequences using approximate inference and beliefs over world states. We evaluate the effectiveness of our approaches for planning in a simulated complex 3D robot manipulation scenario with an articulated manipulator and realistic physics, and in domains of the probabilistic planning competition. Empirical results show that our methods can solve problems where existing methods fail.
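The UCB1 selection rule that UCT applies at every node of its look-ahead tree can be sketched in a flat (root-only) form. This is a minimal stand-in, not the paper's planner: `step` is an assumed simulator interface, and the rollout policy below the root is uniformly random.

```python
import math
import random

def ucb1_plan(state, actions, step, n_sims=400, depth=10, c=1.4):
    # Flat Monte-Carlo planning with UCB1 at the root -- the bandit rule
    # UCT applies at every node of its look-ahead tree.  `step(s, a)` is
    # the (possibly noisy) simulator returning (next_state, reward).
    counts = {a: 0 for a in actions}
    totals = {a: 0.0 for a in actions}
    for t in range(1, n_sims + 1):
        # optimistic value: empirical mean + exploration bonus
        a0 = max(actions, key=lambda a: float('inf') if counts[a] == 0
                 else totals[a] / counts[a]
                 + c * math.sqrt(math.log(t) / counts[a]))
        s, ret = state, 0.0
        s, r = step(s, a0)
        ret += r
        for _ in range(depth - 1):  # random rollout policy below the root
            s, r = step(s, random.choice(actions))
            ret += r
        counts[a0] += 1
        totals[a0] += ret
    return max(actions, key=lambda a: totals[a] / max(counts[a], 1))
```

In the paper the simulator is given by sampling the learned noisy probabilistic relational rules rather than a hand-written `step` function.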
Imitation Learning in Relational Domains: A Functional-Gradient Boosting Approach
Cited by 16 (11 self)
Imitation learning refers to the problem of learning how to behave by observing a teacher in action. We consider imitation learning in relational domains, in which there is a varying number of objects and relations among them. In prior work, simple relational policies are learned by viewing imitation learning as supervised learning of a function from states to actions. For propositional worlds, functional-gradient methods have proved beneficial: they are simpler to implement than most existing methods, more efficient, satisfy common constraints on the cost function more naturally, and better represent our prior beliefs about the form of the function. Building on recent generalizations of functional-gradient boosting to relational representations, we implement a functional-gradient boosting approach to imitation learning in relational domains. In particular, given a set of traces from the human teacher, our system learns a policy in the form of a set of relational regression trees that additively approximate the functional gradients. The use of multiple additive trees combined with a relational representation allows for learning more expressive policies than was previously possible. We demonstrate the usefulness of our approach in several different domains.
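The policy structure this abstract describes — one additive scorer per action, each round fitting a model to the functional gradient of the demonstration likelihood — can be sketched as follows. The per-round "tree" here is a plain state-indexed lookup table, an assumed stand-in for the paper's relational regression trees.

```python
import math
from collections import defaultdict

def softmax(scores):
    m = max(scores.values())
    exps = {a: math.exp(v - m) for a, v in scores.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}

def imitate(traces, actions, rounds=30, lr=0.5):
    # One additive scorer F[a] per action.  Each round fits a "tree"
    # (a state-indexed lookup standing in for a relational regression
    # tree) to the functional gradient I(a = demonstrated) - P(a | s)
    # and adds it to that action's score.
    F = {a: defaultdict(float) for a in actions}
    for _ in range(rounds):
        grads = {a: defaultdict(list) for a in actions}
        for s, a_demo in traces:
            p = softmax({a: F[a][s] for a in actions})
            for a in actions:
                grads[a][s].append((1.0 if a == a_demo else 0.0) - p[a])
        for a in actions:
            for s, gs in grads[a].items():
                F[a][s] += lr * sum(gs) / len(gs)
    # the learned policy: pick the highest-scoring action in each state
    return lambda s: max(actions, key=lambda a: F[a][s])
```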
Exploration in Relational Domains for Model-based Reinforcement Learning
, 2012
Cited by 10 (1 self)
A fundamental problem in reinforcement learning is balancing exploration and exploitation. We address this problem in the context of model-based reinforcement learning in large stochastic relational domains by developing relational extensions of the concepts of the E^3 and R-MAX algorithms. Efficient exploration in exponentially large state spaces needs to exploit the generalization of the learned model: what in a propositional setting would be considered a novel situation and worth exploration may in the relational setting be a well-known context in which exploitation is promising. To address this we introduce relational count functions, which generalize the classical notion of state and action visitation counts. We provide guarantees on the exploration efficiency of our framework using count functions, assuming a relational KWIK learner and a near-optimal planner. We propose a concrete exploration algorithm which integrates a practically efficient probabilistic rule learner and a relational planner (for which there are, however, no guarantees) and employs the contexts of learned relational rules as features to model the novelty of states and actions. Our results on noisy 3D simulated robot manipulation problems and on domains of the international planning competition demonstrate that our approach is more effective than existing propositional and factored exploration techniques.
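The key idea — counting visits to abstracted contexts rather than to ground states — can be sketched directly. The abstraction function below is a hypothetical example (keeping only the predicate symbol and action), not the rule contexts the paper learns.

```python
from collections import defaultdict

def make_count_function(abstract):
    # Relational count function: count visits to the *context* that
    # `abstract` maps a ground (state, action) pair to, so experience
    # generalises across ground instances instead of being counted
    # once per ground state.
    counts = defaultdict(int)
    def visit(state, action):
        counts[abstract(state, action)] += 1
    def count(state, action):
        return counts[abstract(state, action)]
    return visit, count

def known(count_fn, state, action, m=3):
    # R-MAX-style knownness test: has this context been explored
    # enough that exploitation is justified there?
    return count_fn(state, action) >= m
```

Under this scheme a situation involving never-before-seen objects can still be "known" if its abstract context has been visited often, which is exactly the generalization the abstract argues for.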
Preference-Based Policy Iteration: Leveraging Preference Learning for Reinforcement Learning
Cited by 9 (4 self)
Abstract. This paper makes a first step toward the integration of two subfields of machine learning, namely preference learning and reinforcement learning (RL). An important motivation for a “preference-based” approach to reinforcement learning is a possible extension of the type of feedback an agent may learn from. In particular, while conventional RL methods are essentially confined to dealing with numerical rewards, there are many applications in which this type of information is not naturally available, and in which only qualitative reward signals are provided instead. Therefore, building on novel methods for preference learning, our general goal is to equip the RL agent with qualitative policy models, such as ranking functions that allow for sorting its available actions from most to least promising, as well as algorithms for learning such models from qualitative feedback. Concretely, in this paper, we build on an existing method for approximate policy iteration based on rollouts. While this approach is based on the use of classification methods for generalization and policy learning, we make use of a specific type of preference learning method called label ranking. Advantages of our preference-based policy iteration method are illustrated by means of two case studies.
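The qualitative policy model the abstract describes — a ranking over actions rather than numeric values — can be sketched with a simple aggregation of pairwise preferences. Borda counting here is an illustrative stand-in for the label-ranking learner the paper actually uses.

```python
from collections import defaultdict

def rank_actions(prefs, actions):
    # Aggregate pairwise preferences (winner, loser) -- e.g. obtained
    # by comparing rollout returns of two actions from the same state
    # -- into a full ranking by Borda count: one point per pairwise win.
    wins = defaultdict(int)
    for winner, loser in prefs:
        wins[winner] += 1
    return sorted(actions, key=lambda a: -wins[a])
```

Note that only the *order* of rollout outcomes matters here, which is what lets the approach work with purely qualitative reward signals.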
Exploration in Relational Worlds
Cited by 5 (0 self)
Abstract. One of the key problems in model-based reinforcement learning is balancing exploration and exploitation. Another is learning and acting in large relational domains, in which there is a varying number of objects and relations between them. We provide a solution to exploring large relational Markov decision processes by developing relational extensions of the concepts of the Explicit Explore or Exploit (E^3) algorithm. A key insight is that the inherent generalization of learnt knowledge in the relational representation also has profound implications for the exploration strategy: what in a propositional setting would be considered a novel situation and worth exploration may in the relational setting be an instance of a well-known context in which exploitation is promising. Our experimental evaluation shows the effectiveness and benefit of relational exploration over several propositional benchmark approaches on noisy 3D simulated robot manipulation problems.
Automatic Induction of Bellman-Error Features for Probabilistic Planning
Cited by 3 (1 self)
Domain-specific features are important in representing problem structure throughout machine learning and decision-theoretic planning. In planning, once state features are provided, domain-independent algorithms such as approximate value iteration can learn weighted combinations of those features that often perform well as heuristic estimates of state value (e.g., distance to the goal). Successful applications in real-world domains often require features crafted by human experts. Here, we propose automatic processes for learning useful domain-specific feature sets with little or no human intervention. Our methods select and add features that describe state-space regions of high inconsistency in the Bellman equation (state-wise Bellman error) during approximate value iteration. Our method can be applied with any real-valued-feature hypothesis space and a corresponding learning method for selecting features from training sets of state-value pairs. We evaluate the method with hypothesis spaces defined by both relational and propositional feature languages, using nine probabilistic planning domains. We show that approximate value iteration using a relational feature space performs at the state of the art in domain-independent stochastic relational planning.
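The feature-induction trigger — find where the current value approximation is most inconsistent with the Bellman equation and add a feature covering that region — can be sketched on a toy deterministic MDP. The `step` interface and the single-state indicator feature are illustrative assumptions; the paper learns region features in relational or propositional languages.

```python
def bellman_residuals(V, states, step, gamma=0.9):
    # State-wise Bellman error: |max_a [r + gamma * V(s')] - V(s)|,
    # with `step(s)` returning one (next_state, reward) pair per action.
    res = {}
    for s in states:
        backup = max(r + gamma * V[s2] for s2, r in step(s))
        res[s] = abs(backup - V[s])
    return res

def induce_feature(V, states, step):
    # Add an indicator for the region where the approximation is most
    # inconsistent with the Bellman equation.  A single-state indicator
    # stands in for a learned relational/propositional region feature.
    res = bellman_residuals(V, states, step)
    worst = max(states, key=lambda s: res[s])
    return lambda s: 1.0 if s == worst else 0.0
```

Approximate value iteration would then refit its feature weights with the new feature included and repeat.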
Statistical relational learning to predict primary myocardial infarction from electronic health records
AAAI Conference on Innovative Applications in AI (IAAI)
, 2012
Cited by 2 (2 self)
Electronic health records (EHRs) are an emerging relational domain with large potential to improve clinical outcomes. We apply two statistical relational learning (SRL) algorithms to the task of predicting primary myocardial infarction. We show that one SRL algorithm, relational functional gradient boosting, outperforms propositional learners, particularly in the medically relevant high-recall region. We observe that both SRL algorithms predict outcomes better than their propositional analogs and suggest how our methods can augment current epidemiological practices.
Accelerating Imitation Learning in Relational Domains via Transfer by Initialization
Cited by 1 (1 self)
Abstract. The problem of learning to mimic a human expert/teacher from training trajectories is called imitation learning. To make the process of teaching easier in this setting, we propose to employ transfer learning (where one learns on a source problem and transfers the knowledge to potentially more complex target problems). We consider multi-relational environments such as real-time strategy games and use functional-gradient boosting to capture and transfer the models learned in these environments. Our experiments demonstrate that our learner learns a very good initial model from the simple scenario and effectively transfers the knowledge to the more complex scenario, thus achieving a jump start, a steeper learning curve, and higher asymptotic performance.
Seeing the Forest Despite the Trees: Large-Scale Spatial-Temporal Decision Making
Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI-09)
, 2009
Cited by 1 (1 self)
We introduce a challenging real-world planning problem where actions must be taken at each location in a spatial area at each point in time. We use forestry planning as the motivating application. In Large-Scale Spatial-Temporal (LSST) planning problems, the state and action spaces are defined as the cross-products of many local state and action spaces spread over a large spatial area such as a city or forest. These problems possess state uncertainty, have complex utility functions involving spatial constraints, and generally require reliance on simulations rather than an explicit transition model. We define LSST problems as reinforcement learning problems and present a solution using policy gradients. We compare two different policy formulations: an explicit policy that identifies each location in space and the action to take there, and an abstract policy that defines the proportion of actions to take across all locations in space. We show that the abstract policy is more robust and achieves higher rewards with far fewer parameters than the explicit policy. The abstract policy is also a better fit to the properties that practitioners in LSST problem domains require for such methods to be widely useful.
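The abstract-policy idea — one shared action distribution applied at every cell, so the parameter count is independent of the number of locations — can be sketched with a score-function (REINFORCE) policy gradient. The reward function and all names are illustrative assumptions; the paper's returns come from a forestry simulator.

```python
import math
import random

def softmax(theta):
    m = max(theta)
    e = [math.exp(t - m) for t in theta]
    z = sum(e)
    return [x / z for x in e]

def train_abstract_policy(n_cells, reward, n_actions=3, iters=300, lr=0.05):
    # Abstract LSST policy: one shared action distribution pi(theta)
    # sampled at every spatial cell (n_actions parameters rather than
    # n_actions * n_cells), trained with the REINFORCE policy gradient.
    theta = [0.0] * n_actions
    for _ in range(iters):
        pi = softmax(theta)
        acts = [random.choices(range(n_actions), weights=pi)[0]
                for _ in range(n_cells)]
        R = reward(acts)  # simulator return for the joint action
        # gradient of the log-likelihood of the sampled joint action
        grad = [sum((1.0 if a == k else 0.0) - pi[k] for a in acts)
                for k in range(n_actions)]
        theta = [t + lr * R * g for t, g in zip(theta, grad)]
    return softmax(theta)
```

An explicit policy would instead keep a separate `theta` per cell, multiplying the parameter count by the number of locations, which is the robustness and scale trade-off the abstract compares.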