Results 1–10 of 37
Learning Markov Logic Networks via Functional Gradient Boosting
Cited by 29 (9 self)
Abstract—Recent years have seen a surge of interest in Statistical Relational Learning (SRL) models that combine logic with probabilities. One prominent example is Markov Logic Networks (MLNs). While MLNs are indeed highly expressive, this expressiveness comes at a cost. Learning MLNs is a hard problem and therefore has attracted much interest in the SRL community. Current methods for learning MLNs follow a two-step approach: first, perform a search through the space of possible clauses and then learn appropriate weights for these clauses. We propose to take a different approach, namely to learn both the weights and the structure of the MLN simultaneously. Our approach is based on functional gradient boosting, where the problem of learning MLNs is turned into a series of relational functional approximation problems. We use two kinds of representations for the gradients: clause-based and tree-based. Our experimental evaluation on several benchmark data sets demonstrates that our new approach can learn MLNs as well as or better than those found with state-of-the-art methods, but often in a fraction of the time.
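The functional-gradient view in this abstract has a simple propositional analogue: model P(y=1|x) as sigmoid(ψ(x)) and, at each boosting round, fit a regression stump to the pointwise gradient I(y=1) − P(y=1|x). A minimal sketch; the toy 1-D data and the stump learner are illustrative, not the authors' relational implementation:

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def fit_stump(xs, residuals):
    """Threshold split on a 1-D feature minimizing squared error;
    returns (threshold, left_value, right_value)."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv = sum(left) / len(left) if left else 0.0
        rv = sum(right) / len(right) if right else 0.0
        err = (sum((r - lv) ** 2 for r in left) +
               sum((r - rv) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    return best[1:]

def psi(stumps, x):
    """Additive model: sum of stump outputs."""
    return sum(lv if x <= t else rv for t, lv, rv in stumps)

def boost(xs, ys, rounds=20):
    """Each round fits a stump to the functional gradient I(y=1) - P(y=1|x)."""
    stumps = []
    for _ in range(rounds):
        grads = [y - sigmoid(psi(stumps, x)) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, grads))
    return stumps

# Toy data: label is 1 exactly when x > 0.5.
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = boost(xs, ys)
prob_low = sigmoid(psi(model, 0.2))    # should be near 0
prob_high = sigmoid(psi(model, 0.8))   # should be near 1
```

The relational versions replace the stump learner with relational regression trees; the gradient itself has the same I − P form.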
Imitation Learning in Relational Domains: A Functional-Gradient Boosting Approach
Cited by 16 (11 self)
Imitation learning refers to the problem of learning how to behave by observing a teacher in action. We consider imitation learning in relational domains, in which there is a varying number of objects and relations among them. In prior work, simple relational policies are learned by viewing imitation learning as supervised learning of a function from states to actions. For propositional worlds, functional gradient methods have proved beneficial: they are simpler to implement than most existing methods, more efficient, more naturally satisfy common constraints on the cost function, and better represent our prior beliefs about the form of the function. Building on recent generalizations of functional gradient boosting to relational representations, we implement a functional-gradient boosting approach to imitation learning in relational domains. In particular, given a set of traces from the human teacher, our system learns a policy in the form of a set of relational regression trees that additively approximate the functional gradients. The use of multiple additive trees combined with a relational representation allows for learning more expressive policies than was possible before. We demonstrate the usefulness of our approach in several different domains.
Transforming Graph Data for Statistical Relational Learning
, 2012
Cited by 9 (4 self)
Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of Statistical Relational Learning (SRL) algorithms to these domains. In this article, we examine and categorize techniques for transforming graph-based relational data to improve SRL algorithms. In particular, appropriate transformations of the nodes, links, and/or features of the data can dramatically affect the capabilities and results of SRL algorithms. We introduce an intuitive taxonomy for data representation transformations in relational domains that incorporates link transformation and node transformation as symmetric representation tasks. More specifically, the transformation tasks for both nodes and links include (i) predicting their existence, (ii) predicting their label or type, (iii) estimating their weight or importance, and (iv) systematically constructing their relevant features. We motivate our taxonomy through detailed examples and use it to survey competing approaches for each of these tasks. We also discuss general conditions for transforming links, nodes, and features. Finally, we highlight challenges that remain to be addressed.
Unachievable Region in Precision-Recall Space and Its Effect on Empirical Evaluation
Cited by 8 (1 self)
Precision-recall (PR) curves and the areas under them are widely used to summarize machine learning results, especially for data sets exhibiting class skew. They are often used analogously to ROC curves and the area under ROC curves. It is known that PR curves vary as class skew changes. What was not recognized before this paper is that there is a region of PR space that is completely unachievable, and the size of this region depends only on the skew. This paper precisely characterizes the size of that region and discusses its implications for empirical evaluation methodology in machine learning.
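For a concrete sense of the claim: assuming the minimum achievable precision at recall r with positive-class fraction pi is pi·r / (pi·r + 1 − pi) (the curve the paper characterizes), integrating it gives a closed form for the unachievable area. A back-of-the-envelope sketch, not a substitute for the paper's derivation:

```python
import math

def min_precision(recall, pi):
    """Lowest achievable precision at a given recall, for skew pi < 1."""
    return pi * recall / (pi * recall + (1.0 - pi))

def unachievable_aucpr(pi):
    """Closed-form area under the minimum PR curve:
    the integral of min_precision over recall in [0, 1]."""
    return 1.0 + (1.0 - pi) * math.log(1.0 - pi) / pi

def numeric_area(pi, steps=20000):
    """Trapezoidal check that the closed form matches the integral."""
    h = 1.0 / steps
    return sum(0.5 * h * (min_precision(i * h, pi) +
                          min_precision((i + 1) * h, pi))
               for i in range(steps))

# With 10% positives, about 5.2% of PR space is unachievable.
area = unachievable_aucpr(0.1)
```

The dependence on skew alone is visible in the code: nothing about a classifier enters, only pi.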
Area Under the Precision-Recall Curve: Point Estimates and Confidence Intervals
Cited by 7 (0 self)
Abstract. The area under the precision-recall curve (AUCPR) is a single-number summary of the information in the precision-recall (PR) curve. Similar to the receiver operating characteristic curve, the PR curve has its own unique properties that make estimating its enclosed area challenging. Besides a point estimate of the area, an interval estimate is often required to express magnitude and uncertainty. In this paper we perform a computational analysis of common AUCPR estimators and their confidence intervals. We find both satisfactory estimates and invalid procedures, and we recommend two simple intervals that are robust to a variety of assumptions.
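Two widely used point estimators that analyses like this compare are average precision and trapezoidal interpolation over PR points; since linear interpolation in PR space is optimistic, the two can disagree on the same data. A self-contained sketch with illustrative scores and labels, not the paper's estimators or datasets:

```python
def pr_points(scores, labels):
    """(recall, precision) after each example, scores sorted descending."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(labels)
    pts, tp = [], 0
    for k, i in enumerate(order, start=1):
        tp += labels[i]
        pts.append((tp / n_pos, tp / k))
    return pts

def average_precision(scores, labels):
    """AP: sum of precision at each new true positive, over n_pos."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(labels)
    ap, tp = 0.0, 0
    for k, i in enumerate(order, start=1):
        if labels[i]:
            tp += 1
            ap += tp / k
    return ap / n_pos

def trapezoid_aucpr(scores, labels):
    """Trapezoid rule over PR points; the (0, 1) anchor is a common
    convention, and linear PR interpolation is known to be optimistic."""
    pts = [(0.0, 1.0)] + pr_points(scores, labels)
    return sum((r1 - r0) * (p0 + p1) / 2.0
               for (r0, p0), (r1, p1) in zip(pts, pts[1:]))

# Illustrative data, not from the paper.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
ap = average_precision(scores, labels)     # (1/1 + 2/3) / 2 = 5/6
trap = trapezoid_aucpr(scores, labels)     # 19/24, a different estimate
```

Even on four points the two estimators differ, which is why interval estimates around them matter.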
Lifted Online Training of Relational Models with Stochastic Gradient Methods
Cited by 6 (3 self)
Abstract. Lifted inference approaches have rendered large, previously intractable probabilistic inference problems quickly solvable by employing symmetries to handle whole sets of indistinguishable random variables. Still, in many if not most situations training relational models will not benefit from lifting: symmetries within models easily break since variables become correlated by virtue of depending asymmetrically on evidence. An appealing idea for such situations is to train and recombine local models. This breaks long-range dependencies and allows us to exploit lifting within and across the local training tasks. Moreover, it naturally paves the way for online training of relational models. Specifically, we develop the first lifted stochastic gradient optimization method with gain vector adaptation, which processes each lifted piece one after the other. On several datasets, the resulting optimizer converges to the same-quality solution over an order of magnitude faster, simply because, unlike batch training, it starts optimizing long before having seen the entire mega-example even once.
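A propositional stand-in for the training scheme sketched here: process the data as a sequence of pieces with stochastic gradient steps, keeping a per-parameter gain that grows when successive gradients agree in sign and shrinks when they disagree (classic gain-vector adaptation). The lifted/relational machinery is not modeled, and the data, rates, and gain rule are all illustrative:

```python
import math

def sgd_pieces(pieces, dim, eta=0.5, up=1.05, down=0.7, epochs=5):
    """Logistic-regression SGD over data pieces with per-parameter gains
    (delta-bar-delta-style adaptation; constants chosen for illustration)."""
    w = [0.0] * dim
    gain = [1.0] * dim
    prev = [0.0] * dim
    for _ in range(epochs):
        for piece in pieces:
            g = [0.0] * dim                  # gradient on this piece only
            for x, y in piece:
                p = 1.0 / (1.0 + math.exp(-sum(wj * xj
                                               for wj, xj in zip(w, x))))
                for j in range(dim):
                    g[j] += (p - y) * x[j]
            for j in range(dim):
                # grow the gain when gradients agree in sign, shrink otherwise
                gain[j] *= up if g[j] * prev[j] > 0 else down
                w[j] -= eta * gain[j] * g[j] / len(piece)
                prev[j] = g[j]
    return w

# Two pieces of a separable toy problem: x = (bias, feature), y = [feature > 0].
pieces = [[((1.0, 1.0), 1), ((1.0, -1.0), 0)],
          [((1.0, 2.0), 1), ((1.0, -2.0), 0)]]
w = sgd_pieces(pieces, dim=2)
```

The point the abstract makes is the per-piece updating: parameters start moving after the first piece, long before a batch method would finish one pass.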
Learning Compact Markov Logic Networks With Decision Trees
 MACHINE LEARNING
Cited by 5 (2 self)
Statistical-relational learning combines logical syntax with probabilistic methods. Markov Logic Networks (MLNs) are a prominent model class that generalizes both first-order logic and undirected graphical models (Markov networks). The qualitative component of an MLN is a set of clauses and the quantitative component is a set of clause weights. Generative MLNs model the joint distribution of relationships and attributes. A state-of-the-art structure learning method is the moralization approach: learn a set of directed Horn clauses, then convert them to conjunctions to obtain MLN clauses. The directed clauses are learned using Bayes net methods. The moralization approach takes advantage of the high-quality inference algorithms for MLNs and their ability to handle cyclic dependencies. A weakness of moralization is that it leads to an unnecessarily large number of clauses. In this paper we show that using decision trees to represent conditional probabilities in the Bayes net is an effective remedy that leads to much more compact MLN structures. In experiments on benchmark datasets, the decision trees reduce the number of clauses in the moralized MLN by a factor of 5–25, depending on the dataset. The accuracy of predictions is competitive with the models obtained by standard moralization, and in many cases superior.
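The compactness argument can be made concrete: a tabular CPD over k binary parents moralizes to one clause per parent configuration (2^k), whereas a tree CPD yields one clause per root-to-leaf path. A sketch with a hypothetical tree; the predicate names are invented for illustration:

```python
def paths(tree, prefix=()):
    """Root-to-leaf paths of a nested (test, if_false, if_true) tree;
    leaves are probabilities. Each path moralizes to one clause body."""
    if isinstance(tree, float):
        return [prefix]
    test, if_false, if_true = tree
    return (paths(if_false, prefix + (f"not {test}",)) +
            paths(if_true, prefix + (test,)))

# Hypothetical tree CPD for smokes(X) with parents stress(X) and
# friends_with_smoker(X).
tree = ("stress(X)",
        ("friends_with_smoker(X)", 0.1, 0.4),
        0.8)

clauses = [" AND ".join(p) for p in paths(tree)]
# 3 leaves -> 3 clauses; a tabular CPD over the same 2 binary parents
# would moralize to 2**2 = 4 clauses, and the gap widens with more parents.
```

With ten parents the table contributes 1024 clauses while a shallow tree may contribute a handful, which is the source of the 5–25x reductions the abstract reports.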
Integrating knowledge capture and supervised learning through a human-computer interface
 In Proc. Fifth Intl. Conf. Knowl. Capture
, 2011
Cited by 3 (2 self)
Some supervised-learning algorithms can make effective use of domain knowledge in addition to the input-output pairs commonly used in machine learning. However, formulating this additional information often requires an in-depth understanding of the specific knowledge representation used by a given learning algorithm. The requirement to use a formal knowledge-representation language means that most domain experts will not be able to articulate their expertise, even when a learning algorithm is capable of exploiting such valuable information. We investigate a method to ease this knowledge acquisition through the use of a graphical, human-computer interface. Our interface allows users to easily provide advice about specific examples, rather than requiring them to provide general rules; we leave the task of properly generalizing such advice to the learning algorithms. We demonstrate the effectiveness of our approach using the Wargus real-time strategy game, comparing learning with no advice to learning with concrete advice provided through our interface, as well as comparing to using generalized advice written by an AI expert. Our results show that our approach of combining a GUI-based advice language with an advice-taking learning algorithm is an effective way to capture domain knowledge.
Learning Relational Probabilistic Models from Partially Observed Data – Opening the Closed-World Assumption
Cited by 2 (0 self)
Abstract. Recent years have seen a surge of interest in learning the structure of Statistical Relational Learning (SRL) models that combine logic with probabilities. Most of these models apply the closed-world assumption, i.e., whatever is not observed is false in the world. In this work, we consider the problem of learning the structure of SRL models in the presence of hidden data, i.e., we open the closed-world assumption. We develop a functional-gradient boosting algorithm based on EM to learn the structure and parameters of the models simultaneously, and apply it to learn different kinds of models – Relational Dependency Networks, Markov Logic Networks and relational policies. Our results in a variety of domains demonstrate that the algorithms can effectively learn with missing data.
An Analysis of How Ensembles of Collective Classifiers Improve Predictions in Graphs
Cited by 1 (1 self)
We present a theoretical analysis framework that shows how ensembles of collective classifiers can improve predictions for graph data. We show how collective ensemble classification reduces errors due to variance in learning and, more interestingly, in inference. We present an empirical framework that includes various ensemble techniques for classifying relational data using collective inference. The methods span single- and multiple-graph network approaches, and are tested on both synthetic and real-world classification tasks. Our experimental results, supported by our theoretical justifications, confirm that ensemble algorithms that explicitly focus on both the learning and inference processes, aiming to reduce the errors associated with both, are the best performers.
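The variance argument has a familiar propositional caricature: averaging m independent noisy scores shrinks the score variance roughly as 1/m, so thresholded predictions flip less often. A purely synthetic sketch; collective inference itself is not modeled:

```python
import random

random.seed(0)  # deterministic for the illustration

def error_rate(m, trials=5000, margin=0.5, noise=1.0):
    """Misclassification rate for a positive example whose true score is
    `margin`, scored by the average of m independent noisy scorers."""
    errs = 0
    for _ in range(trials):
        avg = sum(margin + random.gauss(0.0, noise) for _ in range(m)) / m
        errs += avg <= 0.0  # predicted negative despite a positive margin
    return errs / trials

single = error_rate(1)      # roughly Phi(-0.5), around 0.31
ensemble = error_rate(10)   # score variance down ~10x, far fewer errors
```

In collective classification the inference step is itself stochastic, so, as the abstract argues, ensembling over inference runs buys an analogous reduction on top of the usual learning-variance reduction.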