Results 1–8 of 8
Modelling Relational Statistics With Bayes Nets
Cited by 6 (3 self)
Abstract. Class-level models capture relational statistics over object attributes and their connecting links, answering questions such as “what is the percentage of friendship pairs where both friends are women?” Class-level relationships are important in themselves, and they support applications like policy making, strategic planning, and query optimization. We represent class statistics using Parametrized Bayes Nets (PBNs), a first-order logic extension of Bayes nets. Queries about classes require a new semantics for PBNs, as the standard grounding semantics is only appropriate for answering queries about specific ground facts. We propose a novel random selection semantics for PBNs, which does not make reference to a ground model, and supports class-level queries. The parameters for this semantics can be learned using the recent pseudo-likelihood measure [1] as the objective function. This objective function is maximized by taking the empirical frequencies in the relational data as the parameter settings. We render the computation of these empirical frequencies tractable in the presence of negated relations by the inverse Möbius transform. Evaluation of our method on four benchmark datasets shows that maximum pseudo-likelihood provides fast and accurate estimates at different sample sizes.
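The inverse Möbius transform used here amounts to inclusion-exclusion over subsets of the negated relations: counts for patterns that require some relations to be absent are derived from counts that only require relations to be present. A minimal sketch, assuming a toy count table (the relation names and numbers are invented for illustration):

```python
from itertools import combinations, product

def negated_counts(positive_counts, relations):
    # positive_counts[S] = number of tuples where every relation in the
    # frozenset S holds (unconstrained on the rest); must cover all subsets.
    counts = {}
    for assignment in product([True, False], repeat=len(relations)):
        true_rels = frozenset(r for r, v in zip(relations, assignment) if v)
        false_rels = [r for r, v in zip(relations, assignment) if not v]
        # inclusion-exclusion over the relations required to be absent
        total = 0
        for k in range(len(false_rels) + 1):
            for extra in combinations(false_rels, k):
                total += (-1) ** k * positive_counts[true_rels | frozenset(extra)]
        counts[assignment] = total
    return counts

# Hypothetical data: 10 pairs total, 6 with Friend, 4 with CoWorker, 2 with both.
pc = {frozenset(): 10, frozenset({'Friend'}): 6,
      frozenset({'CoWorker'}): 4, frozenset({'Friend', 'CoWorker'}): 2}
table = negated_counts(pc, ['Friend', 'CoWorker'])
```

With these inputs the count for (Friend, not CoWorker) is 6 - 2 = 4, and the four cells sum back to the 10 total pairs, which is the consistency check the transform guarantees.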
Simple Decision Forests for Multi-Relational Classification
Cited by 2 (2 self)
An important task in multi-relational data mining is link-based classification, which takes advantage of attributes of links and linked entities to predict the class label. The relational naive Bayes classifier exploits independence assumptions to achieve scalability. We introduce a weaker independence assumption, to the effect that information from different data tables is independent given the class label. The independence assumption entails a closed-form formula for combining probabilistic predictions based on decision trees learned on different database tables. Logistic regression learns different weights for information from different tables and prunes irrelevant tables. In experiments, learning was very fast with competitive accuracy.
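Under the table-independence assumption, the closed-form combination is a naive-Bayes product over tables: P(c | all tables) ∝ P(c)^(1-k) · ∏ᵢ P(c | tableᵢ). A minimal sketch, with made-up class names and per-table posteriors (the logistic-regression weighting from the abstract is omitted):

```python
import math

def combine_table_predictions(prior, table_posteriors):
    # P(c | all tables) ∝ P(c)^(1-k) * prod_i P(c | table_i),
    # assuming the k tables are independent given the class label.
    k = len(table_posteriors)
    log_scores = {}
    for c in prior:
        s = (1 - k) * math.log(prior[c])
        for post in table_posteriors:
            s += math.log(post[c])
        log_scores[c] = s
    # normalize in log space for numerical stability
    m = max(log_scores.values())
    unnorm = {c: math.exp(s - m) for c, s in log_scores.items()}
    z = sum(unnorm.values())
    return {c: p / z for c, p in unnorm.items()}

# Two tables that each give the positive class posterior 0.8 under a uniform prior:
post = combine_table_predictions({'pos': 0.5, 'neg': 0.5},
                                 [{'pos': 0.8, 'neg': 0.2}, {'pos': 0.8, 'neg': 0.2}])
```

Two agreeing tables reinforce each other: the combined positive posterior (16/17 ≈ 0.94) is sharper than either table's 0.8 alone, which is exactly the behavior the independence assumption buys.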
A Hierarchy of Independence Assumptions for Multi-Relational Bayes Net Classifiers
Abstract. Many databases store data in relational format, with different types of entities and information about their attributes and links between the entities. Link-based classification (LBC) is the problem of predicting the class attribute of a target entity given the attributes of entities linked to it. In this paper we propose a new relational Bayes net classifier method for LBC, which assumes that different links of an object are independently drawn from the same distribution, given attribute information from the linked tables. We show that this assumption allows very fast multi-relational Bayes net learning. We define three more independence assumptions for LBC to unify proposals from different researchers in a single novel hierarchy. Our proposed model is at the top of this hierarchy, and the well-known multi-relational Naive Bayes classifier is at the bottom. The model at each level of the hierarchy uses a new independence assumption in addition to the assumptions used at the higher levels. In experiments on four benchmark datasets, our proposed link independence model has the best predictive accuracy compared to the hierarchy models and a variety of relational classifiers.
Computing Multi-Relational Sufficient Statistics for Large Databases
Databases contain information about which relationships do and do not hold among entities. To make this information accessible for statistical analysis requires computing sufficient statistics that combine information from different database tables. Such statistics may involve any number of positive and negative relationships. With a naive enumeration approach, computing sufficient statistics for negative relationships is feasible only for small databases. We solve this problem with a new dynamic programming algorithm that performs a virtual join, where the requisite counts are computed without materializing join tables. Contingency table algebra is a new extension of relational algebra that facilitates the efficient implementation of this Möbius virtual join operation. The Möbius Join scales to large datasets (over 1M tuples) with complex schemas. Empirical evaluation with seven benchmark datasets showed that information about the presence and absence of links can be exploited in feature selection, association rule mining, and Bayesian network learning.
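The core trick, obtaining negative-relationship counts by subtraction rather than by enumerating non-links, can be illustrated on a single attribute and a single link type. A minimal sketch (entity IDs and attribute values are invented; the paper's full contingency table algebra handles arbitrarily many relations):

```python
from collections import Counter

def pair_contingency(entities, links):
    # Contingency counts over ordered pairs (a, b), a != b:
    # rows = attribute value of a, columns = Link(a, b) true/false.
    # The 'no_link' column is derived by subtraction from the pair totals,
    # so non-links are never enumerated (a "virtual join").
    n = len(entities)
    attr_totals = Counter(entities.values())
    linked = Counter(entities[a] for (a, b) in links)
    table = {}
    for v, cnt in attr_totals.items():
        total_pairs = cnt * (n - 1)  # ordered pairs whose source has value v
        table[v] = {'link': linked[v], 'no_link': total_pairs - linked[v]}
    return table

# Three entities with a gender attribute and two links between them:
t = pair_contingency({1: 'W', 2: 'M', 3: 'W'}, [(1, 2), (3, 1)])
```

The `no_link` column never touches the (potentially huge) complement of the link table, which is why this style of counting scales where naive enumeration does not.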
Fast Learning of Relational Dependency Networks
Abstract. A Relational Dependency Network (RDN) is a directed graphical model widely used for multi-relational data. These networks allow cyclic dependencies, which are necessary to represent relational autocorrelations. We describe an approach for learning both the RDN’s structure and its parameters, given an input relational database: first learn a Bayesian network (BN), then transform the Bayesian network to an RDN. Thus fast Bayes net learning can provide fast RDN learning. The BN-to-RDN transform comprises a simple, local adjustment of the Bayes net structure and a closed-form transform of the Bayes net parameters. This method can learn an RDN for a dataset with a million tuples in minutes. We empirically compare our approach to state-of-the-art RDN learning methods that use functional gradient boosting, on five benchmark datasets. Learning RDNs via BNs scales much better to large datasets than learning RDNs with boosting, and provides competitive accuracy in predictions.
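The closed-form parameter transform follows the standard Markov-blanket identity: each RDN local distribution P(X | blanket) is proportional to the BN's P(X | pa(X)) times the CPDs of X's children. A toy sketch with hand-coded CPDs (the two-node network A → B and its probabilities are invented for illustration):

```python
def rdn_local_distribution(values, cpd_self, child_cpds, blanket):
    # P(x | Markov blanket) ∝ P(x | pa(X)) * prod over children C of P(c | pa(C)),
    # where every factor is read off the Bayes net's CPDs, then renormalized.
    scores = {}
    for x in values:
        s = cpd_self(x, blanket)
        for cpd in child_cpds:
            s *= cpd(x, blanket)
        scores[x] = s
    z = sum(scores.values())
    return {x: s / z for x, s in scores.items()}

# Toy BN: A -> B with P(A=1) = 0.3, P(B=1 | A=1) = 0.9, P(B=1 | A=0) = 0.2.
prior_a = lambda a, mb: 0.3 if a == 1 else 0.7
cpd_b = lambda a, mb: ((0.9 if a == 1 else 0.2) if mb['B'] == 1
                       else (0.1 if a == 1 else 0.8))

local = rdn_local_distribution([0, 1], prior_a, [cpd_b], {'B': 1})
```

Because every factor is already a BN parameter, no new estimation is needed, which is what makes the BN-to-RDN conversion fast.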
Learning Bayes Nets for Relational Data with Link Uncertainty
Abstract. We present an algorithm for learning correlations among link types and node attributes in relational data that represent complex networks. The link correlations are represented in a Bayes net structure. This provides a succinct graphical way to display relational statistical patterns and supports powerful probabilistic inferences. The current state-of-the-art algorithm for learning relational Bayes nets captures only correlations among entity attributes given the existence of links among entities. The models described in this paper capture a wider class of correlations that involve uncertainty about the link structure. Our baseline method learns a Bayes net from join tables directly. This is a statistically powerful procedure that finds many correlations, but it does not scale well to larger datasets. We compare join table search with a hierarchical search strategy.
Identifying Important Nodes in Heterogeneous Networks
This is a position paper that presents a new approach to identifying important nodes or entities in a complex heterogeneous network. We provide a novel definition of an importance score based on a statistical model: an individual is important to the extent that including the individual explicitly in the model improves the model's data fit more than it increases the model's complexity. We apply techniques from statistical-relational learning, a recent field that combines AI and machine learning, to identify statistically important individuals in a scalable manner. We empirically investigate our approach with the OPTA soccer data set for the English Premier League.
A Proposal for Statistical Outlier Detection in Relational Structures
This paper extends unsupervised statistical outlier detection to the case of relational data. For non-relational data, where each individual is characterized by a feature vector, a common approach starts with learning a generative statistical model for the population. The model assigns a likelihood measure to the feature vector that characterizes the individual; the lower the feature vector's likelihood, the more anomalous the individual. A difference between relational and non-relational data is that an individual is characterized not only by a list of attributes, but also by its links and by the attributes of the individuals linked to it. We refer to a relational structure that specifies this information for a specific individual as the individual’s database. Our proposal is to use the likelihood assigned by a generative model to the individual’s database as the anomaly score for the individual; the lower the model likelihood, the more anomalous the individual. As a novel validation method, we compare the model likelihood with metrics of individual success. An empirical evaluation reveals a surprising finding in soccer and movie data: we observe a strong correlation between the likelihood and success metrics.
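The proposal reduces to: fit a generative model to the population, then rank individuals by how unlikely the model finds them. A minimal non-relational sketch, with a simple independent-categorical-features model standing in for a model over an individual's database (the feature values are invented):

```python
import math
from collections import Counter

def outlier_scores(population):
    # Fit one empirical categorical distribution per feature position, then
    # score each individual by its negative log-likelihood under the model.
    # Higher score = lower likelihood = more anomalous.
    n = len(population)
    n_features = len(population[0])
    dists = [Counter(ind[j] for ind in population) for j in range(n_features)]
    return [-sum(math.log(dists[j][ind[j]] / n) for j in range(n_features))
            for ind in population]

# Three typical individuals and one with a rare first feature:
pop = [('a', 'x'), ('a', 'x'), ('a', 'y'), ('b', 'x')]
scores = outlier_scores(pop)
```

The individual with the rare value `'b'` receives the highest anomaly score; in the relational setting the per-feature factors would be replaced by the likelihood of the individual's database under the learned model.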