Modelling Relational Statistics With Bayes Nets
Cited by 6 (3 self)
Abstract. Class-level models capture relational statistics over object attributes and their connecting links, answering questions such as “what is the percentage of friendship pairs where both friends are women?” Class-level relationships are important in themselves, and they support applications like policy making, strategic planning, and query optimization. We represent class statistics using Parametrized Bayes Nets (PBNs), a first-order logic extension of Bayes nets. Queries about classes require a new semantics for PBNs, as the standard grounding semantics is only appropriate for answering queries about specific ground facts. We propose a novel random selection semantics for PBNs, which does not make reference to a ground model, and supports class-level queries. The parameters for this semantics can be learned using the recent pseudo-likelihood measure [1] as the objective function. This objective function is maximized by taking the empirical frequencies in the relational data as the parameter settings. We render the computation of these empirical frequencies tractable in the presence of negated relations by the inverse Möbius transform. Evaluation of our method on four benchmark datasets shows that maximum pseudo-likelihood provides fast and accurate estimates at different sample sizes.
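The inverse Möbius transform mentioned in this abstract can be sketched as a small dynamic program (hypothetical function and variable names; assumes counts for all positive/unspecified relationship combinations have already been tallied from the data). Each negated count follows from inclusion–exclusion: N(R = F, rest) = N(R = *, rest) − N(R = T, rest), applied one relation at a time.

```python
def mobius_negative_counts(pos_counts, num_rels):
    """Derive counts involving negated relations from positive-only counts.

    pos_counts: dict mapping tuples over {'T', '*'} (one slot per relation)
        to counts, where '*' means the relation is left unspecified.
    Returns a dict over {'T', 'F'} tuples, computed via the identity
        N(..., R=F, ...) = N(..., R=*, ...) - N(..., R=T, ...),
    applied to each relation slot in turn (a toy version of the
    inverse Möbius transform, not the paper's implementation).
    """
    table = dict(pos_counts)
    for i in range(num_rels):
        new_table = {}
        for key, n in table.items():
            if key[i] == 'T':
                # Fully specified at slot i: carry over unchanged.
                new_table[key] = n
            else:
                # key[i] == '*': split into the T count and the F remainder.
                t_key = key[:i] + ('T',) + key[i + 1:]
                f_key = key[:i] + ('F',) + key[i + 1:]
                new_table[f_key] = n - table.get(t_key, 0)
        table = new_table
    return table
```

For two relations with true joint counts (TT, TF, FT, FF) = (1, 2, 3, 4), the positive-only input {TT: 1, T*: 3, *T: 4, **: 10} recovers the negated entries without ever enumerating the false groundings directly.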
Simple Decision Forests for Multi-Relational Classification
Cited by 2 (2 self)
An important task in multi-relational data mining is link-based classification, which takes advantage of attributes of links and linked entities to predict the class label. The relational naive Bayes classifier exploits independence assumptions to achieve scalability. We introduce a weaker independence assumption to the effect that information from different data tables is independent given the class label. The independence assumption entails a closed-form formula for combining probabilistic predictions based on decision trees learned on different database tables. Logistic regression learns different weights for information from different tables and prunes irrelevant tables. In experiments, learning was very fast with competitive accuracy.
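The table-level independence assumption described above licenses a closed-form combination rule: if the tables are independent given the class, then P(c | all tables) ∝ P(c)^(1−k) · Π_i P(c | table_i). A minimal sketch, with illustrative names (the paper additionally learns per-table weights via logistic regression, which is omitted here):

```python
def combine_table_predictions(prior, per_table_posteriors):
    """Combine per-table class posteriors under class-conditional
    independence of the tables.

    prior: dict class -> P(c)
    per_table_posteriors: list of dicts, one per data table,
        each mapping class -> P(c | evidence in that table),
        e.g. the output of a decision tree trained on that table.
    Uses P(c | T1..Tk) proportional to P(c)^(1-k) * prod_i P(c | Ti).
    """
    k = len(per_table_posteriors)
    scores = {}
    for c, p in prior.items():
        s = p ** (1 - k)  # correct for the prior counted k times
        for posterior in per_table_posteriors:
            s *= posterior[c]
        scores[c] = s
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}
```

With a uniform prior and two tables that each favor class 'a' at 0.8, the combined posterior for 'a' sharpens to 0.64 / 0.68 ≈ 0.94, as naive-Bayes-style evidence pooling predicts.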
A Hierarchy of Independence Assumptions for Multi-relational Bayes Net Classifiers
Abstract—Many databases store data in relational format, with different types of entities and information about their attributes and links between the entities. Link-based classification (LBC) is the problem of predicting the class attribute of a target entity given the attributes of entities linked to it. In this paper we propose a new relational Bayes net classifier method for LBC, which assumes that different links of an object are independently drawn from the same distribution, given attribute information from the linked tables. We show that this assumption allows very fast multi-relational Bayes net learning. We define three more independence assumptions for LBC to unify proposals from different researchers in a single novel hierarchy. Our proposed model is at the top and the well-known multi-relational Naive Bayes classifier is at the bottom of this hierarchy. The model in each level of the hierarchy uses a new independence assumption in addition to the assumptions used in the higher levels. In experiments on four benchmark datasets, our proposed link independence model has the best predictive accuracy compared to the hierarchy models and a variety of relational classifiers.
Computing Multi-Relational Sufficient Statistics for Large Databases
Databases contain information about which relationships do and do not hold among entities. To make this information accessible for statistical analysis requires computing sufficient statistics that combine information from different database tables. Such statistics may involve any number of positive and negative relationships. With a naive enumeration approach, computing sufficient statistics for negative relationships is feasible only for small databases. We solve this problem with a new dynamic programming algorithm that performs a virtual join, where the requisite counts are computed without materializing join tables. Contingency table algebra is a new extension of relational algebra that facilitates the efficient implementation of this Möbius virtual join operation. The Möbius Join scales to large datasets (over 1M tuples) with complex schemas. Empirical evaluation with seven benchmark datasets showed that information about the presence and absence of links can be exploited in feature selection, association rule mining, and Bayesian network learning.
Fast Learning of Relational Dependency Networks
Abstract. A Relational Dependency Network (RDN) is a directed graphical model widely used for multi-relational data. These networks allow cyclic dependencies, necessary to represent relational autocorrelations. We describe an approach for learning both the RDN’s structure and its parameters, given an input relational database: First learn a Bayesian network (BN), then transform the Bayesian network to an RDN. Thus fast Bayes net learning can provide fast RDN learning. The BN-to-RDN transform comprises a simple, local adjustment of the Bayes net structure and a closed-form transform of the Bayes net parameters. This method can learn an RDN for a dataset with a million tuples in minutes. We empirically compare our approach to state-of-the-art RDN learning methods that use functional gradient boosting, on five benchmark datasets. Learning RDNs via BNs scales much better to large datasets than learning RDNs with boosting, and provides competitive accuracy in predictions.
Learning Bayes Nets for Relational Data with Link Uncertainty
Abstract. We present an algorithm for learning correlations among link types and node attributes in relational data that represent complex networks. The link correlations are represented in a Bayes net structure. This provides a succinct graphical way to display relational statistical patterns and support powerful probabilistic inferences. The current state of the art algorithm for learning relational Bayes nets captures only correlations among entity attributes given the existence of links among entities. The models described in this paper capture a wider class of correlations that involve uncertainty about the link structure. Our baseline method learns a Bayes net from join tables directly. This is a statistically powerful procedure that finds many correlations, but does not scale well to larger datasets. We compare join table search with a hierarchical search strategy.
Identifying Important Nodes in Heterogenous Networks
This is a position paper that presents a new approach to identifying important nodes or entities in a complex heterogeneous network. We provide a novel definition of an importance score based on a statistical model: An individual is important to the extent that including the individual explicitly in the model improves the data fit of the model more than it increases the model’s complexity. We apply techniques from statistical-relational learning, a recent field that combines AI and machine learning, to identify statistically important individuals in a scalable manner. We empirically investigate our approach with the OPTA soccer dataset for the English Premier League.
A Proposal for Statistical Outlier Detection in Relational Structures
This paper extends unsupervised statistical outlier detection to the case of relational data. For nonrelational data, where each individual is characterized by a feature vector, a common approach starts with learning a generative statistical model for the population. The model assigns a likelihood measure for the feature vector that characterizes the individual; the lower the feature vector likelihood, the more anomalous the individual. A difference between relational and nonrelational data is that an individual is characterized not only by a list of attributes, but also by its links and by attributes of the individuals linked to it. We refer to a relational structure that specifies this information for a specific individual as the individual’s database. Our proposal is to use the likelihood assigned by a generative model to the individual’s database as the anomaly score for the individual; the lower the model likelihood, the more anomalous the individual. As a novel validation method, we compare the model likelihood with metrics of individual success. An empirical evaluation reveals a surprising finding in soccer and movie data: We observe in the data a strong correlation between the likelihood and success metrics.
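The likelihood-as-anomaly-score idea in this last abstract can be illustrated in the simpler nonrelational setting it generalizes (a toy sketch with hypothetical names, not the paper's relational model): fit a generative model to the population, then rank individuals by the negative log-likelihood the model assigns them.

```python
import math
from statistics import mean, pstdev

def outlier_scores(population):
    """Score each individual by negative log-likelihood under a
    per-feature independent Gaussian fit to the whole population.
    Higher score = lower likelihood = more anomalous (the same
    principle the paper applies to an individual's database).

    population: list of equal-length numeric feature tuples.
    """
    dims = list(zip(*population))
    # Fit mean and population std-dev per feature; guard zero spread.
    params = [(mean(d), pstdev(d) or 1.0) for d in dims]

    def neg_log_lik(x):
        return -sum(
            -0.5 * math.log(2 * math.pi * s ** 2)
            - (v - m) ** 2 / (2 * s ** 2)
            for v, (m, s) in zip(x, params)
        )

    return [neg_log_lik(x) for x in population]
```

On a toy one-dimensional population clustered near 0 with a single far-away point, that point receives the largest score, matching the "lower likelihood, more anomalous" criterion.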