Learning graphical models for relational data via lattice search (0)

by O Schulte, H Khosravi
Venue:Machine Learning
Results 1 - 8 of 8

Modelling Relational Statistics With Bayes Nets

by Oliver Schulte, Hassan Khosravi, Arthur E. Kirkpatrick, Tianxiang Gao, Yuke Zhu
Cited by 6 (3 self)
Abstract. Class-level models capture relational statistics over object attributes and their connecting links, answering questions such as “what is the percentage of friendship pairs where both friends are women?” Class-level relationships are important in themselves, and they support applications like policy making, strategic planning, and query optimization. We represent class statistics using Parametrized Bayes Nets (PBNs), a first-order logic extension of Bayes nets. Queries about classes require a new semantics for PBNs, as the standard grounding semantics is only appropriate for answering queries about specific ground facts. We propose a novel random selection semantics for PBNs, which does not make reference to a ground model, and supports class-level queries. The parameters for this semantics can be learned using the recent pseudo-likelihood measure [1] as the objective function. This objective function is maximized by taking the empirical frequencies in the relational data as the parameter settings. We render the computation of these empirical frequencies tractable in the presence of negated relations by the inverse Möbius transform. Evaluation of our method on four benchmark datasets shows that maximum pseudo-likelihood provides fast and accurate estimates at different sample sizes.

Citation Context

...th the variables are of the appropriate type for the functor. A Parametrized Bayes Net (PBN) is a Bayes net whose nodes are Parametrized random variables [7]. In the remainder of this paper we follow [13] and use the terms functor random variable and functor Bayes Net (FBN) instead, for the following reasons. (1) The name “Parametrized” denotes a semantics that views the Bayes net as a template for a ...
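
As a rough illustration of the class-level query quoted in the abstract above ("what is the percentage of friendship pairs where both friends are women?"), the empirical frequency can be read directly off the relational data. The sketch below is only a toy example: the function name `both_women_frequency`, the `gender` dictionary, and the `friendships` list are hypothetical and are not taken from the paper, which works with Parametrized Bayes Nets rather than a single hand-written query.

```python
from fractions import Fraction


def both_women_frequency(gender, friendships):
    """Empirical frequency of friendship pairs in which both friends are women.

    gender: dict mapping a person to "F" or "M" (hypothetical toy encoding).
    friendships: list of (person, person) pairs for which the link holds.
    """
    if not friendships:
        raise ValueError("no friendship pairs to count")
    hits = sum(1 for a, b in friendships
               if gender[a] == "F" and gender[b] == "F")
    return Fraction(hits, len(friendships))


# Toy data, invented for illustration only.
gender = {"anna": "F", "bob": "M", "cara": "F", "dina": "F"}
friendships = [("anna", "cara"), ("anna", "bob"),
               ("cara", "dina"), ("bob", "dina")]

print(both_women_frequency(gender, friendships))  # 1/2
```

Using such observed frequencies as parameter values is what the abstract describes as maximizing the pseudo-likelihood objective; the handling of negated relations via the inverse Möbius transform is not shown here.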

Simple Decision Forests for Multi-Relational Classification

by Bahareh Bina, Oliver Schulte, En Crawford, Zhensong Qian, Yi Xiong
Cited by 2 (2 self)
An important task in multi-relational data mining is link-based classification, which takes advantage of attributes of links and linked entities to predict the class label. The relational naive Bayes classifier exploits independence assumptions to achieve scalability. We introduce a weaker independence assumption to the effect that information from different data tables is independent given the class label. The independence assumption entails a closed-form formula for combining probabilistic predictions based on decision trees learned on different database tables. Logistic regression learns different weights for information from different tables and prunes irrelevant tables. In experiments, learning was very fast with competitive accuracy.

Citation Context

...r of more general log-linear models is higher complexity in learning; various studies have shown that scalable learning is a major challenge for Markov model learning in the multi-relational setting [32, 33, 31]. In practice one often needs to perform collective classification: predict the class label of several interrelated entities simultaneously. While collective classification has received much attenti...

A Hierarchy of Independence Assumptions for Multi-relational Bayes Net Classifiers

by Oliver Schulte, Bahareh Bina, En Crawford, Derek Bingham, Yi Xiong
Abstract. Many databases store data in relational format, with different types of entities and information about their attributes and links between the entities. Link-based classification (LBC) is the problem of predicting the class attribute of a target entity given the attributes of entities linked to it. In this paper we propose a new relational Bayes net classifier method for LBC, which assumes that different links of an object are independently drawn from the same distribution, given attribute information from the linked tables. We show that this assumption allows very fast multi-relational Bayes net learning. We define three more independence assumptions for LBC to unify proposals from different researchers in a single novel hierarchy. Our proposed model is at the top and the well-known multi-relational Naive Bayes classifier is at the bottom of this hierarchy. The model in each level of the hierarchy uses a new independence assumption in addition to the assumptions used in the higher levels. In experiments on four benchmark datasets, our proposed link independence model has the best predictive accuracy compared to the hierarchy models and a variety of relational classifiers.

Citation Context

...on or more. The trade-off for the expressive power of undirected models is higher complexity in learning; in particular, scalable model learning is a major challenge in the multi-relational setting [16], [29]. Multi-Relational Naive Bayes. There has been extensive research into multi-relational versions of the single-table Naive Bayes Classifier (NBC) [4], [10], [25], [5]. Our NB classification formulas a...

Computing Multi-Relational Sufficient Statistics for Large Databases

by Zhensong Qian, Oliver Schulte, Yan Sun
Databases contain information about which relationships do and do not hold among entities. To make this information accessible for statistical analysis requires computing sufficient statistics that combine information from different database tables. Such statistics may involve any number of positive and negative relationships. With a naive enumeration approach, computing sufficient statistics for negative relationships is feasible only for small databases. We solve this problem with a new dynamic programming algorithm that performs a virtual join, where the requisite counts are computed without materializing join tables. Contingency table algebra is a new extension of relational algebra that facilitates the efficient implementation of this Möbius virtual join operation. The Möbius Join scales to large datasets (over 1M tuples) with complex schemas. Empirical evaluation with seven benchmark datasets showed that information about the presence and absence of links can be exploited in feature selection, association rule mining, and Bayesian network learning.

Citation Context

...ts parameters instantiated to be the maximum likelihood estimates given the dataset d, and the quantity L(Ĝ,d) is the log-likelihood of Ĝ on d. We use the relational log-likelihood score defined in [10], which differs from the standard single-table Bayes net likelihood only by replacing counts by frequencies so that scores are comparable across different nodes and databases. To provide information ab...
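
The idea of obtaining counts for negative relationships without enumerating every unlinked pair can be illustrated, under strong simplifying assumptions, for a single relationship between two entity tables: the count for "link absent" equals the count over all pairs (a product of per-table counts) minus the count for "link present". The sketch below is not the Möbius Join algorithm from the paper, which handles many relationships with dynamic programming; the function `negative_link_counts` and the toy tables are hypothetical.

```python
from collections import Counter


def negative_link_counts(attr_a, attr_b, links):
    """Counts of (value_a, value_b) over entity pairs that are NOT linked.

    attr_a, attr_b: dicts mapping entity ids to one attribute value each.
    links: iterable of (id_a, id_b) pairs for which the relationship holds.
    Unlinked pairs are never enumerated; their counts come from subtraction.
    """
    count_a = Counter(attr_a.values())
    count_b = Counter(attr_b.values())
    # Counts over linked pairs only (duplicates in links are ignored).
    positive = Counter((attr_a[a], attr_b[b]) for a, b in set(links))
    negative = {}
    for value_a, n_a in count_a.items():
        for value_b, n_b in count_b.items():
            # All pairs with these values, minus those that are linked.
            negative[(value_a, value_b)] = n_a * n_b - positive[(value_a, value_b)]
    return negative


# Toy data, invented for illustration only.
students = {"s1": "hi", "s2": "lo", "s3": "hi"}   # e.g. an ability attribute
courses = {"c1": "hard", "c2": "easy"}            # e.g. a difficulty attribute
registered = [("s1", "c1"), ("s2", "c1"), ("s3", "c2")]

negative = negative_link_counts(students, courses, registered)
print(negative[("lo", "easy")])  # 1
```

In this toy case there are six student-course pairs in total, three of them linked, so the negative counts sum to three.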

Fast Learning of Relational Dependency Networks

by Oliver Schulte, Zhensong Qian, Arthur E. Kirkpatrick, Xiaoqian Yin, Yan Sun
Abstract. A Relational Dependency Network (RDN) is a directed graphical model widely used for multi-relational data. These networks allow cyclic dependencies, necessary to represent relational autocorrelations. We describe an approach for learning both the RDN’s structure and its parameters, given an input relational database: First learn a Bayesian network (BN), then transform the Bayesian network to an RDN. Thus fast Bayes net learning can provide fast RDN learning. The BN-to-RDN transform comprises a simple, local adjustment of the Bayes net structure and a closed-form transform of the Bayes net parameters. This method can learn an RDN for a dataset with a million tuples in minutes. We empirically compare our approach to state-of-the-art RDN learning methods that use functional gradient boosting, on five benchmark datasets. Learning RDNs via BNs scales much better to large datasets than learning RDNs with boosting, and provides competitive accuracy in predictions.

Citation Context

...was Linux Centos 2.6.32. Code was written in Java, JRE 1.7.0. All code and datasets are available [7]. Datasets We used 5 benchmark real-world databases. For more details please see the references in [17]. Summary statistics appear in Table 2. MovieLens Databases MovieLens is a commonly-used rating dataset. We added more related attribute information about the actors, directors and movies from the In...

Learning Bayes Nets for Relational Data with Link Uncertainty

by Zhensong Qian, Oliver Schulte
Abstract. We present an algorithm for learning correlations among link types and node attributes in relational data that represent complex networks. The link correlations are represented in a Bayes net structure. This provides a succinct graphical way to display relational statistical patterns and support powerful probabilistic inferences. The current state-of-the-art algorithm for learning relational Bayes nets captures only correlations among entity attributes given the existence of links among entities. The models described in this paper capture a wider class of correlations that involve uncertainty about the link structure. Our baseline method learns a Bayes net from join tables directly. This is a statistically powerful procedure that finds many correlations, but does not scale well to larger datasets. We compare join table search with a hierarchical search strategy.

Citation Context

...obabilities of uncertain outcomes conditional on observed events. Previous work on learning Bayes nets for relational data was restricted to correlations among attributes given the existence of links [16]. The larger class of correlations examined in our new algorithms includes two additional kinds: 1. Dependencies between different types of links. 2. Dependencies among node attributes given the absen...

Identifying Important Nodes in Heterogenous Networks

by Oliver Schulte, Fatemeh Riahi, Qing Li
This is a position paper that presents a new approach to identifying important nodes or entities in a complex heterogeneous network. We provide a novel definition of an importance score based on a statistical model: An individual is important to the extent that including the individual explicitly in the model improves the data fit of the model more than it increases the model’s complexity. We apply techniques from statistical-relational learning, a recent field that combines AI and machine learning, to identify statistically important individuals in a scalable manner. We empirically investigate our approach with the OPTA soccer data set for the English Premier League.

Citation Context

... . . , σa) where each σi is a constant or variable of the appropriate population. A Parametrized Bayes net is a Bayes net whose nodes are functor nodes. The state-of-the-art learn-and-join algorithm (Schulte and Khosravi 2012) takes as input (1) a relational database D representing a network, (2) a set of functor nodes, and produces a Bayes net for the functor nodes. The learn-and-join algorithm includes a method for extr...

A Proposal for Statistical Outlier Detection in Relational Structures

by Fatemeh Riahi, Oliver Schulte, Qing Li
This paper extends unsupervised statistical outlier detection to the case of relational data. For nonrelational data, where each individual is characterized by a feature vector, a common approach starts with learning a generative statistical model for the population. The model assigns a likelihood measure for the feature vector that characterizes the individual; the lower the feature vector likelihood, the more anomalous the individual. A difference between relational and nonrelational data is that an individual is characterized not only by a list of attributes, but also by its links and by attributes of the individuals linked to it. We refer to a relational structure that specifies this information for a specific individual as the individual’s database. Our proposal is to use the likelihood assigned by a generative model to the individual’s database as the anomaly score for the individual; the lower the model likelihood, the more anomalous the individual. As a novel validation method, we compare the model likelihood with metrics of individual success. An empirical evaluation reveals a surprising finding in soccer and movie data: We observe in the data a strong correlation between the likelihood and success metrics.

Citation Context

... 2007; Domingos and Lowd 2009). (1) We use a tractable definition of the likelihood function for a Bayes net given a (sub)database that generalizes the standard definition for the nonrelational case (Schulte and Khosravi 2012; Alsanie and Cussens 2012). (2) We apply the learn-and-join algorithm (LAJ), a state-of-the-art Bayes net structure learning method for relational data (Schulte and Khosravi 2012). Relational Outlier...
