Results 1 - 6 of 6

### Computing Multi-Relational Sufficient Statistics for Large Databases

"... Databases contain information about which relationships do and do not hold among entities. To make this infor-mation accessible for statistical analysis requires computing sucient statistics that combine information from di↵erent database tables. Such statistics may involve any number of positive an ..."

Abstract
- Add to MetaCart

(Show Context)
Databases contain information about which relationships do and do not hold among entities. To make this information accessible for statistical analysis requires computing sufficient statistics that combine information from different database tables. Such statistics may involve any number of positive and negative relationships. With a naive enumeration approach, computing sufficient statistics for negative relationships is feasible only for small databases. We solve this problem with a new dynamic programming algorithm that performs a virtual join, where the requisite counts are computed without materializing join tables. Contingency table algebra is a new extension of relational algebra that facilitates the efficient implementation of this Möbius virtual join operation. The Möbius Join scales to large datasets (over 1M tuples) with complex schemas. Empirical evaluation with seven benchmark datasets showed that information about the presence and absence of links can be exploited in feature selection, association rule mining, and Bayesian network learning.
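The abstract does not reproduce the algorithm, but the core idea of recovering counts that involve negative relationships from positive-only counts can be illustrated with inclusion-exclusion. The sketch below is hypothetical (function names and the toy data are invented), not the paper's Möbius Join implementation:

```python
from itertools import combinations

def negative_count(pos_counts, positives, negatives):
    """Count tuples where all `positives` hold and none of `negatives` hold,
    using only positive-conjunction counts via inclusion-exclusion.
    pos_counts[frozenset(rels)] = number of tuples where at least those
    relationships hold (other relationships unconstrained)."""
    total = 0
    negatives = list(negatives)
    for k in range(len(negatives) + 1):
        for subset in combinations(negatives, k):
            key = frozenset(positives) | frozenset(subset)
            total += (-1) ** k * pos_counts[key]
    return total

# Toy database of four tuples with truth values for relationships R1, R2:
# (T,T), (T,F), (F,T), (F,F)
pos_counts = {
    frozenset(): 4,             # no constraint
    frozenset({"R1"}): 2,       # R1 holds
    frozenset({"R2"}): 2,       # R2 holds
    frozenset({"R1", "R2"}): 1  # both hold
}

print(negative_count(pos_counts, {"R1"}, {"R2"}))       # R1 true, R2 false -> 1
print(negative_count(pos_counts, set(), {"R1", "R2"}))  # both false -> 1
```

The point of the virtual join is that the positive-only counts on the right-hand side can be computed without materializing the full join table.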

### Fast Learning of Relational Dependency Networks

"... Abstract. A Relational Dependency Network (RDN) is a directed graph-ical model widely used for multi-relational data. These networks allow cyclic dependencies, necessary to represent relational autocorrelations. We describe an approach for learning both the RDN’s structure and its parameters, given ..."

Abstract
- Add to MetaCart

A Relational Dependency Network (RDN) is a directed graphical model widely used for multi-relational data. These networks allow cyclic dependencies, necessary to represent relational autocorrelations. We describe an approach for learning both the RDN’s structure and its parameters, given an input relational database: first learn a Bayesian network (BN), then transform the Bayesian network to an RDN. Thus fast Bayes net learning can provide fast RDN learning. The BN-to-RDN transform comprises a simple, local adjustment of the Bayes net structure and a closed-form transform of the Bayes net parameters. This method can learn an RDN for a dataset with a million tuples in minutes. We empirically compare our approach to state-of-the-art RDN learning methods that use functional gradient boosting, on five benchmark datasets. Learning RDNs via BNs scales much better to large datasets than learning RDNs with boosting, and provides competitive accuracy in predictions.
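The abstract does not spell out the closed-form transform, but the standard construction behind a BN-to-dependency-network conversion conditions each node on its Markov blanket, which in a Bayes net is the product of the node's own CPD and its children's CPDs. A minimal sketch for a chain A → B → C with hypothetical CPD values (not the paper's exact transform):

```python
def rdn_local(p_b_given_a, p_c_given_b, a, c):
    """P(B=1 | A=a, C=c) in the dependency network derived from the
    Bayes net A -> B -> C: proportional to P(B|A) * P(C|B)."""
    def joint(b):
        pb = p_b_given_a[a] if b == 1 else 1 - p_b_given_a[a]
        pc = p_c_given_b[b] if c == 1 else 1 - p_c_given_b[b]
        return pb * pc
    return joint(1) / (joint(1) + joint(0))

# Hypothetical CPDs: P(B=1 | A=a) and P(C=1 | B=b)
p_b_given_a = {0: 0.3, 1: 0.8}
p_c_given_b = {0: 0.2, 1: 0.9}

print(round(rdn_local(p_b_given_a, p_c_given_b, a=1, c=1), 3))  # 0.947
```

Because each local distribution is obtained in closed form from the already-learned BN parameters, the transform adds essentially no cost on top of BN learning.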

### Learning Bayes Nets for Relational Data with Link Uncertainty

"... Abstract. We present an algorithm for learning correlations among link types and node attributes in relational data that represent complex net-works. The link correlations are represented in a Bayes net structure. This provides a succinct graphical way to display relational statisti-cal patterns and ..."

Abstract
- Add to MetaCart

(Show Context)
We present an algorithm for learning correlations among link types and node attributes in relational data that represent complex networks. The link correlations are represented in a Bayes net structure. This provides a succinct graphical way to display relational statistical patterns and support powerful probabilistic inferences. The current state-of-the-art algorithm for learning relational Bayes nets captures only correlations among entity attributes given the existence of links among entities. The models described in this paper capture a wider class of correlations that involve uncertainty about the link structure. Our baseline method learns a Bayes net from join tables directly. This is a statistically powerful procedure that finds many correlations, but does not scale well to larger datasets. We compare join table search with a hierarchical search strategy.
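The join-table baseline can be pictured as flattening linked tables into one wide table before running a standard Bayes net learner; the row blow-up from the join is why it scales poorly. A toy pure-Python natural join (illustrative only; table and column names are invented):

```python
def natural_join(left, right, key):
    """Join two lists of row dicts on a shared key column."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

persons = [{"pid": 1, "age": "old"}, {"pid": 2, "age": "young"}]
friend_links = [{"pid": 1, "friend": 2}, {"pid": 1, "friend": 3}]

# Each person row is replicated once per matching link; a BN learner
# would then treat `joined` as a single flat training table.
joined = natural_join(persons, friend_links, "pid")
print(len(joined))  # 2 rows, both for person 1
```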

### Identifying Important Nodes in Heterogeneous Networks

"... This is a position paper that presents a new approach to identifying important nodes or entities in a complex heterogeneous network. We provide a novel definition of an importance score based on a statistical model: An individual is important to the extent that including an individual explicitly in ..."

Abstract
- Add to MetaCart

(Show Context)
This is a position paper that presents a new approach to identifying important nodes or entities in a complex heterogeneous network. We provide a novel definition of an importance score based on a statistical model: an individual is important to the extent that including the individual explicitly in the model improves the data fit more than it increases the model’s complexity. We apply techniques from statistical-relational learning, a recent field that combines AI and machine learning, to identify statistically important individuals in a scalable manner. We investigate our approach empirically with the OPTA soccer data set for the English Premier League.
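The fit-versus-complexity trade-off in this definition suggests a BIC-style score difference. The following is a hypothetical illustration of that trade-off with invented numbers, not the paper's actual estimator:

```python
import math

def importance(loglik_with, loglik_without, extra_params, n):
    """BIC-style importance score: the log-likelihood gain from modeling
    the individual explicitly, minus the complexity penalty for the
    extra parameters, penalty = (k / 2) * log(n)."""
    return (loglik_with - loglik_without) - 0.5 * extra_params * math.log(n)

# Hypothetical numbers: modeling the player explicitly improves fit by
# 12 nats at the cost of 3 extra parameters over n = 1000 observations.
score = importance(-500.0, -512.0, extra_params=3, n=1000)
print(score > 0)  # fit gain outweighs the penalty -> important
```

A positive score marks the individual as important; a negative score means the extra parameters are not worth the improvement in fit.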

### Comparison between Explicit Learning and Implicit Modeling of Relational Features in Structured Output Spaces

"... Abstract. Building relational models for the structured output classi-fication problem of sequence labeling has been recently explored in a few research works. The models built in such a manner are interpretable and capture much more information about the domain (than models built directly from basi ..."

Abstract
- Add to MetaCart

Building relational models for the structured output classification problem of sequence labeling has been explored recently in a few research works. The models built in such a manner are interpretable and capture much more information about the domain (than models built directly from basic attributes), resulting in accurate predictions. On the other hand, discovering optimal relational features is a hard task, since the space of relational features is exponentially large. An exhaustive search in this exponentially large feature space is infeasible. Therefore, the feature space is often explored using heuristics. Recently, we proposed a Hierarchical Kernels-based feature learning approach (StructHKL) for sequence labeling [?], that optimally learns emission features in the form of conjunctions of basic inputs at a sequence position. However, StructHKL cannot be trivially applied to learn complex relational features derived from relative sequence positions. In this paper, we seek to learn optimal relational sequence labeling models by leveraging a relational kernel that computes the similarity between instances in an implicit space of relational features. To this end, we employ relational subsequence kernels at each sequence position (over a time window of observations around the pivot position) for the classification model. While this method of modeling does not result in interpretability, relational subsequence kernels do efficiently capture relational sequential information on the inputs. We present an experimental comparison between approaches for explicit learning and implicit modeling of relational features and explain the trade-offs therein.
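The relational subsequence kernel itself is not spelled out in the abstract; its flavor can be conveyed by the classic all-common-subsequences kernel, computed by dynamic programming over a pair of sequences. This is a generic sketch of that family of kernels, not the exact kernel used in the paper:

```python
def subsequence_kernel(s, t):
    """Count common subsequence pairs of s and t (the all-subsequences
    kernel), including the empty subsequence, via dynamic programming."""
    m, n = len(s), len(t)
    # K[i][j] = kernel value for prefixes s[:i], t[:j];
    # the empty subsequence always matches, hence the initial 1s.
    K = [[1] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            K[i][j] = K[i - 1][j] + K[i][j - 1] - K[i - 1][j - 1]
            if s[i - 1] == t[j - 1]:
                K[i][j] += K[i - 1][j - 1]
    return K[m][n]

print(subsequence_kernel("ab", "ab"))  # 4: "", "a", "b", "ab"
print(subsequence_kernel("ab", "ba"))  # 3: "", "a", "b"
```

In the implicit-modeling setting described above, such a kernel would be evaluated on the windows of observations around two pivot positions, so no explicit relational features are ever enumerated.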