Results 11-20 of 330
Statistical predicate invention
In Z. Ghahramani (Ed.), Proceedings of the 24th Annual International Conference on Machine Learning (ICML 2007), 2007
Cited by 48 (11 self)
We propose statistical predicate invention as a key problem for statistical relational learning. SPI is the problem of discovering new concepts, properties and relations in structured data, and generalizes hidden variable discovery in statistical models and predicate invention in ILP. We propose an initial model for SPI based on second-order Markov logic, in which predicates as well as arguments can be variables, and the domain of discourse is not fully known in advance. Our approach iteratively refines clusters of symbols based on the clusters of symbols they appear in atoms with (e.g., it clusters relations by the clusters of the objects they relate). Since different clusterings are better for predicting different subsets of the atoms, we allow multiple cross-cutting clusterings. We show that this approach outperforms Markov logic structure learning and the recently introduced infinite relational model on a number of relational datasets.
Learning Markov logic network structure via hypergraph lifting
In Proceedings of the 26th International Conference on Machine Learning (ICML09), 2009
Cited by 48 (3 self)
Markov logic networks (MLNs) combine logic and probability by attaching weights to first-order clauses, and viewing these as templates for features of Markov networks. Learning MLN structure from a relational database involves learning the clauses and weights. The state-of-the-art MLN structure learners all involve some element of greedily generating candidate clauses, and are susceptible to local optima. To address this problem, we present an approach that directly utilizes the data in constructing candidates. A relational database can be viewed as a hypergraph with constants as nodes and relations as hyperedges. We find paths of true ground atoms in the hypergraph that are connected via their arguments. To make this tractable (there are exponentially many paths in the hypergraph), we lift the hypergraph by jointly clustering the constants to form higher-level concepts, and find paths in it. We variabilize the ground atoms in each path, and use them to form clauses, which are evaluated using a pseudo-likelihood measure. In our experiments on three real-world datasets, we find that our algorithm outperforms the state-of-the-art approaches.
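The hypergraph view in this abstract can be illustrated with a minimal sketch. It assumes ground atoms are stored as `(predicate, arguments)` tuples; the function names, the toy atoms, and the exhaustive search are hypothetical choices for illustration, and the lifting (clustering) step described in the abstract is deliberately left out.

```python
from collections import defaultdict

def build_hypergraph(ground_atoms):
    """Map each constant (node) to the ground atoms (hyperedges) containing it."""
    incident = defaultdict(list)
    for atom in ground_atoms:
        _predicate, args = atom
        for const in args:
            incident[const].append(atom)
    return incident

def connected_paths(ground_atoms, max_len):
    """Enumerate paths of distinct ground atoms in which consecutive atoms
    share at least one argument. Exhaustive, so only suitable for toy data."""
    incident = build_hypergraph(ground_atoms)
    paths = set()

    def extend(path):
        paths.add(tuple(path))
        if len(path) == max_len:
            return
        # Follow every constant of the last atom to the other atoms it touches.
        for const in path[-1][1]:
            for atom in incident[const]:
                if atom not in path:
                    extend(path + [atom])

    for atom in ground_atoms:
        extend([atom])
    return paths

# Hypothetical toy database of true ground atoms.
atoms = [
    ("Advises", ("Ann", "Bob")),
    ("Teaches", ("Ann", "CS101")),
    ("TAs", ("Bob", "CS101")),
]
paths = connected_paths(atoms, max_len=2)
```

Even on this three-atom example the number of paths grows quickly with `max_len`, which is the blow-up that clustering constants into higher-level concepts is meant to control.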
Relational Retrieval Using a Combination of Path-Constrained Random Walks
Cited by 46 (9 self)
Scientific literature with rich metadata can be represented as a labeled directed graph. This graph representation enables a number of scientific tasks such as ad hoc retrieval or named entity recognition (NER) to be formulated as typed proximity queries in the graph. One popular proximity measure is called Random Walk with Restart (RWR), and much work has been done on the supervised learning of RWR measures by associating each edge label with a parameter. In this paper, we describe a novel learnable proximity measure which instead uses one weight per edge label sequence: proximity is defined by a weighted combination of simple “path experts”, each corresponding to following a particular sequence of labeled edges. Experiments on eight tasks in two subdomains of biology show that the new learning method significantly outperforms the RWR model (both trained and untrained). We also extend the method to support two additional types of experts to model intrinsic properties of entities: query-independent experts, which generalize the PageRank measure, and popular entity experts, which allow rankings to be adjusted for particular entities that are especially important.
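The "path expert" idea in this abstract can be sketched in a few lines. This is a simplified illustration, not the paper's implementation: it assumes the graph is a nested dict, that each step moves uniformly over the edges carrying the required label, and that the weights are given rather than learned. The names `path_walk` and `combined_proximity`, the edge labels, and the weights are all hypothetical.

```python
from collections import defaultdict

def path_walk(graph, start, labels):
    """Distribution over nodes reached from `start` by following `labels` in order.

    `graph[node][label]` is a list of neighbor nodes. Each step moves uniformly
    over the neighbors reachable via the required label; probability mass at a
    node with no such edge is dropped.
    """
    dist = {start: 1.0}
    for label in labels:
        nxt = defaultdict(float)
        for node, prob in dist.items():
            neighbors = graph.get(node, {}).get(label, [])
            for nb in neighbors:
                nxt[nb] += prob / len(neighbors)
        dist = dict(nxt)
    return dist

def combined_proximity(graph, start, weighted_paths):
    """Weighted sum of path-expert distributions (one weight per label sequence)."""
    score = defaultdict(float)
    for labels, weight in weighted_paths:
        for node, prob in path_walk(graph, start, labels).items():
            score[node] += weight * prob
    return dict(score)

# Hypothetical toy graph: papers cite papers and are written by authors.
graph = {
    "p1": {"cites": ["p2", "p3"], "written_by": ["a1"]},
    "p2": {"written_by": ["a1", "a2"]},
    "p3": {"written_by": ["a2"]},
    "a1": {"wrote": ["p1", "p2"]},
}
weighted_paths = [(("cites", "written_by"), 1.0), (("written_by",), 0.5)]
scores = combined_proximity(graph, "p1", weighted_paths)
```

Here each label sequence gets its own weight, in contrast to an RWR-style model that attaches one parameter to each edge label regardless of where it occurs in the walk.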
fLDA: Matrix Factorization through Latent Dirichlet Allocation
In Proc. of WSDM ’10, 2010
Cited by 43 (0 self)
We propose fLDA, a novel matrix factorization method to predict ratings in recommender system applications where a “bag-of-words” representation for item metadata is natural. Such scenarios are commonplace in web applications like content recommendation, ad targeting and web search, where items are articles, ads and web pages respectively. Because of data sparseness, regularization is key to good predictive accuracy. Our method works by regularizing both user and item factors simultaneously through user features and the bag of words associated with each item. Specifically, each word in an item is associated with a discrete latent factor often referred to as the topic of the word; item topics are obtained by averaging topics across all words in an item. Then, a user’s rating on an item is modeled as the user’s affinity to the item’s topics, where user affinity to topics (user factors) and topic assignments to words in items (item factors) are learned jointly in a supervised fashion. To avoid overfitting, user and item factors are regularized through Gaussian linear regression and Latent Dirichlet Allocation (LDA) priors respectively. We show our model is accurate, interpretable and handles both cold-start and warm-start scenarios seamlessly through a single model. The efficacy of our method is illustrated on benchmark datasets and a new dataset from Yahoo! Buzz, where fLDA provides superior predictive accuracy in cold-start scenarios and is comparable to state-of-the-art methods in warm-start scenarios. As a by-product, fLDA also identifies interesting topics that explain user-item interactions. Our method also generalizes a recently proposed technique called supervised LDA (sLDA) to collaborative filtering applications. While sLDA estimates item topic vectors in a supervised fashion for a single regression, fLDA incorporates multiple regressions (one for each user) in estimating the item factors.
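The scoring step described in this abstract (average the word topics of an item, then take the user's affinity to those topics) can be sketched as follows. This is only an illustration of that one step under simplifying assumptions: topic assignments and user factors are taken as given, and any bias terms, priors, and the joint learning procedure are omitted. The function names and numbers are hypothetical.

```python
def item_topic_vector(word_topics, num_topics):
    """Average the per-word topic assignments of one item into a topic vector."""
    counts = [0.0] * num_topics
    for topic in word_topics:
        counts[topic] += 1.0
    return [c / len(word_topics) for c in counts]

def predicted_affinity(user_factors, item_topics):
    """User affinity to an item: dot product of user factors and item topics."""
    return sum(u * z for u, z in zip(user_factors, item_topics))

# Hypothetical item with four words assigned to topics 0, 0, 1 and 2.
z_bar = item_topic_vector([0, 0, 1, 2], num_topics=3)
user = [2.0, -1.0, 0.5]  # assumed per-topic affinities for one user
score = predicted_affinity(user, z_bar)
```

Because the item side depends only on the words of the item, a score of this form can be computed for an item with no ratings yet, which is consistent with the cold-start behavior the abstract reports.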
Hybrid Markov Logic Networks
Cited by 42 (1 self)
Markov logic networks (MLNs) combine first-order logic and Markov networks, allowing us to handle the complexity and uncertainty of real-world problems in a single consistent framework. However, in MLNs all variables and features are discrete, while most real-world applications also contain continuous ones. In this paper we introduce hybrid MLNs, in which continuous properties (e.g., the distance between two objects) and functions over them can appear as features. Hybrid MLNs have all distributions in the exponential family as special cases (e.g., multivariate Gaussians), and allow much more compact modeling of non-i.i.d. data than propositional representations like hybrid Bayesian networks. We also introduce inference algorithms for hybrid MLNs, by extending the MaxWalkSAT and MC-SAT algorithms to continuous domains. Experiments in a mobile robot mapping domain (involving joint classification, clustering and regression) illustrate the power of hybrid MLNs as a modeling language, and the accuracy and efficiency of the inference algorithms.
Exploiting Shared Correlations in Probabilistic Databases
2008
Cited by 38 (7 self)
There has been a recent surge in work in probabilistic databases, propelled in large part by the huge increase in noisy data sources: sensor data, experimental data, data from uncurated sources, and many others. There is a growing need for database management systems that can efficiently represent and query such data. In this work, we show how data characteristics can be leveraged to make the query evaluation process more efficient. In particular, we exploit what we refer to as shared correlations, where the same uncertainties and correlations occur repeatedly in the data. Shared correlations occur mainly for two reasons: (1) uncertainty and correlations usually come from general statistics and rarely vary on a tuple-to-tuple basis; (2) the query evaluation procedure itself tends to reintroduce the same correlations. Prior work has shown that the query evaluation problem on probabilistic databases is equivalent to a probabilistic inference problem on an appropriately constructed probabilistic graphical model (PGM). We leverage this by introducing a new data structure, called the random variable elimination graph (rv-elim graph), that can be built from the PGM obtained from query evaluation. We develop techniques based on bisimulation that can be used to compress the rv-elim graph by exploiting the presence of shared correlations in the PGM; the compressed rv-elim graph can then be used to run inference. We validate our methods by evaluating them empirically and show that even with a few shared correlations significant speedups are possible.
Learning first-order probabilistic models with combining rules
In Proceedings of the International Conference on Machine Learning, 2005
Cited by 38 (15 self)
Many real-world domains exhibit rich relational structure and stochasticity and motivate the development of models that combine predicate logic with probabilities. These models describe probabilistic influences between attributes of objects that are related to each other through known domain relationships. To keep these models succinct, each such influence is considered independent of others, which is called the assumption of “independence of causal influences” (ICI). In this paper, we describe a language that consists of quantified conditional influence statements and captures most relational probabilistic models based on directed graphs. The influences due to different statements are combined using a set of combining rules such as Noisy-OR. We motivate and introduce multi-level combining rules, where the lower level rules combine the influences due to different ground instances of the same statement, and the upper level rules combine the influences due to different statements. We present algorithms and empirical results for parameter learning in the presence of such combining rules. Specifically, we derive and implement algorithms based on gradient descent and expectation maximization for different combining rules and evaluate them on synthetic data and on a real-world task. The results demonstrate that the algorithms are able to learn both the conditional probability distributions of the influence statements and the parameters of the combining rules.
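A two-level combining rule of the kind described in this abstract can be sketched as follows. The pairing used here (a plain mean over ground instances of one statement, then Noisy-OR across statements) is an assumption chosen for illustration; the abstract only names Noisy-OR as one example of a combining rule. The function names and input probabilities are hypothetical.

```python
from math import prod

def mean_rule(probs):
    """Lower-level rule (assumed): average the influences of the ground
    instances of a single statement."""
    return sum(probs) / len(probs)

def noisy_or(probs):
    """Noisy-OR: the effect is absent only if every independent cause fails."""
    return 1.0 - prod(1.0 - p for p in probs)

def multilevel_combine(statement_instances):
    """Combine instance-level influences per statement, then across statements."""
    per_statement = [mean_rule(instances) for instances in statement_instances]
    return noisy_or(per_statement)

# Two statements: the first has two ground instances, the second has one.
influences = [[0.2, 0.4], [0.5]]
p = multilevel_combine(influences)
```

With these inputs the first statement contributes 0.3 and the second 0.5, so the combined probability is 1 - 0.7 * 0.5 = 0.65.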
Coevolution of social and affiliation networks
In 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2009
Cited by 38 (2 self)
In our work, we address the problem of modeling social network generation which explains both link and group formation. Recent studies on social network evolution propose generative models which capture the statistical properties of real-world networks related only to node-to-node link formation. We propose a novel model which captures the coevolution of social and affiliation networks. We provide surprising insights into group formation based on observations in several real-world networks, showing that users often join groups for reasons other than their friends. Our experiments show that the model is able to capture both the newly observed and previously studied network properties. This work is the first to propose a generative model which captures the statistical properties of these complex networks. The proposed model facilitates controlled experiments which study the effect of actors’ behavior on the network evolution, and it allows the generation of realistic synthetic datasets.
Towards a universal wordnet by learning from combined evidence
In Proc. CIKM 2009, 2009
Cited by 37 (15 self)
Lexical databases are invaluable sources of knowledge about words and their meanings, with numerous applications in areas like NLP, IR, and AI. We propose a methodology for the automatic construction of a large-scale multilingual lexical database where words of many languages are hierarchically organized in terms of their meanings and their semantic relations to other words. This resource is bootstrapped from WordNet, a well-known English-language resource. Our approach extends WordNet with around 1.5 million meaning links for 800,000 words in over 200 languages, drawing on evidence extracted from a variety of resources including existing (monolingual) wordnets, (mostly bilingual) translation dictionaries, and parallel corpora. Graph-based scoring functions and statistical learning techniques are used to iteratively integrate this information and build an output graph. Experiments show that this wordnet has a high level of precision and coverage, and that it can be useful in applied tasks such as cross-lingual text classification.
Gradient-based boosting for Statistical Relational Learning: The Relational Dependency Network Case
2011
Cited by 37 (17 self)
Dependency networks approximate a joint probability distribution over multiple random variables as a product of conditional distributions. Relational Dependency Networks (RDNs) are graphical models that extend dependency networks to relational domains. This higher expressivity, however, comes at the expense of a more complex model-selection problem: an unbounded number of relational abstraction levels might need to be explored. Whereas current learning approaches for RDNs learn a single probability tree per random variable, we propose to turn the problem into a series of relational function-approximation problems using gradient-based boosting. In doing so, one can easily induce highly complex features over several iterations and in turn quickly estimate a very expressive model. Our experimental results on several different data sets show that this boosting method results in efficient learning of RDNs when compared to state-of-the-art statistical relational learning approaches.