Results 1-10 of 12
Programming with Personalized PageRank: A Locally Groundable First-Order Probabilistic Logic
Many information-management tasks (including classification, retrieval, information extraction, and information integration) can be formalized as inference in an appropriate probabilistic first-order logic. However, most probabilistic first-order logics are not efficient enough for realistically-sized instances of these tasks. One key problem is that queries are typically answered by "grounding" the query, i.e., mapping it to a propositional representation and then performing propositional inference; with a large database of facts, groundings can be very large, making inference and learning computationally expensive. Here we present a first-order probabilistic language which is well-suited to approximate "local" grounding: in particular, every query Q can be approximately grounded with a small graph. The language is an extension of stochastic logic programs where inference is performed by a variant of personalized PageRank. Experimentally, we show that the approach performs well on an entity resolution task, a classification task, and a joint inference task; that the cost of inference is independent of database size; and that speedup in learning is possible by multithreading.
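The core primitive in this abstract is personalized PageRank over a small proof graph. As a rough illustration of that primitive only (not the paper's actual system), here is a minimal power-iteration sketch; the three-node graph, node names, and restart probability `alpha` are invented for the example:

```python
def personalized_pagerank(graph, seed, alpha=0.15, iters=50):
    """graph: dict node -> list of successor nodes; seed: the query node.

    Returns a probability distribution over nodes in which a fixed fraction
    alpha of the random walk's mass restarts at the seed on every step.
    """
    nodes = list(graph)
    p = {v: 0.0 for v in nodes}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in nodes}
        nxt[seed] += alpha  # restart mass always returns to the query node
        for v in nodes:
            out = graph[v]
            if out:
                share = (1 - alpha) * p[v] / len(out)
                for w in out:
                    nxt[w] += share
            else:
                nxt[seed] += (1 - alpha) * p[v]  # dangling mass restarts too
        p = nxt
    return p

# toy proof graph rooted at the query node "q"
graph = {"q": ["a", "b"], "a": ["q"], "b": ["a"]}
scores = personalized_pagerank(graph, "q")
```

Because the walk restarts at the query, mass concentrates near it, which is what makes a small local grounding a reasonable approximation.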
Top-k Query Processing in Probabilistic Databases with Non-Materialized Views
, 2012
In this paper, we investigate a novel approach to computing confidence bounds for top-k ranking queries in probabilistic databases with non-materialized views. Unlike prior approaches, we present an exact pruning algorithm for finding the top-ranked query answers according to their marginal probabilities without the need to first materialize all answer candidates via the views. Specifically, we consider conjunctive queries over multiple levels of select-project-join views, the latter of which are cast into Datalog rules, where the rules themselves may also be uncertain, i.e., valid with some degree of confidence. To our knowledge, this work is the first to address integrated data and confidence computations in the context of probabilistic databases by considering confidence bounds over partially evaluated query answers with first-order lineage formulas. We further extend our query processing techniques with a tool suite of scheduling strategies based on selectivity estimation and the expected impact of subgoals on the final confidence of answer candidates. Experiments with large datasets demonstrate drastic runtime improvements over both sampling- and decomposition-based methods.
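The pruning idea behind such bound-based top-k algorithms can be sketched in a few lines: each candidate answer carries a (lower, upper) interval on its marginal probability, and a candidate is pruned as soon as its upper bound falls below the k-th best lower bound. The candidate names and intervals below are invented, and the real algorithm interleaves this test with partial evaluation of the lineage:

```python
def topk_prune(bounds, k):
    """bounds: dict answer -> (lo, hi) probability interval.

    Returns the set of answers that may still belong to the top-k:
    any answer whose upper bound is below the k-th largest lower
    bound can never overtake k confirmed-better answers.
    """
    lows = sorted((lo for lo, _ in bounds.values()), reverse=True)
    threshold = lows[k - 1] if len(lows) >= k else 0.0
    return {a for a, (lo, hi) in bounds.items() if hi >= threshold}

candidates = {"a1": (0.8, 0.9), "a2": (0.5, 0.95), "a3": (0.1, 0.4)}
survivors = topk_prune(candidates, k=2)  # "a3" is pruned: 0.4 < 0.5
```

Tightening the intervals (by evaluating more of each answer's lineage) raises the threshold and shrinks the surviving set until only the top-k remain.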
Symmetric Weighted First-Order Model Counting
In Proc. of PODS '15, 2015
First-order model counting emerged recently as a novel reasoning task, at the core of efficient algorithms for probabilistic logics. We present a Skolemization algorithm for model counting problems that eliminates existential quantifiers from a first-order logic theory without changing its weighted model count. For certain subsets of first-order logic, lifted model counters were shown to run in time polynomial in the number of objects in the domain of discourse, where propositional model counters require exponential time. However, these guarantees apply only to Skolem normal form theories (i.e., no existential quantifiers), as the presence of existential quantifiers reduces lifted model counters to propositional ones. Since textbook Skolemization is not sound for model counting, these restrictions precluded efficient model counting for directed models, such as probabilistic logic programs, which rely on existential quantification. Our Skolemization procedure extends the applicability of first-order model counters to these representations. Moreover, it simplifies the design of lifted model counting algorithms.
Understanding the Complexity of Lifted Inference and Asymmetric Weighted Model Counting
In this paper we study lifted inference for the Weighted First-Order Model Counting problem (WFOMC), which counts the assignments that satisfy a given sentence in first-order logic (FOL); it has applications in Statistical Relational Learning (SRL) and Probabilistic Databases (PDB). We present several results. First, we describe a lifted inference algorithm that generalizes prior approaches in SRL and PDB. Second, we provide a novel dichotomy result for a nontrivial fragment of FO CNF sentences, showing that for each sentence the WFOMC problem is either in PTIME or #P-hard in the size of the input domain; we prove that in the first case our algorithm solves the WFOMC problem in PTIME, and in the second case it fails. Third, we present several properties of the algorithm. Finally, we discuss limitations of lifted inference for symmetric probabilistic databases (where the weights of ground literals depend only on the relation name, and not on the constants of the domain), and prove the impossibility of a dichotomy result for the complexity of probabilistic inference for the entire language FOL.
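To make the "propositional baseline" concrete, here is a brute-force weighted model counter for the toy sentence ∀x,y: Friends(x,y) ∧ Smokes(x) → Smokes(y) over a domain of n constants. The sentence and weights are illustrative, not taken from the paper; enumerating all 2^(n + n²) truth assignments shows the exponential cost in domain size that lifted WFOMC algorithms avoid on liftable fragments:

```python
from itertools import product

def brute_force_wfomc(n, w_smokes=1.0, w_friends=1.0):
    """Weighted count of models of  forall x, y: F(x, y) & S(x) -> S(y)
    over domain {0, ..., n-1}, by exhaustive propositional enumeration."""
    dom = range(n)
    f_atoms = [(x, y) for x in dom for y in dom]
    total = 0.0
    for s_bits in product([False, True], repeat=n):
        for f_bits in product([False, True], repeat=n * n):
            F = dict(zip(f_atoms, f_bits))
            # check the universally quantified implication on every pair
            if all(not (F[(x, y)] and s_bits[x]) or s_bits[y]
                   for x in dom for y in dom):
                w = 1.0  # weight = product of weights of the true atoms
                for b in s_bits:
                    if b:
                        w *= w_smokes
                for b in f_bits:
                    if b:
                        w *= w_friends
                total += w
    return total

count = brute_force_wfomc(2)  # with unit weights this is a plain model count
```

A lifted counter would instead sum over the number k of smokers, computing C(n, k) · 2^(n² − k(n−k)) per case, i.e., polynomially many symmetric cases rather than exponentially many worlds.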
Approximate Lifted Inference with Probabilistic Databases
This paper proposes a new approach for approximate evaluation of #P-hard queries with probabilistic databases. In our approach, every query is evaluated entirely in the database engine by evaluating a fixed number of query plans, each providing an upper bound on the true probability, then taking their minimum. We provide an algorithm that takes into account important schema information to enumerate only the minimal necessary plans among all possible plans. Importantly, this algorithm is a strict generalization of all known results on PTIME self-join-free conjunctive queries: a query is safe if and only if our algorithm returns one single plan. We also apply three relational query optimization techniques to evaluate all minimal safe plans very fast. We give a detailed experimental evaluation of our approach and, in the process, provide a new way of thinking about the value of probabilistic methods over non-probabilistic methods for ranking query answers.
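The "minimum over plan upper bounds" idea can be illustrated on the canonical unsafe query H :- R(x), S(x,y), T(y). Each plan below pushes one projection early and treats the resulting subexpressions as independent (a dissociation), so each computes an upper bound on the true probability; the relation contents and tuple probabilities are made up for the example:

```python
def ior(ps):
    """P(at least one of several independent events occurs)."""
    out = 1.0
    for p in ps:
        out *= 1.0 - p
    return 1.0 - out

def plan_project_x(R, S, T):
    # project on x first: assumes the per-x subqueries are independent
    return ior([R[x] * ior([ps * T[y] for (x2, y), ps in S.items() if x2 == x])
                for x in R])

def plan_project_y(R, S, T):
    # project on y first: the symmetric dissociation
    return ior([T[y] * ior([ps * R[x] for (x, y2), ps in S.items() if y2 == y])
                for y in T])

R = {"a": 0.5, "b": 0.5}                          # P(R(x)) per tuple
T = {"c": 0.5, "d": 0.5}                          # P(T(y)) per tuple
S = {("a", "c"): 1.0, ("a", "d"): 1.0, ("b", "c"): 1.0}
upper = min(plan_project_x(R, S, T), plan_project_y(R, S, T))
# exact P(H) for this instance is 0.5 (by enumerating the four R/T tuples),
# so both plans overestimate, and taking the minimum tightens the bound
```

Both plan evaluators use only independent-AND and independent-OR, so each runs in PTIME in the database size; the hardness of the exact query is paid for only in the gap between the bound and the truth.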
SlimShot: In-Database Probabilistic Inference for Knowledge Bases
Increasingly large knowledge bases are being created by crawling the Web or other corpora of documents, and by extracting facts and relations using machine learning techniques. To manage the uncertainty in the data, these KBs rely on probabilistic engines based on Markov Logic Networks (MLN), for which probabilistic inference remains a major challenge. Today's state-of-the-art systems use variants of MCMC, which have no theoretical error guarantees and, as we show, suffer from poor performance in practice. In this paper we describe SlimShot (Scalable Lifted Inference and Monte Carlo Sampling Hybrid Optimization Technique), a probabilistic inference engine for knowledge bases. SlimShot converts the MLN to a tuple-independent probabilistic database, then uses simple Monte Carlo-based inference with three key enhancements: (1) it combines sampling with safe query evaluation, (2) it estimates a conditional probability by jointly computing the numerator and denominator, and (3) it adjusts the proposal distribution based on the sample cardinality. In combination, these three techniques allow us to give formal error guarantees, and we demonstrate empirically that SlimShot outperforms today's state-of-the-art probabilistic inference engines used in knowledge bases.
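Enhancement (2), estimating a conditional probability by counting numerator and denominator over the same sample stream, can be sketched in isolation. The function name, the toy two-tuple world, and the events are all invented for illustration; the actual system additionally collapses much of the sampling into safe-plan evaluation:

```python
import random

def mc_conditional(sample_world, query, evidence, n=20000, seed=0):
    """Estimate P(query | evidence) with a shared sample stream:
    every sampled world updates the denominator count if it satisfies
    the evidence, and the numerator count if it also satisfies the query."""
    rng = random.Random(seed)
    num = den = 0
    for _ in range(n):
        w = sample_world(rng)
        if evidence(w):
            den += 1
            if query(w):
                num += 1
    return num / den if den else 0.0

# toy world: two independent tuples, each present with probability 0.5
est = mc_conditional(
    sample_world=lambda rng: (rng.random() < 0.5, rng.random() < 0.5),
    query=lambda w: w[0] and w[1],     # both tuples present
    evidence=lambda w: w[0] or w[1],   # at least one tuple present
)
# true value is 0.25 / 0.75 = 1/3; the estimate converges to it
```

Sharing samples between numerator and denominator correlates the two estimates, which reduces the variance of their ratio compared with estimating each from an independent run.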
Database Principles in Information Extraction
, 2014
Information Extraction commonly refers to the task of populating a relational schema, having predefined underlying semantics, from textual content. This task is pervasive in contemporary computational challenges associated with Big Data. This tutorial gives an overview of the algorithmic concepts and techniques used for performing Information Extraction tasks, and describes some of the declarative frameworks that provide abstractions and infrastructure for programming extractors. In addition, the tutorial highlights opportunities for research impact through principles of data management, illustrates these opportunities through recent work, and proposes directions for future research.
Efficient Inference and Learning in a Large Knowledge Base: Reasoning with Extracted Information Using a Locally Groundable First-Order Probabilistic Logic
Mach. Learn., 2014