On the Optimal Approximation of Queries Using Tractable Propositional Languages
Cited by 18 (5 self)
This paper investigates the problem of approximating conjunctive queries without self-joins on probabilistic databases by lower and upper bounds that can be computed more efficiently. We study this problem via an indirection: Given a propositional formula Φ, find formulas in a more restricted language that are greatest lower bound and least upper bound, respectively, of Φ. We study bounds in the languages of read-once formulas, where every variable occurs at most once, and of read-once formulas in disjunctive normal form. We show equivalences of syntactic and model-theoretic characterisations of optimal bounds for unate formulas, and present algorithms that can enumerate them with polynomial delay. Such bounds can be computed by queries expressed using first-order queries extended with transitive closure and a special choice construct. Besides probabilistic databases, these results can also benefit the problem of approximate query evaluation in relational databases, since the bounds expressed by queries can be computed in polynomial combined complexity. Categories and Subject Descriptors: H.2.4 [Database Management]: Systems - Query Processing
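As a rough illustration of why read-once formulas are an attractive target language: when every variable occurs at most once and the variables are independent, the probability of the formula can be computed in a single bottom-up pass. The sketch below assumes independent variables; the class names `Var`, `And`, `Or` and the function `probability` are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str
    prob: float  # marginal probability that the variable is true


@dataclass(frozen=True)
class And:
    children: Tuple["Node", ...]


@dataclass(frozen=True)
class Or:
    children: Tuple["Node", ...]


Node = Union[Var, And, Or]


def probability(node: Node) -> float:
    """Probability of a read-once formula over independent variables.

    Because no variable is shared between subformulas, the children of
    every And/Or node are independent events.
    """
    if isinstance(node, Var):
        return node.prob
    child_probs = [probability(child) for child in node.children]
    if isinstance(node, And):
        result = 1.0
        for p in child_probs:
            result *= p  # all children must hold
        return result
    none_true = 1.0
    for p in child_probs:
        none_true *= 1.0 - p  # probability that no child holds
    return 1.0 - none_true


# x AND (y OR z), each variable true with probability 0.5
formula = And((Var("x", 0.5), Or((Var("y", 0.5), Var("z", 0.5)))))
read_once_probability = probability(formula)
```

The same recursion is not valid for formulas with repeated variables, which is the case the bounds in the abstract are meant to address.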
Probabilistic Databases with MarkoViews
Cited by 12 (5 self)
Most of the work on query evaluation in probabilistic databases has focused on the simple tuple-independent data model, where all tuples are independent random events. Several efficient query evaluation techniques exist in this setting, such as safe plans, algorithms based on OBDDs, tree-decomposition and a variety of approximation algorithms. However, complex data analytics tasks often require complex correlations between tuples, and here query evaluation is significantly more expensive, or more restrictive. In this paper, we propose MVDB as a framework both for representing complex correlations and for efficient query evaluation. An MVDB specifies correlations by defining views, called MarkoViews, on the probabilistic relations and declaring the weights of the view’s outputs. An MVDB is a (very large) Markov Logic Network. We make two sets of contributions. First, we show that query evaluation on an MVDB is equivalent to evaluating a Union of Conjunctive Queries (UCQ) over a tuple-independent database. The translation is exact (thus allowing the techniques developed for tuple-independent databases to be carried over to MVDB), yet it is novel and quite non-obvious (some resulting probabilities may be negative!). By itself, however, this translation may not lead to much gain, since the translated query becomes more complicated as we try to capture more correlations. Our second contribution is to propose a new query evaluation strategy that exploits offline compilation to speed up online query evaluation. Here we utilize and extend our prior work on compilation of UCQ. We experimentally validate our techniques on a large probabilistic database with MarkoViews inferred from the DBLP data.
Approximate Lifted Inference with Probabilistic Databases
Cited by 2 (1 self)
This paper proposes a new approach for approximate evaluation of #P-hard queries with probabilistic databases. In our approach, every query is evaluated entirely in the database engine by evaluating a fixed number of query plans, each providing an upper bound on the true probability, then taking their minimum. We provide an algorithm that takes into account important schema information to enumerate only the minimal necessary plans among all possible plans. Importantly, this algorithm is a strict generalization of all known results of PTIME self-join-free conjunctive queries: A query is safe if and only if our algorithm returns one single plan. We also apply three relational query optimization techniques to evaluate all minimal safe plans very fast. We give a detailed experimental evaluation of our approach and, in the process, provide a new way of thinking about the value of probabilistic methods over non-probabilistic methods for ranking query answers.
Oblivious bounds on the probability of Boolean functions
- ACM Trans. Database Syst. (TODS)
Cited by 2 (2 self)
This paper develops upper and lower bounds for the probability of Boolean functions by treating multiple occurrences of variables as independent and assigning them new individual probabilities. We call this approach dissociation and give an exact characterization of optimal oblivious bounds, i.e. when the new probabilities are chosen independent of the probabilities of all other variables. Our motivation comes from the weighted model counting problem (or, equivalently, the problem of computing the probability of a Boolean function), which is #P-hard in general. By performing several dissociations, one can transform a Boolean formula whose probability is difficult to compute, into one whose probability is easy to compute, and which is guaranteed to provide an upper or lower bound on the probability of the original formula by choosing appropriate probabilities for the dissociated variables. Our new bounds shed light on the connection between previous relaxation-based and model-based approximations and unify them as concrete choices in a larger design space. We also show how our theory allows a standard relational database management system (DBMS) to both upper and lower bound hard probabilistic queries in guaranteed polynomial time.
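A toy numeric check of the dissociation idea, as a sketch only: the variable x in (x AND y) OR (x AND z) is replaced by two independent copies x1 and x2, and the probability of the original and of the dissociated formula is computed by brute-force enumeration. The formula, the probabilities, and all function names are illustrative assumptions. In particular, the choice (1 - p1)(1 - p2) = 1 - p for the lower bound is an assumption about what "appropriate probabilities" means for a disjunctive dissociation, and the code only verifies it for this one example.

```python
from itertools import product
from math import sqrt


def probability(formula, probs):
    """Exact probability of `formula` by enumerating all truth assignments.

    `formula` maps a dict {variable: bool} to bool; `probs` gives the
    marginal probability of each independent variable.
    """
    names = sorted(probs)
    total = 0.0
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        weight = 1.0
        for name in names:
            weight *= probs[name] if world[name] else 1.0 - probs[name]
        if formula(world):
            total += weight
    return total


def original(w):
    # x occurs twice
    return (w["x"] and w["y"]) or (w["x"] and w["z"])


def dissociated(w):
    # each occurrence of x becomes its own independent variable
    return (w["x1"] and w["y"]) or (w["x2"] and w["z"])


p, q, r = 0.5, 0.5, 0.5
exact = probability(original, {"x": p, "y": q, "z": r})

# Copies keep the original probability: an upper bound in this example.
upper = probability(dissociated, {"x1": p, "x2": p, "y": q, "z": r})

# Assumed lower-bound choice: (1 - p_low) ** 2 == 1 - p.
p_low = 1.0 - sqrt(1.0 - p)
lower = probability(dissociated, {"x1": p_low, "x2": p_low, "y": q, "z": r})
```

This particular formula is small enough to evaluate exactly, so the bounds are only there to show the direction of the inequalities; the approach is meant for formulas where exact evaluation is too expensive.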
Anytime approximation in probabilistic databases
, 2013
Cited by 2 (2 self)
This article describes an approximation algorithm for computing the probability of propositional formulas over discrete random variables. It incrementally refines lower and upper bounds on the probability of the formulas until the desired absolute or relative error guarantee is reached. This algorithm is used by the SPROUT query engine to approximate the probabilities of results to relational algebra queries on expressive probabilistic databases.
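To make "incrementally refines lower and upper bounds" concrete, here is a minimal sketch of one generic way to do it. It is not the algorithm of the article or of SPROUT; it is a plain branch-on-a-variable scheme restricted to monotone DNF formulas over independent Boolean variables with an absolute error target. The names `status` and `anytime_bounds` are hypothetical.

```python
import heapq
from itertools import count


def status(clauses, assignment):
    """Three-valued evaluation of a monotone DNF under a partial assignment.

    Returns True or False once the formula is decided, otherwise None.
    """
    undecided = False
    for clause in clauses:
        if any(assignment.get(v) is False for v in clause):
            continue  # this clause is falsified
        if all(assignment.get(v) is True for v in clause):
            return True  # one satisfied clause decides the formula
        undecided = True
    return None if undecided else False


def anytime_bounds(clauses, probs, epsilon):
    """Refine [lower, upper] around P(formula) until the gap is <= epsilon."""
    lower, upper = 0.0, 1.0
    tie = count()  # tie-breaker so the heap never compares dicts
    heap = [(-1.0, next(tie), {})]
    while heap and upper - lower > epsilon:
        neg_weight, _, assignment = heapq.heappop(heap)  # heaviest branch
        weight = -neg_weight
        result = status(clauses, assignment)
        if result is True:
            lower += weight  # this mass certainly satisfies the formula
        elif result is False:
            upper -= weight  # this mass certainly falsifies the formula
        else:
            var = next(v for c in clauses for v in c if v not in assignment)
            p = probs[var]
            heapq.heappush(heap, (-(weight * p), next(tie),
                                  {**assignment, var: True}))
            heapq.heappush(heap, (-(weight * (1.0 - p)), next(tie),
                                  {**assignment, var: False}))
    return lower, upper


clauses = [("x", "y"), ("x", "z")]
probs = {"x": 0.5, "y": 0.5, "z": 0.5}
coarse = anytime_bounds(clauses, probs, 0.3)  # stops early
tight = anytime_bounds(clauses, probs, 0.0)   # runs to completion
```

The interval is valid at every step, so the loop can be stopped as soon as the requested gap is reached; a relative error guarantee would need a different stopping test.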
Learning a compositional semantics for Freebase with an open predicate vocabulary
- Transactions of the Association for Computational Linguistics
, 2015
Cited by 1 (0 self)
We present an approach to learning a model-theoretic semantics for natural language tied to Freebase. Crucially, our approach uses an open predicate vocabulary, enabling it to produce denotations for phrases such as “Republican front-runner from Texas” whose semantics cannot be represented using the Freebase schema. Our approach directly converts a sentence’s syntactic CCG parse into a logical form containing predicates derived from the words in the sentence, assigning each word a consistent semantics across sentences. This logical form is evaluated against a learned probabilistic database that defines a distribution over denotations for each textual predicate. A training phase produces this probabilistic database using a corpus of entity-linked text and probabilistic matrix factorization with a novel ranking objective function. We evaluate our approach on a compositional question answering task where it outperforms several competitive baselines. We also compare our approach against manually annotated Freebase queries, finding that our open predicate vocabulary enables us to answer many questions that Freebase cannot.
Oblivious Bounds on the Probability of Boolean Functions
, 2013
Cited by 1 (1 self)
This paper develops upper and lower bounds for the probability of Boolean functions by treating multiple occurrences of variables as independent and assigning them new individual probabilities. We call this approach dissociation and give an exact characterization of optimal oblivious bounds, i.e. when the new probabilities are chosen independent of the probabilities of all other variables. Our motivation comes from the weighted model counting problem (or, equivalently, the problem of computing the probability of a Boolean function), which is #P-hard in general. By performing several dissociations, one can transform a Boolean formula whose probability is difficult to compute, into one whose probability is easy to compute, and which is guaranteed to provide an upper or lower bound, respectively, on the probability of the original formula. Our new bounds shed light on the connection between previous relaxation-based and model-based approximations in the literature and unify them as concrete choices in a larger design space. We also show how our theory allows a standard relational database management system (DBMS) to both upper and lower bound hard probabilistic queries.
Tractability in Probabilistic Databases
, 2011
Managing Structured Collections of Community Data
Data management is becoming increasingly social. We observe a new form of information in such collaborative scenarios, where users contribute and reuse information, which resides neither in the base data nor in the schema information. This “superimposed structure” derives partly from interaction within the community, and partly from the recombination of existing data. We argue that this triad of data, schema, and higher-order structure requires new data abstractions that – at the same time – must efficiently scale to very large community databases. In addition, data generated by the community exposes four characteristics that make scalability especially difficult: (i) inconsistency, as different users or applications have or require partially overlapping and contradicting views; (ii) non-monotonicity, as new information may be able to revoke previous information already built upon; (iii) uncertainty, as both user intent and rankings are generally uncertain; and (iv) provenance, as content contributors want to track their data, and “content re-users” evaluate their trust. We show promising scalable solutions to two of these problems, and illustrate the general data management challenges with a seemingly simple example from community e-learning (“ce-learning”).
Approximate Lifted Inference in Probabilistic Databases
This paper proposes a new approach for approximate evaluation of #P-hard queries over probabilistic databases. In our approach, every query is evaluated entirely in the database engine by evaluating a fixed number of query plans, each providing an upper bound on the true probability, then taking their minimum. We provide an algorithm that takes into account important schema information to enumerate only the minimal necessary plans among all possible plans. Importantly, this algorithm is a strict generalization of all known results of PTIME self-join-free conjunctive queries: A query is safe if and only if our algorithm returns one single plan. We also apply three relational query optimization techniques to evaluate all minimal safe plans very fast. We give a detailed experimental evaluation of our approach and, in the process, provide a new way of thinking about the value of probabilistic methods over non-probabilistic methods for ranking query answers.