
Dissociation and propagation for efficient query evaluation over probabilistic databases constraints (2010)

by W Gatterbauer, A Jha, D Suciu
Venue:In MUD
Results 1 - 10 of 11

On the Optimal Approximation of Queries Using Tractable Propositional Languages

by Robert Fink, Dan Olteanu
Cited by 18 (5 self)
This paper investigates the problem of approximating conjunctive queries without self-joins on probabilistic databases by lower and upper bounds that can be computed more efficiently. We study this problem via an indirection: Given a propositional formula Φ, find formulas in a more restricted language that are greatest lower bound and least upper bound, respectively, of Φ. We study bounds in the languages of read-once formulas, where every variable occurs at most once, and of read-once formulas in disjunctive normal form. We show equivalences of syntactic and model-theoretic characterisations of optimal bounds for unate formulas, and present algorithms that can enumerate them with polynomial delay. Such bounds can be computed by queries expressed using first-order queries extended with transitive closure and a special choice construct. Besides probabilistic databases, these results can also benefit the problem of approximate query evaluation in relational databases, since the bounds expressed by queries can be computed in polynomial combined complexity. Categories and Subject Descriptors: H.2.4 [Database Management]: Systems—Query Processing

Citation Context

...or intermediate decomposition steps is essential to the effectiveness of such techniques. The closest in spirit to our approach is a technique developed independently by Gatterbauer, Abhay, and Suciu [9]. Their technique computes upper bounds for probabilities of conjunctive queries without self-joins. These bounds are not model-based, and in particular not optimal. Although the query lineage is used...

Probabilistic Databases with MarkoViews

by Abhay Jha, Dan Suciu
Cited by 12 (5 self)
Most of the work on query evaluation in probabilistic databases has focused on the simple tuple-independent data model, where all tuples are independent random events. Several efficient query evaluation techniques exist in this setting, such as safe plans, algorithms based on OBDDs, tree-decomposition and a variety of approximation algorithms. However, complex data analytics tasks often require complex correlations between tuples, and here query evaluation is significantly more expensive, or more restrictive. In this paper, we propose MVDB as a framework both for representing complex correlations and for efficient query evaluation. An MVDB specifies correlations by views, called MarkoViews, on the probabilistic relations and declaring the weights of the view’s outputs. An MVDB is a (very large) Markov Logic Network. We make two sets of contributions. First, we show that query evaluation on an MVDB is equivalent to evaluating a Union of Conjunctive Queries (UCQ) over a tuple-independent database. The translation is exact (thus allowing the techniques developed for tuple-independent databases to be carried over to MVDB), yet it is novel and quite non-obvious (some resulting probabilities may be negative!). This translation in itself, though, may not lead to much gain since the translated query gets complicated as we try to capture more correlations. Our second contribution is to propose a new query evaluation strategy that exploits offline compilation to speed up online query evaluation. Here we utilize and extend our prior work on compilation of UCQ. We validate experimentally our techniques on a large probabilistic database with MarkoViews inferred from the DBLP data.

Citation Context

...techniques have been described for using OBDDs for query evaluation on probabilistic databases [23, 24, 17]. Finally, these evaluation methods have been extended with several approximation techniques [26, 14]. Markov and Bayesian Networks in Probabilistic Databases. Several proposals exist for extending probabilistic databases to represent Markov Networks or Bayesian Networks. For example, in [18] the pr...

Approximate Lifted Inference with Probabilistic Databases

by Wolfgang Gatterbauer, Dan Suciu
Cited by 2 (1 self)
This paper proposes a new approach for approximate evaluation of #P-hard queries with probabilistic databases. In our approach, every query is evaluated entirely in the database engine by evaluating a fixed number of query plans, each providing an upper bound on the true probability, then taking their minimum. We provide an algorithm that takes into account important schema information to enumerate only the minimal necessary plans among all possible plans. Importantly, this algorithm is a strict generalization of all known results of PTIME self-join-free conjunctive queries: A query is safe if and only if our algorithm returns one single plan. We also apply three relational query optimization techniques to evaluate all minimal safe plans very fast. We give a detailed experimental evaluation of our approach and, in the process, provide a new way of thinking about the value of probabilistic methods over non-probabilistic methods for ranking query answers.

Citation Context

...uch rewritings, our techniques can be also applied to MLNs if their rewritings result in conjunctive queries without self-joins. Dissociation. Dissociation was first introduced in the workshop paper [20], presented as a way to generalize graph propagation algorithms to hypergraphs. Theoretical upper and lower bounds for dissociation of Boolean formulas, including Theorem 8, were proven in [22]. Disso...
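
The scheme in the abstract above (evaluate several plans, each an upper bound, then take the minimum) can be illustrated with a minimal sketch. It assumes the standard #P-hard query q :- R(x), S(x, y), T(y) and a tiny made-up tuple-independent database; the relation contents, probabilities, and function names are hypothetical and are not taken from the paper. The two plans correspond to dissociating R on y and T on x, and the exact value is obtained by brute-force enumeration only so the bounds can be checked.

```python
from itertools import product

# Hypothetical tuple-independent database: tuple -> marginal probability.
R = {"a": 0.9, "b": 0.3}
S = {("a", "c"): 0.5, ("a", "d"): 0.5, ("b", "d"): 0.5}
T = {"c": 0.4, "d": 0.8}


def exact_probability():
    """P(q) for q :- R(x), S(x, y), T(y), by enumerating all possible worlds."""
    tuples = [("R", k) for k in R] + [("S", k) for k in S] + [("T", k) for k in T]
    probs = [R[k] for k in R] + [S[k] for k in S] + [T[k] for k in T]
    total = 0.0
    for present in product([False, True], repeat=len(tuples)):
        world = {t for t, here in zip(tuples, present) if here}
        satisfied = any(
            ("R", x) in world and ("S", (x, y)) in world and ("T", y) in world
            for (x, y) in S
        )
        if satisfied:
            weight = 1.0
            for p, here in zip(probs, present):
                weight *= p if here else 1.0 - p
            total += weight
    return total


def independent_or(ps):
    """Probability that at least one of several independent events occurs."""
    result = 1.0
    for p in ps:
        result *= 1.0 - p
    return 1.0 - result


def plan_dissociate_R():
    """Group by y first: R(x) is treated as an independent copy per y value."""
    return independent_or(
        T[y] * independent_or(R[x] * S[(x, y2)] for (x, y2) in S if y2 == y)
        for y in T
    )


def plan_dissociate_T():
    """Group by x first: T(y) is treated as an independent copy per x value."""
    return independent_or(
        R[x] * independent_or(S[(x2, y)] * T[y] for (x2, y) in S if x2 == x)
        for x in R
    )


exact = exact_probability()
bounds = [plan_dissociate_R(), plan_dissociate_T()]
best = min(bounds)
print(exact, bounds, best)
```

Each plan only uses joins, independent projections, and products, so it could in principle be pushed into a relational engine; the enumeration in `exact_probability` is exponential and is included purely as a check.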

Oblivious bounds on the probability of Boolean functions

by Wolfgang Gatterbauer, Dan Suciu - ACM Trans. Database Syst. (TODS
Cited by 2 (2 self)
This paper develops upper and lower bounds for the probability of Boolean functions by treating multiple occurrences of variables as independent and assigning them new individual probabilities. We call this approach dissociation and give an exact characterization of optimal oblivious bounds, i.e. when the new probabilities are chosen independent of the probabilities of all other variables. Our motivation comes from the weighted model counting problem (or, equivalently, the problem of computing the probability of a Boolean function), which is #P-hard in general. By performing several dissociations, one can transform a Boolean formula whose probability is difficult to compute, into one whose probability is easy to compute, and which is guaranteed to provide an upper or lower bound on the probability of the original formula by choosing appropriate probabilities for the dissociated variables. Our new bounds shed light on the connection between previous relaxation-based and model-based approximations and unify them as concrete choices in a larger design space. We also show how our theory allows a standard relational database management system (DBMS) to both upper and lower bound hard probabilistic queries in guaranteed polynomial time.
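
As a minimal sketch of dissociation as the abstract describes it, the example below takes a small monotone DNF in which x1 occurs twice, replaces the two occurrences with independent copies, and compares probabilities by brute-force enumeration. The formula, the variable names (`x1a`, `x1b`), and all probabilities are assumptions chosen for illustration. The two probability choices for the copies (keeping the original p for an upper bound, and q with (1 - q)^2 = 1 - p for a lower bound) are assumed to match the paper's conditions for disjunctive dissociation and should be checked against the paper itself; here they are only verified numerically for this one instance.

```python
from itertools import product
from math import sqrt


def probability(formula, probs):
    """Exact probability of a Boolean function by enumerating all assignments."""
    names = sorted(probs)
    total = 0.0
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        if formula(world):
            weight = 1.0
            for name in names:
                weight *= probs[name] if world[name] else 1.0 - probs[name]
            total += weight
    return total


def phi(w):
    """Original formula: x1 occurs in two clauses, so it is not read-once."""
    return (w["x1"] and w["y1"]) or (w["x1"] and w["y2"]) or (w["x2"] and w["y2"])


def phi_dissociated(w):
    """Dissociated formula: the two occurrences of x1 become independent copies."""
    return (w["x1a"] and w["y1"]) or (w["x1b"] and w["y2"]) or (w["x2"] and w["y2"])


p = 0.5
exact = probability(phi, {"x1": p, "x2": p, "y1": p, "y2": p})


def dissociated_probability(q):
    """Probability of the dissociated formula when both copies of x1 get probability q."""
    return probability(
        phi_dissociated, {"x1a": q, "x1b": q, "x2": p, "y1": p, "y2": p}
    )


upper = dissociated_probability(p)                    # copies keep the original probability
lower = dissociated_probability(1.0 - sqrt(1.0 - p))  # copies satisfy (1 - q)^2 = 1 - p
print(lower, exact, upper)
```

The dissociated formula factors as x1a·y1 ∨ y2·(x1b ∨ x2), which is read-once, so in practice its probability could be computed directly without enumeration.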

Anytime approximation in probabilistic databases

by Robert Fink, Jiewen Huang, Dan Olteanu, 2013
Cited by 2 (2 self)
This article describes an approximation algorithm for computing the probability of propositional formulas over discrete random variables. It incrementally refines lower and upper bounds on the probability of the formulas until the desired absolute or relative error guarantee is reached. This algorithm is used by the SPROUT query engine to approximate the probabilities of results to relational algebra queries on expressive probabilistic databases.
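
The idea of refining lower and upper bounds until an absolute error target is met can be sketched generically. The code below is not the SPROUT algorithm: it is an illustrative assumption that uses Shannon expansion on monotone DNF formulas over independent Boolean variables, with a simple pair of bounds per leaf (the most probable single clause as lower bound, the union bound as upper bound). All names and the example formula are hypothetical.

```python
from collections import Counter
from math import prod


def clause_bounds(dnf, probs):
    """Cheap bounds for a monotone DNF: best single clause vs. union bound."""
    if not dnf:
        return 0.0, 0.0  # an empty disjunction is false
    clause_probs = [prod(probs[v] for v in clause) for clause in dnf]
    return max(clause_probs), min(1.0, sum(clause_probs))


def condition(dnf, var, value):
    """Shannon cofactor of the DNF for var = value."""
    if value:
        return [clause - {var} for clause in dnf]
    return [clause for clause in dnf if var not in clause]


def approximate(dnf, probs, eps):
    """Refine [lower, upper] by Shannon expansion until upper - lower <= eps."""
    leaves = [(1.0, [frozenset(c) for c in dnf])]
    while True:
        scored = [(w, f, clause_bounds(f, probs)) for w, f in leaves]
        lower = sum(w * lo for w, _, (lo, _) in scored)
        upper = sum(w * hi for w, _, (_, hi) in scored)
        if upper - lower <= eps:
            return lower, upper

        # Pick the leaf that contributes the widest weighted interval.
        def gap(i):
            w, _, (lo, hi) = scored[i]
            return w * (hi - lo)

        index = max(range(len(scored)), key=gap)
        w, f, _ = scored[index]
        # Expand that leaf on its most frequent variable.
        var = Counter(v for clause in f for v in clause).most_common(1)[0][0]
        leaves[index:index + 1] = [
            (w * probs[var], condition(f, var, True)),
            (w * (1.0 - probs[var]), condition(f, var, False)),
        ]


dnf = [{"x1", "y1"}, {"x1", "y2"}, {"x2", "y2"}]
probs = {"x1": 0.5, "x2": 0.5, "y1": 0.5, "y2": 0.5}
lower, upper = approximate(dnf, probs, eps=0.01)
print(lower, upper)
```

Because each expansion removes one variable from a leaf, the loop terminates for any non-negative `eps`; a smaller `eps` simply forces more expansions before the interval is returned.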

Learning a compositional semantics for Freebase with an open predicate vocabulary

by Jayant Krishnamurthy, Tom M. Mitchell - Transactions of the Association for Computational Linguistics, 2015
Cited by 1 (0 self)
We present an approach to learning a model-theoretic semantics for natural language tied to Freebase. Crucially, our approach uses an open predicate vocabulary, enabling it to produce denotations for phrases such as “Republican front-runner from Texas” whose semantics cannot be represented using the Freebase schema. Our approach directly converts a sentence’s syntactic CCG parse into a logical form containing predicates derived from the words in the sentence, assigning each word a consistent semantics across sentences. This logical form is evaluated against a learned probabilistic database that defines a distribution over denotations for each textual predicate. A training phase produces this probabilistic database using a corpus of entity-linked text and probabilistic matrix factorization with a novel ranking objective function. We evaluate our approach on a compositional question answering task where it outperforms several competitive baselines. We also compare our approach against manually annotated Freebase queries, finding that our open predicate vocabulary enables us to answer many questions that Freebase cannot.

Citation Context

...ndependent. This independence simplifies query evaluation: probabilistic databases permit efficient exact inference for safe queries (Suciu et al., 2011), and approximate inference for the remainder (Gatterbauer et al., 2010; Gatterbauer and Suciu, 2015). 9 Discussion This paper presents an approach for compositional semantics with an open predicate vocabulary. Our approach defines a probabilistic model over denotations ...

Oblivious Bounds on the Probability of Boolean Functions

by Wolfgang Gatterbauer, Dan Suciu, 2013
Cited by 1 (1 self)
This paper develops upper and lower bounds for the probability of Boolean functions by treating multiple occurrences of variables as independent and assigning them new individual probabilities. We call this approach dissociation and give an exact characterization of optimal oblivious bounds, i.e. when the new probabilities are chosen independent of the probabilities of all other variables. Our motivation comes from the weighted model counting problem (or, equivalently, the problem of computing the probability of a Boolean function), which is #P-hard in general. By performing several dissociations, one can transform a Boolean formula whose probability is difficult to compute, into one whose probability is easy to compute, and which is guaranteed to provide an upper or lower bound, respectively, on the probability of the original formula. Our new bounds shed light on the connection between previous relaxation-based and model-based approximations in the literature and unify them as concrete choices in a larger design space. We also show how our theory allows a standard relational database management system (DBMS) to both upper and lower bound hard probabilistic queries.

Tractability in Probabilistic Databases

by Dan Suciu, 2011

Managing Structured Collections of Community Data

by Wolfgang Gatterbauer, Dan Suciu
Data management is becoming increasingly social. We observe a new form of information in such collaborative scenarios, where users contribute and reuse information, which resides neither in the base data nor in the schema information. This “superimposed structure” derives partly from interaction within the community, and partly from the recombination of existing data. We argue that this triad of data, schema, and higher-order structure requires new data abstractions that – at the same time – must efficiently scale to very large community databases. In addition, data generated by the community exposes four characteristics that make scalability especially difficult: (i) inconsistency, as different users or applications have or require partially overlapping and contradicting views; (ii) non-monotonicity, as new information may be able to revoke previous information already built upon; (iii) uncertainty, as both user intent and rankings are generally uncertain; and (iv) provenance, as content contributors want to track their data, and “content re-users” evaluate their trust. We show promising scalable solutions to two of these problems, and illustrate the general data management challenges with a seemingly simple example from community e-learning (“ce-learning”).

Citation Context

... q :− R(x),S(x, y),T(y) which is #P hard in our case (many-to-many relations between all entities in Fig. 2a). The solution that we advocate is our recently introduced technique of query dissociation [8]. The basic idea is to slightly change the semantics of probabilistic query evaluation, and to then calculate the ranking score of result tuples with a few materialized views and a single query plan t...

Approximate Lifted Inference in Probabilistic Databases

by Wolfgang Gatterbauer, Dan Suciu
This paper proposes a new approach for approximate evaluation of #P-hard queries over probabilistic databases. In our approach, every query is evaluated entirely in the database engine by evaluating a fixed number of query plans, each providing an upper bound on the true probability, then taking their minimum. We provide an algorithm that takes into account important schema information to enumerate only the minimal necessary plans among all possible plans. Importantly, this algorithm is a strict generalization of all known results of PTIME self-join free conjunctive queries: A query is safe if and only if our algorithm returns one single plan. We also apply three relational query optimization techniques to evaluate all minimal safe plans very fast. We give a detailed experimental evaluation of our approach and, in the process, provide a new way of thinking about the value of probabilistic methods over non-probabilistic methods for ranking query answers.

Citation Context

...uch rewritings, our techniques can be also applied to MLNs if their rewritings result in conjunctive queries without self-joins. Dissociation. Dissociation was first introduced in the workshop paper [20], presented as a way to generalize graph propagation algorithms to hypergraphs. Theoretical upper and lower bounds for dissociation of Boolean formulas, including Theorem 8, were proven in [22]. Disso...
