SPROUT: Lazy vs. eager query plans for tuple-independent probabilistic databases
In Proc. of ICDE, 2009
Cited by 55 (12 self)
A paramount challenge in probabilistic databases is the scalable computation of confidences of tuples in query results. This paper introduces an efficient secondary-storage operator for exact computation of queries on tuple-independent probabilistic databases. We consider the conjunctive queries without self-joins that are known to be tractable on any tuple-independent database, and queries that are not tractable in general but become tractable on probabilistic databases restricted by functional dependencies. Our operator is semantically equivalent to a sequence of aggregations and can be naturally integrated into existing relational query plans. As a proof of concept, we developed an extension of the PostgreSQL 8.3.3 query engine called SPROUT. We study optimizations that push or pull our operator or parts thereof past joins. The operator employs static information, such
Approximate Confidence Computation in Probabilistic Databases
Cited by 28 (5 self)
This paper introduces a deterministic approximation algorithm with error guarantees for computing the probability of propositional formulas over discrete random variables. The algorithm is based on an incremental compilation of formulas into decision diagrams using three types of decompositions: Shannon expansion, independence partitioning, and product factorization. With each decomposition step, lower and upper bounds on the probability of the partially compiled formula can be quickly computed and checked against the allowed error. This algorithm can be effectively used to compute approximate confidence values of answer tuples to positive relational algebra queries on general probabilistic databases (c-tables with discrete probability distributions). We further tune our algorithm so as to capture all known tractable conjunctive queries without self-joins on tuple-independent probabilistic databases: In this case, the algorithm requires time polynomial in the input size even for exact computation. We implemented the algorithm as an extension of the SPROUT query engine. An extensive experimental effort shows that it consistently outperforms state-of-the-art approximation techniques by several orders of magnitude.
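The abstract above names Shannon expansion as one of its decomposition steps. As a rough illustration only, the sketch below computes the exact probability of a positive DNF formula over independent Boolean variables using Shannon expansion alone. It is not the paper's algorithm: independence partitioning, product factorization, and the lower/upper bounds are all omitted, and the names `condition` and `shannon_probability` and the clause representation (a list of frozensets of variable names) are assumptions made for this example.

```python
from typing import Dict, FrozenSet, List

Clause = FrozenSet[str]  # a conjunction of positive variables (hypothetical encoding)


def condition(dnf: List[Clause], var: str, value: bool) -> List[Clause]:
    """Restrict a positive DNF to var = value."""
    restricted = []
    for clause in dnf:
        if var not in clause:
            restricted.append(clause)          # clause unaffected by var
        elif value:
            restricted.append(clause - {var})  # var is true: drop it from the clause
        # var is false: the clause cannot be satisfied, so it is discarded
    return restricted


def shannon_probability(dnf: List[Clause], prob: Dict[str, float]) -> float:
    """P(dnf) = p(x) * P(dnf | x=1) + (1 - p(x)) * P(dnf | x=0)."""
    if not dnf:
        return 0.0  # empty disjunction is false
    if any(len(clause) == 0 for clause in dnf):
        return 1.0  # an empty clause is true, so the disjunction is true
    var = min(dnf[0])  # deterministic choice; any remaining variable would do
    p = prob[var]
    return (p * shannon_probability(condition(dnf, var, True), prob)
            + (1.0 - p) * shannon_probability(condition(dnf, var, False), prob))


# Example: x1 y1 OR x1 y2 OR x2 y2, every variable true with probability 0.5.
probs = {"x1": 0.5, "x2": 0.5, "y1": 0.5, "y2": 0.5}
formula = [frozenset({"x1", "y1"}), frozenset({"x1", "y2"}), frozenset({"x2", "y2"})]
result = shannon_probability(formula, probs)
```

Plain Shannon expansion can take exponential time in the worst case, which is why the abstract pairs it with the other two decompositions and with bounds that allow early termination.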
Read-Once Functions and Query Evaluation in Probabilistic Databases
Cited by 23 (2 self)
Probabilistic databases hold promise of being a viable means for large-scale uncertainty management, increasingly needed in a number of real-world application domains. However, query evaluation in probabilistic databases remains a computational challenge. Prior work on efficient exact query evaluation in probabilistic databases has largely concentrated on query-centric formulations (e.g., safe plans, hierarchical queries), in that they only consider characteristics of the query and not the data in the database. It is easy to construct examples where a supposedly hard query run on an appropriate database gives rise to a tractable query evaluation problem. In this paper, we develop efficient query evaluation techniques that leverage characteristics of both the query and the data in the database. We focus on tuple-independent databases where the query evaluation problem is equivalent to computing marginal probabilities of Boolean formulas associated with the result tuples. Query evaluation is easy if the Boolean formulas can be factorized into a form that has every variable appearing at most once (called read-once); this suggests a naive approach that incorporates previously developed Boolean formula factorization algorithms into the query evaluation. We then develop novel, more efficient factorization algorithms that work for a large subclass of queries (specifically, conjunctive queries without self-joins), by exploiting the unique structure of the result tuple Boolean formulas. We empirically demonstrate that our proposed techniques are (1) orders of magnitude faster than generic inference algorithms when used to evaluate general read-once functions, and (2) for the special case of hierarchical queries, they rival the efficiency of prior techniques specifically designed to handle such queries.
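To make concrete why a read-once form makes evaluation easy: when every variable appears at most once and variables are independent, the children of each AND or OR node are independent, so the probability follows from one bottom-up pass. The sketch below shows only this evaluation step, assuming the formula has already been factorized; it does not implement the factorization algorithms the abstract describes. The classes `Var`, `And`, `Or` and the function `read_once_probability` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class And:
    children: Tuple["Formula", ...]


@dataclass(frozen=True)
class Or:
    children: Tuple["Formula", ...]


Formula = Union[Var, And, Or]


def read_once_probability(f: Formula, prob: Dict[str, float]) -> float:
    """Probability of a read-once formula over independent variables.

    Correct only if no variable occurs more than once in f.
    """
    if isinstance(f, Var):
        return prob[f.name]
    child_probs = [read_once_probability(c, prob) for c in f.children]
    if isinstance(f, And):
        result = 1.0
        for p in child_probs:
            result *= p          # independent conjunction: multiply
        return result
    if isinstance(f, Or):
        none_true = 1.0
        for p in child_probs:
            none_true *= 1.0 - p  # probability that every child is false
        return 1.0 - none_true
    raise TypeError(f"unsupported node: {f!r}")


# Example: x1 y1 OR x1 y2 factorizes into the read-once form x1 AND (y1 OR y2).
probs = {"x1": 0.5, "y1": 0.5, "y2": 0.5}
factored = And((Var("x1"), Or((Var("y1"), Var("y2")))))
result = read_once_probability(factored, probs)
```

The pass visits each node once, so it runs in time linear in the size of the factorized formula; the hard part, which this sketch assumes away, is finding the factorization.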
Secondary-Storage Confidence Computation for Conjunctive Queries with Inequalities
In Proc. SIGMOD, 2009
Cited by 22 (6 self)
This paper investigates the problem of efficiently computing the confidences of distinct tuples in the answers to conjunctive queries with inequalities (<) on tuple-independent probabilistic databases. This problem is fundamental to probabilistic databases and was recently stated open. Our contributions are of both theoretical and practical importance. We define a class of tractable queries with inequalities, and generalize existing results on #P-hardness of query evaluation, now in the presence of inequalities. For the tractable queries, we introduce a confidence computation technique based on efficient compilation of the lineage of the query answer into Ordered Binary Decision Diagrams (OBDDs), whose sizes are linear in the number of variables of the lineage. We implemented a secondary-storage variant of our technique in PostgreSQL. This variant does not need to materialize the OBDD, but computes, in one scan over the lineage, the probabilities of OBDD fragments and combines them on the fly. Experiments with probabilistic TPC-H data show up to two orders of magnitude improvements when compared with state-of-the-art approaches.
On the Optimal Approximation of Queries Using Tractable Propositional Languages
Cited by 18 (5 self)
This paper investigates the problem of approximating conjunctive queries without self-joins on probabilistic databases by lower and upper bounds that can be computed more efficiently. We study this problem via an indirection: Given a propositional formula Φ, find formulas in a more restricted language that are greatest lower bound and least upper bound, respectively, of Φ. We study bounds in the languages of read-once formulas, where every variable occurs at most once, and of read-once formulas in disjunctive normal form. We show equivalences of syntactic and model-theoretic characterisations of optimal bounds for unate formulas, and present algorithms that can enumerate them with polynomial delay. Such bounds can be computed by queries expressed using first-order queries extended with transitive closure and a special choice construct. Besides probabilistic databases, these results can also benefit the problem of approximate query evaluation in relational databases, since the bounds expressed by queries can be computed in polynomial combined complexity. Categories and Subject Descriptors: H.2.4 [Database Management]: Systems—Query Processing
Knowledge compilation meets database theory: Compiling queries to decision diagrams
(under review), 2010
Cited by 17 (4 self)
The goal of Knowledge Compilation is to represent a Boolean expression in a format in which it can answer a range of online queries in PTIME. The online query of main interest to us is model counting, because of its application to query evaluation on probabilistic databases, but other online queries can be supported as well such as testing for equivalence, testing for implication, etc. In this paper we study the following problem. Given a database query q, decide whether its lineage can be compiled efficiently into a given target language. We consider four target languages, of strictly increasing expressive power (when the size of compilation is constrained to be polynomial in the input size): Read-Once Boolean formulae, OBDD, FBDD and d-DNNF. For each target, we study the class of database queries that admit polynomial size representation: these queries can also be evaluated in PTIME over probabilistic databases. When queries are restricted to conjunctive queries without self-joins, it was known that these four classes collapse to the class of hierarchical queries, which is also the class of PTIME queries over probabilistic databases. Our main result in this paper is that, in the case of Unions of Conjunctive Queries (UCQ), these classes form a strict hierarchy. Thus, unlike conjunctive queries without self-joins, the expressive power of UCQ differs considerably w.r.t. these target compilation languages. Moreover, we give a complete characterization of the first two target languages, based on the query’s syntax.
Bridging the gap between intensional and extensional query evaluation in probabilistic databases
In EDBT, 2010
Probabilistic Databases with MarkoViews
Cited by 12 (5 self)
Most of the work on query evaluation in probabilistic databases has focused on the simple tuple-independent data model, where all tuples are independent random events. Several efficient query evaluation techniques exist in this setting, such as safe plans, algorithms based on OBDDs, tree decomposition and a variety of approximation algorithms. However, complex data analytics tasks often require complex correlations between tuples, and here query evaluation is significantly more expensive, or more restrictive. In this paper, we propose MVDB as a framework both for representing complex correlations and for efficient query evaluation. An MVDB specifies correlations by views, called MarkoViews, on the probabilistic relations and declaring the weights of the view’s outputs. An MVDB is a (very large) Markov Logic Network. We make two sets of contributions. First, we show that query evaluation on an MVDB is equivalent to evaluating a Union of Conjunctive Queries (UCQ) over a tuple-independent database. The translation is exact (thus allowing the techniques developed for tuple-independent databases to be carried over to MVDB), yet it is novel and quite non-obvious (some resulting probabilities may be negative!). This translation in itself though may not lead to much gain since the translated query gets complicated as we try to capture more correlations. Our second contribution is to propose a new query evaluation strategy that exploits offline compilation to speed up online query evaluation. Here we utilize and extend our prior work on compilation of UCQ. We validate experimentally our techniques on a large probabilistic database with MarkoViews inferred from the DBLP data.
Dissociation and Propagation for Efficient Query Evaluation over Probabilistic Databases
2010
Cited by 11 (8 self)
Queries over probabilistic databases are either safe, in which case they can be evaluated entirely in a relational database engine, or unsafe, in which case they need to be evaluated with a general-purpose inference engine at a high cost. This paper proposes a new approach by which every query is evaluated like a safe query inside the database engine, by using a new method called dissociation. A dissociated query is obtained by adding extraneous variables to some atoms until the query becomes safe. We show that the probability of the original query and that of the dissociated query correspond to two well-known scoring functions on graphs, namely graph reliability (which is #P-hard), and the propagation score (which is related to PageRank and is in PTIME): When restricted to graphs, standard query probability is graph reliability, while the dissociated probability is the propagation score. We define a propagation score for conjunctive queries without self-joins and prove (i) that it is always an upper bound for query reliability, and (ii) that both scores coincide for all safe queries. Given the widespread and successful use of graph propagation methods in practice, we argue for the dissociation method as a good and efficient way to rank probabilistic query results, especially for those queries which are highly intractable for exact probabilistic inference.
On factorisation of provenance polynomials
In TaPP, 2011
Cited by 9 (5 self)
Tracking and managing provenance information in databases has applications in incomplete information and probabilistic databases, query evaluation under bag semantics, view maintenance and update, debugging and