Results 1–10 of 22
The Complexity of Causality and Responsibility for Query Answers and Non-Answers
Abstract

Cited by 43 (5 self)
An answer to a query has a well-defined lineage expression (alternatively called how-provenance) that explains how the answer was derived. Recent work has also shown how to compute the lineage of a non-answer to a query. However, the cause of an answer or non-answer is a more subtle notion and consists, in general, of only a fragment of the lineage. In this paper, we adapt Halpern, Pearl, and Chockler's recent definitions of causality and responsibility to define the causes of answers and non-answers to queries, and their degree of responsibility. Responsibility captures the notion of degree of causality and serves to rank potentially many causes by their relative contributions to the effect. Then, we study the complexity of computing causes and responsibilities for conjunctive queries. It is known that computing causes is NP-complete in general. Our first main result shows that all causes to conjunctive queries can be computed by a relational query which may involve negation. Thus, causality can be computed in PTIME, and very efficiently so. Next, we study computing responsibility. Here, we prove that the complexity depends on the conjunctive query and demonstrate a dichotomy between PTIME and NP-complete cases. For the PTIME cases, we give a non-trivial algorithm, consisting of a reduction to the max-flow computation problem. Finally, we prove that, even when it is in PTIME, responsibility is complete for LOGSPACE, implying that, unlike causality, it cannot be computed by a relational query.
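The responsibility measure described above can be sketched with a small brute-force search (the paper's PTIME algorithm uses a max-flow reduction instead; the exponential search and the lineage and tuple names below are made up for illustration only):

```python
from itertools import combinations

def holds(lineage, present):
    # A positive DNF lineage holds if some conjunct has all its tuples present.
    return any(conj <= present for conj in lineage)

def responsibility(lineage, tuples, t):
    """Degree of responsibility of tuple t for a query answer:
    1 / (1 + |Gamma|) for the smallest contingency set Gamma such that
    the answer survives removing Gamma but disappears once t is also
    removed; 0.0 if t is not a cause. Exponential brute force, for
    illustration only -- the paper reduces the PTIME cases to max-flow."""
    others = sorted(tuples - {t})
    for k in range(len(others) + 1):
        for gamma in combinations(others, k):
            remaining = tuples - set(gamma)
            if holds(lineage, remaining) and not holds(lineage, remaining - {t}):
                return 1.0 / (1 + k)
    return 0.0

# Hypothetical lineage of one answer: (a AND b) OR (a AND c).
lineage = [frozenset({"a", "b"}), frozenset({"a", "c"})]
tuples = {"a", "b", "c"}
```

On this lineage, `a` is a counterfactual cause (removing it alone kills the answer), so its responsibility is 1; `b` only becomes counterfactual after also removing `c`, giving responsibility 1/2.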
Approximate Confidence Computation in Probabilistic Databases
Abstract

Cited by 28 (5 self)
This paper introduces a deterministic approximation algorithm with error guarantees for computing the probability of propositional formulas over discrete random variables. The algorithm is based on an incremental compilation of formulas into decision diagrams using three types of decompositions: Shannon expansion, independence partitioning, and product factorization. With each decomposition step, lower and upper bounds on the probability of the partially compiled formula can be quickly computed and checked against the allowed error. This algorithm can be effectively used to compute approximate confidence values of answer tuples to positive relational algebra queries on general probabilistic databases (c-tables with discrete probability distributions). We further tune our algorithm so as to capture all known tractable conjunctive queries without self-joins on tuple-independent probabilistic databases: in this case, the algorithm requires time polynomial in the input size even for exact computation. We implemented the algorithm as an extension of the SPROUT query engine. An extensive experimental effort shows that it consistently outperforms state-of-the-art approximation techniques by several orders of magnitude.
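A minimal sketch of the bounded, incremental idea (not the paper's full algorithm: only Shannon expansion appears here, with the best-conjunct and union bounds standing in for the diagram-based bounds; formulas are positive DNFs over independent variables, and the probabilities are made up):

```python
from math import prod

def cofactor(dnf, x, value):
    # Condition a positive DNF (list of frozensets of variables) on x = value.
    out = []
    for conj in dnf:
        if x in conj:
            if value:
                out.append(conj - {x})  # x is satisfied: drop it from the conjunct
            # if value is False, the whole conjunct is falsified: drop it
        else:
            out.append(conj)
    return out

def bounds(dnf, p):
    # An empty conjunct means the formula is already true; an empty DNF, false.
    if any(not conj for conj in dnf):
        return 1.0, 1.0
    if not dnf:
        return 0.0, 0.0
    probs = [prod(p[v] for v in conj) for conj in dnf]
    return max(probs), min(1.0, sum(probs))  # best single conjunct / union bound

def approx_prob(dnf, p, eps=1e-3):
    """Stop as soon as the lower and upper bounds are within eps; otherwise
    apply one Shannon expansion step on the most shared variable."""
    lo, hi = bounds(dnf, p)
    if hi - lo <= eps:
        return (lo + hi) / 2
    x = max({v for conj in dnf for v in conj},
            key=lambda v: sum(v in conj for conj in dnf))
    return (p[x] * approx_prob(cofactor(dnf, x, True), p, eps)
            + (1 - p[x]) * approx_prob(cofactor(dnf, x, False), p, eps))
```

With a tolerance near zero the recursion bottoms out in exact probabilities, so the same code doubles as an exact evaluator on small inputs.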
Computing query probability with incidence algebras
 In PODS
Abstract

Cited by 25 (9 self)
We describe an algorithm that evaluates queries over probabilistic databases using Möbius' inversion formula in incidence algebras. The queries we consider are unions of conjunctive queries (equivalently: existential, positive First Order sentences), and the probabilistic databases are tuple-independent structures. Our algorithm runs in PTIME on a subset of queries called "safe" queries, and is complete, in the sense that every unsafe query is hard for the class FP^#P. The algorithm is very simple and easy to implement in practice, yet it is non-obvious. Möbius' inversion formula, which is in essence inclusion-exclusion, plays a key role for completeness, by allowing the algorithm to compute the probability of some safe queries even when they have some subqueries that are unsafe. We also apply the same lattice-theoretic techniques to analyze an algorithm based on lifted conditioning, and prove that it is incomplete.
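For intuition, the Boolean-algebra special case of Möbius inversion is plain inclusion-exclusion. A sketch of that naive form (it sums every term; the paper's refinement is precisely to skip terms whose Möbius coefficient is zero, which this sketch does not do):

```python
from itertools import combinations
from math import prod

def union_prob(events, p):
    """P(E1 or ... or En) by inclusion-exclusion, where each event Ei is a
    conjunction of independent Boolean variables, given as a frozenset of
    variable names with marginal probabilities in p."""
    total = 0.0
    for k in range(1, len(events) + 1):
        for chosen in combinations(events, k):
            inter = frozenset().union(*chosen)  # conjunction of the chosen events
            total += (-1) ** (k + 1) * prod(p[v] for v in inter)
    return total
```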
On the Optimal Approximation of Queries Using Tractable Propositional Languages
Abstract

Cited by 18 (5 self)
This paper investigates the problem of approximating conjunctive queries without self-joins on probabilistic databases by lower and upper bounds that can be computed more efficiently. We study this problem via an indirection: given a propositional formula Φ, find formulas in a more restricted language that are greatest lower bound and least upper bound, respectively, of Φ. We study bounds in the languages of read-once formulas, where every variable occurs at most once, and of read-once formulas in disjunctive normal form. We show equivalences of syntactic and model-theoretic characterisations of optimal bounds for unate formulas, and present algorithms that can enumerate them with polynomial delay. Such bounds can be computed by queries expressed using first-order queries extended with transitive closure and a special choice construct. Besides probabilistic databases, these results can also benefit the problem of approximate query evaluation in relational databases, since the bounds expressed by queries can be computed in polynomial combined complexity.
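The appeal of read-once bounds is that a read-once formula's probability falls out of one bottom-up pass, since no variable is shared between the children of any gate. A sketch, with the formula encoded as nested `('and', ...)` / `('or', ...)` tuples over made-up variable names:

```python
from math import prod

def read_once_prob(node, p):
    """Probability of a read-once formula over independent variables.
    Because no variable repeats, the children of every gate denote
    independent events, so their probabilities combine directly."""
    op, *kids = node if isinstance(node, tuple) else ("var", node)
    if op == "var":
        return p[kids[0]]
    qs = [read_once_prob(k, p) for k in kids]
    if op == "and":
        return prod(qs)
    return 1 - prod(1 - q for q in qs)  # "or" gate
```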
The Dichotomy of Probabilistic Inference for Unions of Conjunctive Queries
Abstract

Cited by 16 (7 self)
We study the complexity of computing the probability of a query on a probabilistic database. The queries that we consider are unions of conjunctive queries (UCQ): equivalently, these are positive, existential First Order Logic sentences, or non-recursive datalog programs. The databases that we consider are tuple-independent. We prove the following dichotomy theorem: for every UCQ query, either its probability can be computed in polynomial time in the size of the database, or it is hard for FP^#P. Our result also has applications to the problem of computing the probability of positive, Boolean expressions, and establishes a dichotomy for such classes based on their structure. For the tractable case, we give a very simple algorithm that alternates between two steps: applying the inclusion-exclusion formula, and removing one existential variable. A key and novel feature of this algorithm is that it avoids computing terms that cancel out in the inclusion-exclusion formula; in other words, it only computes those terms whose Möbius function in an appropriate lattice is non-zero. We show that this simple feature is a key ingredient needed to ensure completeness. For the hardness proof, we give a reduction from the counting problem for positive, partitioned 2-CNF, which is known to be #P-complete. The hardness proof is non-trivial, and uses techniques from logic and from classical algebra.
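The "remove one existential variable" step corresponds to an independent project: when the instantiations Q(a) are independent events, P(∃x Q(x)) = 1 − ∏_a (1 − P(Q(a))). A sketch of that step on a toy safe query Q = ∃x ∃y R(x) ∧ S(x, y), where the relation contents and probabilities are invented for illustration:

```python
def independent_project(probs):
    # P(exists x. Q(x)) = 1 - prod_a (1 - P(Q(a))), assuming independence.
    out = 1.0
    for q in probs:
        out *= 1 - q
    return 1 - out

# Toy tuple-independent instance: tuple -> marginal probability.
R = {"a": 0.8, "b": 0.5}
S = {("a", 1): 0.9, ("a", 2): 0.5, ("b", 1): 1.0}

def q_prob():
    """P(Q) for the Boolean query Q() = exists x, y: R(x) and S(x, y)."""
    per_x = []
    for x, pr in R.items():
        ps = [p for (xx, _), p in S.items() if xx == x]
        per_x.append(pr * independent_project(ps))  # project away y, join with R(x)
    return independent_project(per_x)               # then project away x
```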
Probabilistic Databases with MarkoViews
Abstract

Cited by 12 (5 self)
Most of the work on query evaluation in probabilistic databases has focused on the simple tuple-independent data model, where all tuples are independent random events. Several efficient query evaluation techniques exist in this setting, such as safe plans, algorithms based on OBDDs, tree decompositions, and a variety of approximation algorithms. However, complex data analytics tasks often require complex correlations between tuples, and here query evaluation is significantly more expensive, or more restrictive. In this paper, we propose MVDB as a framework both for representing complex correlations and for efficient query evaluation. An MVDB specifies correlations by views, called MarkoViews, on the probabilistic relations, declaring the weights of the views' outputs. An MVDB is a (very large) Markov Logic Network. We make two sets of contributions. First, we show that query evaluation on an MVDB is equivalent to evaluating a Union of Conjunctive Queries (UCQ) over a tuple-independent database. The translation is exact (thus allowing the techniques developed for tuple-independent databases to be carried over to MVDBs), yet it is novel and quite non-obvious (some resulting probabilities may be negative!). This translation in itself, though, may not lead to much gain, since the translated query gets complicated as we try to capture more correlations. Our second contribution is to propose a new query evaluation strategy that exploits offline compilation to speed up online query evaluation. Here we utilize and extend our prior work on the compilation of UCQs. We validate our techniques experimentally on a large probabilistic database with MarkoViews inferred from the DBLP data.
Dissociation and Propagation for Efficient Query Evaluation over Probabilistic Databases
, 2010
Abstract

Cited by 11 (8 self)
Queries over probabilistic databases are either safe, in which case they can be evaluated entirely in a relational database engine, or unsafe, in which case they need to be evaluated with a general-purpose inference engine at a high cost. This paper proposes a new approach by which every query is evaluated like a safe query inside the database engine, by using a new method called dissociation. A dissociated query is obtained by adding extraneous variables to some atoms until the query becomes safe. We show that the probability of the original query and that of the dissociated query correspond to two well-known scoring functions on graphs, namely graph reliability (which is #P-hard) and the propagation score (which is related to PageRank and is in PTIME): when restricted to graphs, standard query probability is graph reliability, while the dissociated probability is the propagation score. We define a propagation score for conjunctive queries without self-joins and prove (i) that it is always an upper bound for query reliability, and (ii) that both scores coincide for all safe queries. Given the widespread and successful use of graph propagation methods in practice, we argue for the dissociation method as a good and efficient way to rank probabilistic query results, especially for those queries which are highly intractable for exact probabilistic inference.
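The upper-bound claim can be checked on a tiny lineage. In the sketch below (toy probabilities and illustrative variable names), dissociation gives each occurrence of the shared variable y its own independent copy; the dissociated formula is read-once, and its probability dominates the exact one:

```python
from itertools import product
from math import prod

def exact_prob(dnf, p):
    # Brute-force probability of a positive DNF over independent variables.
    names = sorted({v for conj in dnf for v in conj})
    total = 0.0
    for bits in product([0, 1], repeat=len(names)):
        world = dict(zip(names, bits))
        if any(all(world[v] for v in conj) for conj in dnf):
            total += prod(p[v] if b else 1 - p[v] for v, b in world.items())
    return total

p = {"x1": 0.6, "x2": 0.7, "y": 0.5}
original = [frozenset({"x1", "y"}), frozenset({"x2", "y"})]

# Dissociation: replace the shared y by fresh copies y1, y2 with the same probability.
p_diss = {**p, "y1": p["y"], "y2": p["y"]}
dissociated = [frozenset({"x1", "y1"}), frozenset({"x2", "y2"})]
```

Here the exact probability works out to 0.44 while the dissociated one is 0.545, consistent with dissociation being an upper bound that coincides with the exact value only on safe queries.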
Why So? or Why No? Functional Causality for Explaining Query Answers
 In CoRR
, 2009
Abstract

Cited by 11 (3 self)
In this paper, we propose causality as a unified framework to explain query answers and non-answers, thus generalizing and extending several previously proposed definitions of provenance and missing query result explanations. Starting from the established definition of actual causes by Halpern and Pearl [12], we propose functional causes as a refined definition of causality with several desirable properties. These properties allow us to apply our notion of causality in a database context and apply it uniformly to define the causes of query results and their individual contributions in several ways: (i) we can model both provenance as well as non-answers, (ii) we can define explanations as either data in the input relations or relational operations in a query plan, and (iii) we can give graded degrees of responsibility to individual causes, thus allowing us to rank causes. In particular, our approach allows us to explain contributions to relational aggregate functions and to rank causes according to their respective responsibilities, aiding users in identifying errors in uncertain or untrusted data. Throughout the paper, we illustrate the applicability of our framework with several examples. This is the first work that treats "positive" and "negative" provenance under the same framework, and it establishes the theoretical foundations of causality theory in a database context.
Faster Query Answering in Probabilistic Databases using Read-Once Functions
Abstract

Cited by 8 (2 self)
A Boolean expression is in read-once form if each of its variables appears exactly once. When the variables denote independent events in a probability space, the probability of the event denoted by the whole expression in read-once form can be computed in polynomial time (whereas the general problem for arbitrary expressions is #P-complete). Known approaches to checking the read-once property seem to require putting these expressions in disjunctive normal form. In this paper, we tell a better story for a large subclass of Boolean event expressions: those that are generated by conjunctive queries without self-joins on tuple-independent probabilistic databases. We first show that, given a tuple-independent representation and the provenance graph of an SPJ query plan without self-joins, we can efficiently compute the co-occurrence graph of a result event expression without using its DNF. From this, the read-once form can already, if it exists, be computed efficiently using existing techniques. Our second and key contribution is a complete, efficient, and simple-to-implement algorithm for computing the read-once forms (whenever they exist) directly, using a new concept, that of the co-table graph, which can be significantly smaller than the co-occurrence graph.
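The co-occurrence graph itself is straightforward once a DNF is in hand; a sketch from the DNF side (which is exactly the expensive route the paper avoids, since it computes the graph from the provenance of a self-join-free SPJ plan instead):

```python
from itertools import combinations

def cooccurrence_graph(dnf):
    """Co-occurrence graph of a positive DNF: one node per variable, and an
    edge between two variables iff they appear together in some conjunct.
    Built naively from the DNF here, for illustration only."""
    edges = set()
    for conj in dnf:
        for a, b in combinations(sorted(conj), 2):
            edges.add((a, b))
    return edges
```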
On the Tractability of Query Compilation and Bounded Treewidth
, 2012
Abstract

Cited by 5 (0 self)
We consider the problem of computing the probability of a Boolean function, which generalizes the model counting problem. Given an OBDD for such a function, its probability can be computed in linear time in the size of the OBDD. In this paper we investigate the connection between treewidth and the size of the OBDD. Bounded treewidth has proven to be applicable to many graph problems which are NP-hard in general but become tractable on graphs with bounded treewidth. However, it is less well understood how bounded treewidth can be used for the probability computation problem of a Boolean function. We introduce a new notion of treewidth of a Boolean function, called the expression treewidth, as the smallest treewidth of any DAG-expression representing the function. Our new no
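The linear-time claim for OBDDs follows from one memoized pass over the diagram's nodes. A sketch with a minimal hand-rolled OBDD encoding (the node ids and layout are invented for illustration):

```python
def obdd_prob(nodes, root, p):
    """Probability of the function represented by an OBDD:
    P(node) = p[var] * P(high child) + (1 - p[var]) * P(low child).
    `nodes` maps id -> (variable, low_id, high_id); ids 0 and 1 are the
    false and true sinks. Each node is visited once, hence linear time."""
    memo = {0: 0.0, 1: 1.0}
    def go(u):
        if u not in memo:
            var, lo, hi = nodes[u]
            memo[u] = (1 - p[var]) * go(lo) + p[var] * go(hi)
        return memo[u]
    return go(root)

# OBDD for (x AND y): test x first, then y.
nodes = {3: ("x", 0, 2), 2: ("y", 0, 1)}
```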