Results 1  10
of
28
ReadOnce Functions and Query Evaluation in Probabilistic Databases
"... Probabilistic databases hold promise of being a viable means for largescale uncertainty management, increasingly needed in a number of real world applications domains. However, query evaluation in probabilistic databases remains a computational challenge. Prior work on efficient exact query evaluat ..."
Abstract

Cited by 23 (2 self)
 Add to MetaCart
Probabilistic databases hold promise of being a viable means for largescale uncertainty management, increasingly needed in a number of real world applications domains. However, query evaluation in probabilistic databases remains a computational challenge. Prior work on efficient exact query evaluation in probabilistic databases has largely concentrated on querycentric formulations (e.g., safe plans, hierarchical queries), in that, they only consider characteristics of the query and not the data in the database. It is easy to construct examples where a supposedly hard query run on an appropriate database gives rise to a tractable query evaluation problem. In this paper, we develop efficient query evaluation techniques that leverage characteristics of both the query and the data in the database. We focus on tupleindependent databases where the query evaluation problem is equivalent to computing marginal probabilities of Boolean formulas associated with the result tuples. Query evaluation is easy if the Boolean formulas can be factorized into a form that has every variable appearing at most once (called readonce); this suggests a naive approach that incorporates previously developed Boolean formula factorization algorithms into the query evaluation. We then develop novel, more efficient factorization algorithms that work for a large subclass of queries (specifically, conjunctive queries without selfjoins), by exploiting the unique structure of the result tuple Boolean formulas. We empirically demonstrate that our proposed techniques are (1) orders of magnitude faster than generic inference algorithms when used to evaluate general readonce functions, and (2) for the special case of hierarchical queries, they rival the efficiency of prior techniques specifically designed to handle such queries. 1.
On the Optimal Approximation of Queries Using Tractable Propositional Languages
"... This paper investigates the problem of approximating conjunctive queries without selfjoins on probabilistic databases by lower and upper bounds that can be computed more efficiently. We study this problem via an indirection: Given a propositional formula Φ, find formulas in a more restricted langua ..."
Abstract

Cited by 18 (5 self)
 Add to MetaCart
(Show Context)
This paper investigates the problem of approximating conjunctive queries without selfjoins on probabilistic databases by lower and upper bounds that can be computed more efficiently. We study this problem via an indirection: Given a propositional formula Φ, find formulas in a more restricted language that are greatest lower bound and least upper bound, respectively, ofΦ. We studyboundsin the languages of readonce formulas, where every variable occurs at most once, and of readonce formulas in disjunctive normal form. We show equivalences of syntactic and modeltheoretic characterisations of optimal bounds for unate formulas, and present algorithms that can enumerate them with polynomial delay. Such bounds can be computed by queries expressed using firstorder queries extended with transitive closure and a special choice construct. Besides probabilistic databases, theseresults can also benefit the problem of approximate query evaluation in relational databases, since the bounds expressed by queries can be computed in polynomial combined complexity. Categories andSubject Descriptors H.2.4 [Database Management]: Systems—Query Processing
Sensitivity Analysis and Explanations for Robust Query Evaluation in Probabilistic Databases
 In SIGMOD
, 2011
"... Probabilistic database systems have successfully established themselves as a tool for managing uncertain data. However, much of the research in this area has focused on efficient query evaluation and has largely ignored two key issues that commonly arise in uncertain data management: First, how to p ..."
Abstract

Cited by 14 (0 self)
 Add to MetaCart
Probabilistic database systems have successfully established themselves as a tool for managing uncertain data. However, much of the research in this area has focused on efficient query evaluation and has largely ignored two key issues that commonly arise in uncertain data management: First, how to provide explanations for query results, e.g., “Why is this tuple in my result? ” or “Why does this output tuple have such high probability?”. Second, the problem of determining the sensitive input tuples for the given query, e.g., users are interested to know the input tuples that can substantially alter the output, when their probabilities are modified (since they may be unsure about the input probability values). Existing systems provide the lineage/provenance of each of the output tuples in addition to the output probabilities, which is a boolean formula indicating the dependence of the output tuple on the input tuples. However, lineage does not immediately provide a quantitative relationship and it is not informative when we have multiple output tuples. In this paper, we propose a unified framework that can handle both the issues mentioned above to facilitate robust query processing. We formally define the notions of influence and explanations and provide algorithms to determine the topℓ influential set of variables and the topℓ set of explanations for a variety of queries, including conjunctive queries, probabilistic threshold queries, topk queries and aggregation queries. Further, our framework naturally enables highly efficient incremental evaluation when input probabilities are modified (e.g., if uncertainty is resolved). Our preliminary experimental results demonstrate the benefits of our framework for performing robust query processing over probabilistic databases.
Bridging the gap between intensional and extensional query evaluation in probabilistic databases. EDBT
, 2010
"... ..."
(Show Context)
Probabilistic Databases with MarkoViews
"... Most of the work on query evaluation in probabilistic databases has focused on the simple tupleindependent data model, where all tuples are independent random events. Several efficient query evaluation techniques exists in this setting, such as safe plans, algorithms based on OBDDs, treedecomposit ..."
Abstract

Cited by 12 (5 self)
 Add to MetaCart
(Show Context)
Most of the work on query evaluation in probabilistic databases has focused on the simple tupleindependent data model, where all tuples are independent random events. Several efficient query evaluation techniques exists in this setting, such as safe plans, algorithms based on OBDDs, treedecomposition and a variety of approximation algorithms. However, complex data analytics tasks often require complex correlations between tuples, and here query evaluation is significantly more expensive, or more restrictive. In this paper, we propose MVDB as a framework both for representing complex correlations and for efficient query evaluation. An MVDB specifies correlations by views, called MarkoViews, on the probabilistic relations and declaring the weights of the view’s outputs. An MVDB is a (very large) Markov Logic Network. We make two sets of contributions. First, we show that query evaluation on an MVDB is equivalent to evaluating a Union of Conjunctive Query(UCQ) over a tupleindependent database. The translation is exact (thus allowing the techniques developed for tuple independent databases to be carried over to MVDB), yet it is novel and quite nonobvious (some resulting probabilities may be negative!). This translation in itself though may not lead to much gain since the translated query gets complicated as we try to capture more correlations. Our second contribution is to propose a new query evaluation strategy that exploits offline compilation to speed up online query evaluation. Here we utilize and extend our prior work on compilation of UCQ. We validate experimentally our techniques on a large probabilistic database with MarkoViews inferred from the DBLP data. 1.
Dissociation and Propagation for Efficient Query Evaluation over Probabilistic Databases
, 2010
"... Queries over probabilistic databases are either safe, in which case they can be evaluated entirely in a relational database engine, or unsafe, in which case they need to be evaluated with a generalpurpose inference engine at a high cost. This paper proposes a new approach by which every query is e ..."
Abstract

Cited by 11 (8 self)
 Add to MetaCart
(Show Context)
Queries over probabilistic databases are either safe, in which case they can be evaluated entirely in a relational database engine, or unsafe, in which case they need to be evaluated with a generalpurpose inference engine at a high cost. This paper proposes a new approach by which every query is evaluated like a safe query inside the database engine, by using a new method called dissociation. A dissociated query is obtained by adding extraneous variables to some atoms until the query becomes safe. We show that the probability of the original query and that of the dissociated query correspond to two wellknown scoring functions on graphs, namely graph reliability (which is #Phard), and the propagation score (which is related to PageRank and is in PTIME): When restricted to graphs, standard query probability is graph reliability, while the dissociated probability is the propagation score. We define a propagation score for conjunctive queries without selfjoins and prove (i) that it is is always an upper bound for query reliability, and (ii) that both scores coincide for all safe queries. Given the widespread and successful use of graph propagation methods in practice, we argue for the dissociation method as a good and efficient way to rank probabilistic query results, especially for those queries which are highly intractable for exact probabilistic inference.
Database Foundations for Scalable RDF Processing
 In Reasoning Web
"... Abstract. As more and more data is provided in RDF format, storing huge amounts of RDF data and efficiently processing queries on such data is becoming increasingly important. The first part of the lecture will introduce stateoftheart techniques for scalably storing and querying RDF with relatio ..."
Abstract

Cited by 9 (2 self)
 Add to MetaCart
(Show Context)
Abstract. As more and more data is provided in RDF format, storing huge amounts of RDF data and efficiently processing queries on such data is becoming increasingly important. The first part of the lecture will introduce stateoftheart techniques for scalably storing and querying RDF with relational systems, including alternatives for storing RDF, efficient index structures, and query optimization techniques. As centralized RDF repositories have limitations in scalability and failure tolerance, decentralized architectures have been proposed. The second part of the lecture will highlight system architectures and strategies for distributed RDF processing. We cover search engines as well as federated query processing, highlight differences to classic federated database systems, and discuss efficient techniques for distributed query processing in general and for RDF data in particular. Moreover, for the last part of this chapter, we argue that extracting knowledge from the Web is an excellent showcase – and potentially one of the biggest challenges – for the scal
Topk Query Processing in Probabilistic Databases with NonMaterialized Views
, 2012
"... In this paper, we investigate a novel approach of computing confidence bounds for topk ranking queries in probabilistic databases with nonmaterialized views. Unlike prior approaches, we present an exact pruning algorithm for finding the topranked query answers according to their marginal probabil ..."
Abstract

Cited by 8 (4 self)
 Add to MetaCart
(Show Context)
In this paper, we investigate a novel approach of computing confidence bounds for topk ranking queries in probabilistic databases with nonmaterialized views. Unlike prior approaches, we present an exact pruning algorithm for finding the topranked query answers according to their marginal probabilities without the need to first materialize all answer candidates via the views. Specifically, we consider conjunctive queries over multiple levels of selectprojectjoin views, the latter of which are cast into Datalog rules, where also the rules themselves may be uncertain, i.e., be valid with some degree of confidence. To our knowledge, this work is the first to address integrated data and confidence computations in the context of probabilistic databases by considering confidence bounds over partially evaluated query answers with firstorder lineage formulas. We further extend our query processing techniques by a toolsuite of scheduling strategies based on selectivity estimation and the expected impact of subgoals on the final confidence of answer candidates. Experiments with large datasets demonstrate drastic runtime improvements over both sampling and decompositionbased methods—even
A TemporalProbabilistic Database Model for Information Extraction
"... Temporal annotations of facts are a key component both for building a highaccuracy knowledge base and for answering queries over the resulting temporal knowledge base with high precision and recall. In this paper, we present a temporalprobabilistic database model for cleaning uncertain temporal fac ..."
Abstract

Cited by 7 (2 self)
 Add to MetaCart
(Show Context)
Temporal annotations of facts are a key component both for building a highaccuracy knowledge base and for answering queries over the resulting temporal knowledge base with high precision and recall. In this paper, we present a temporalprobabilistic database model for cleaning uncertain temporal facts obtained from information extraction methods. Specifically, we consider a combination of temporal deduction rules, temporal consistency constraints and probabilistic inference based on the common possibleworlds semantics with data lineage, and we study the theoretical properties of this data model. We further develop a query engine which is capable of scaling to very large temporal knowledge bases, with nearly interactive query response times over millions of uncertain facts and hundreds of thousands of grounded rules. Our experiments over two realworld datasets demonstrate the increased robustness of our approach compared to related techniques based on constraint solving via Integer Linear Programming (ILP) and probabilistic inference via Markov Logic Networks (MLNs). We are also able to show that our runtime performance is more than competitive to current ILP solvers and the fastest available, probabilistic but nontemporal, database engines. 1.
Ranking Query Answers in Probabilistic Databases: Complexity and Efficient Algorithms
"... Abstract—In many applications of probabilistic databases, the probabilities are mere degrees of uncertainty in the data and are not otherwisemeaningfultothe user.Often,userscare onlyabout the ranking of answers in decreasing order of their probabilities or about a few most likely answers. In this pa ..."
Abstract

Cited by 6 (1 self)
 Add to MetaCart
(Show Context)
Abstract—In many applications of probabilistic databases, the probabilities are mere degrees of uncertainty in the data and are not otherwisemeaningfultothe user.Often,userscare onlyabout the ranking of answers in decreasing order of their probabilities or about a few most likely answers. In this paper, we investigate the problem of ranking query answers in probabilistic databases. We give a dichotomy for ranking in case of conjunctive queries without repeating relation symbols: it is either in polynomial time or #Phard. Surprisingly, our syntactic characterisation of tractable queries is not the same as for probability computation. The key observation is that there are queries for which probability computation is #Phard, yet ranking can be computed in polynomial time. This is possible whenever probability computation for distinct answers has a common factorthatishardtocomputebutirrelevantforranking. We complement this tractability analysis with an effective rankingtechniqueforconjunctivequeries.Givenaquery,weconstruct a share plan, which exposes subqueries whose probability computation can be shared or ignored across query answers. Our technique combines share plans with incremental approximate probability computation of subqueries. We implemented our technique in the SPROUT query engine and report on performance gains of orders of magnitude over Monte Carlo simulation using FPRAS and exact probability computation based on knowledge compilation. I.