Exploiting lineage for confidence computation in uncertain and probabilistic databases (2008)

by A D Sarma, M Theobald, J Widom
Venue: In ICDE

Results 1 - 10 of 40

ULDBs: Databases with uncertainty and lineage

by Omar Benjelloun, Anish Das Sarma, Alon Halevy, Jennifer Widom - In VLDB, 2006
Cited by 310 (32 self)
This paper introduces ULDBs, an extension of relational databases with simple yet expressive constructs for representing and manipulating both lineage and uncertainty. Uncertain data and data lineage are two important areas of data management that have been considered extensively in isolation; however, many applications require the features in tandem. Fundamentally, lineage enables simple and consistent representation of uncertain data, it correlates uncertainty in query results with uncertainty in the input data, and query processing with lineage and uncertainty together presents computational benefits over treating them separately. We show that the ULDB representation is complete, and that it permits straightforward implementation of many relational operations. We define two notions of ULDB minimality, data-minimal and lineage-minimal, and study minimization of ULDB representations under both notions. With lineage, derived relations are no longer self-contained: their uncertainty depends on uncertainty in the base data. We provide an algorithm for the new operation of extracting a database subset in the presence of interconnected uncertainty. Finally, we show how ULDBs enable a new approach to query processing in probabilistic databases. ULDBs form the basis of the Trio system under development at Stanford.
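To make concrete the claim that lineage correlates uncertainty in a query result with uncertainty in the input data, here is a minimal sketch. It is not the ULDB or Trio algorithm: it assumes a simplified model in which base tuples are independent Boolean events, and the names `confidence`, `base`, and `lineage` are hypothetical. The confidence of a derived tuple is computed by enumerating possible worlds of the base tuples that appear in its lineage.

```python
from itertools import product


def confidence(base_probs, lineage):
    """Probability that `lineage` is true, summed over all possible worlds.

    Assumption (not from the abstract): base tuples are independent, and
    `lineage` is a predicate over a dict mapping tuple name -> bool.
    """
    names = sorted(base_probs)
    total = 0.0
    for bits in product([False, True], repeat=len(names)):
        world = dict(zip(names, bits))
        # Weight of this world under independence of the base tuples.
        weight = 1.0
        for name in names:
            p = base_probs[name]
            weight *= p if world[name] else 1.0 - p
        if lineage(world):
            total += weight
    return total


# Hypothetical base tuples, each present with probability 0.5.
base = {"t1": 0.5, "t2": 0.5, "t3": 0.5}

# A derived tuple produced by two derivations that share t1.
lineage = lambda w: (w["t1"] and w["t2"]) or (w["t1"] and w["t3"])

conf = confidence(base, lineage)

# Treating the two derivations as independent ignores the shared t1.
naive = 1.0 - (1.0 - 0.25) * (1.0 - 0.25)

print(conf, naive)
```

The two numbers differ because both derivations depend on `t1`; tracking lineage is what exposes that dependence. Enumeration is exponential in the number of base tuples, so this is only suitable for illustrating the semantics.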

Probabilistic data exchange

by Ronald Fagin, Benny Kimelfeld, Phokion G. Kolaitis - In Proc. ICDT, 2010
Cited by 40 (7 self)
The work reported here lays the foundations of data exchange in the presence of probabilistic data. This requires rethinking the very basic concepts of traditional data exchange, such as solution, universal solution, and the certain answers of target queries. We develop a framework for data exchange over probabilistic databases, and make a case for its coherence and robustness. This framework applies to arbitrary schema mappings, and finite or countably infinite probability spaces on the source and target instances. After establishing this framework and formulating the key concepts, we study the application of the framework to a concrete and practical setting where probabilistic databases are compactly encoded by means of annotations formulated over random Boolean variables. In this setting, we study the problems of testing for the existence of solutions and universal solutions, materializing such solutions, and evaluating target queries (for unions of conjunctive queries) in both the exact sense and the approximate sense. For each of the problems, we carry out a complexity analysis based on properties of the annotation, in various classes of dependencies. Finally, we show that the framework and results easily and completely generalize to allow not only the data, but also the schema mapping itself to be probabilistic.

Citation Context

...k to a concrete and practical setting, where the dependencies are from widely-studied classes, and where the probabilistic databases are compactly encoded in various conventional manners (e.g., as in [2, 6, 10, 31, 43]). Furthermore, in Section 6, we extend the framework and the results to allow the schema mapping (and the data) to be probabilistic. In principle, we could use this extended setting right from the be...

Exploiting Shared Correlations in Probabilistic Databases

by Prithviraj Sen, Amol Deshpande, Lise Getoor, 2008
Cited by 36 (6 self)
There has been a recent surge in work in probabilistic databases, propelled in large part by the huge increase in noisy data sources (sensor data, experimental data, data from uncurated sources, and many others). There is a growing need for database management systems that can efficiently represent and query such data. In this work, we show how data characteristics can be leveraged to make the query evaluation process more efficient. In particular, we exploit what we refer to as shared correlations, where the same uncertainties and correlations occur repeatedly in the data. Shared correlations occur mainly for two reasons: (1) uncertainty and correlations usually come from general statistics and rarely vary on a tuple-to-tuple basis; (2) the query evaluation procedure itself tends to re-introduce the same correlations. Prior work has shown that the query evaluation problem on probabilistic databases is equivalent to a probabilistic inference problem on an appropriately constructed probabilistic graphical model (PGM). We leverage this by introducing a new data structure, called the random variable elimination graph (rv-elim graph), that can be built from the PGM obtained from query evaluation. We develop techniques based on bisimulation that can be used to compress the rv-elim graph by exploiting the presence of shared correlations in the PGM; the compressed rv-elim graph can then be used to run inference. We validate our methods by evaluating them empirically and show that even with a few shared correlations significant speed-ups are possible.

PrDB: managing and exploiting rich correlations in probabilistic databases

by Prithviraj Sen, Amol Deshpande, Lise Getoor, 2009
Cited by 31 (6 self)
Abstract not found

Containment of Conjunctive Queries on Annotated Relations

by Todd J. Green
Cited by 30 (6 self)
We study containment and equivalence of (unions of) conjunctive queries on relations annotated with elements of a commutative semiring. Such relations and the semantics of positive relational queries on them were introduced in a recent paper as a generalization of set semantics, bag semantics, incomplete databases, and databases annotated with various kinds of provenance information. We obtain positive decidability results and complexity characterizations for databases with lineage, why-provenance, and provenance polynomial annotations, for both conjunctive queries and unions of conjunctive queries. At least one of these results is surprising given that provenance polynomial annotations seem “more expressive” than bag semantics, and under the latter, containment of unions of conjunctive queries is known to be undecidable. The decision procedures rely on interesting variations on the notion of containment mappings. We also show that for any positive semiring (a very large class) and conjunctive queries without self-joins, equivalence is the same as isomorphism.
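As a rough illustration of how one query can be read under both set and bag semantics once tuples carry semiring annotations, the sketch below evaluates the conjunctive query Q(x) :- R(x, y), S(y) over annotated relations: joined annotations are multiplied and alternative derivations are added. The names `Semiring`, `BOOL`, `NAT`, and `join_project`, and the sample relations, are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    zero: Any
    one: Any
    add: Callable[[Any, Any], Any]
    mul: Callable[[Any, Any], Any]


# Boolean semiring gives set semantics; natural numbers give bag semantics.
BOOL = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)
NAT = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)


def join_project(r, s, k):
    """Evaluate Q(x) :- R(x, y), S(y) over relations given as dicts
    mapping a tuple to its annotation in the semiring `k`."""
    out = defaultdict(lambda: k.zero)
    for (x, y), a in r.items():
        b = s.get((y,), k.zero)  # absent tuples are annotated with zero
        out[(x,)] = k.add(out[(x,)], k.mul(a, b))
    # Tuples annotated with zero are not in the result.
    return {t: a for t, a in out.items() if a != k.zero}


# Hypothetical bag-annotated input: annotations are multiplicities.
r_bag = {("a", "1"): 2, ("a", "2"): 1, ("b", "3"): 1}
s_bag = {("1",): 3, ("2",): 1}
bag_result = join_project(r_bag, s_bag, NAT)

# The same data under set semantics: every present tuple is annotated True.
r_set = {t: True for t in r_bag}
s_set = {t: True for t in s_bag}
set_result = join_project(r_set, s_set, BOOL)

print(bag_result, set_result)
```

Only the semiring argument changes between the two runs, which is the sense in which annotated relations generalize set and bag semantics.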

The Dichotomy of Probabilistic Inference for Unions of Conjunctive Queries

by Nilesh Dalvi, Dan Suciu
Cited by 16 (7 self)
We study the complexity of computing the probability of a query on a probabilistic database. The queries that we consider are unions of conjunctive queries, UCQ: equivalently, these are positive, existential First Order Logic sentences, or non-recursive datalog programs. The databases that we consider are tuple-independent. We prove the following dichotomy theorem. For every UCQ query, either its probability can be computed in polynomial time in the size of the database, or it is hard for FP^#P. Our result also has applications to the problem of computing the probability of positive Boolean expressions, and establishes a dichotomy for such classes based on their structure. For the tractable case, we give a very simple algorithm that alternates between two steps: applying the inclusion/exclusion formula, and removing one existential variable. A key and novel feature of this algorithm is that it avoids computing terms that cancel out in the inclusion/exclusion formula; in other words, it only computes those terms whose Möbius function in an appropriate lattice is non-zero. We show that this simple feature is a key ingredient needed to ensure completeness. For the hardness proof, we give a reduction from the counting problem for positive, partitioned 2CNF, which is known to be #P-complete. The hardness proof is non-trivial, and uses techniques from logic and from classical algebra.
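The inclusion/exclusion step mentioned in the abstract can be illustrated on a positive Boolean expression in disjunctive normal form over independent variables. This is a naive sketch, not the paper's algorithm: it enumerates every non-empty subset of clauses and therefore does not perform the Möbius-based pruning of cancelling terms that the abstract describes. The function name `dnf_probability` and the sample data are hypothetical.

```python
from itertools import combinations
from math import prod


def dnf_probability(clauses, probs):
    """P(C1 or ... or Cn) by inclusion/exclusion.

    Each clause is a frozenset of variable names (a conjunction of positive
    literals); `probs` maps each variable to its marginal probability.
    Assumption: variables are independent, as in a tuple-independent database.
    """
    total = 0.0
    for size in range(1, len(clauses) + 1):
        sign = 1.0 if size % 2 == 1 else -1.0
        for subset in combinations(clauses, size):
            # The conjunction of several clauses is true exactly when every
            # variable in their union is true.
            variables = frozenset().union(*subset)
            total += sign * prod(probs[v] for v in variables)
    return total


# Hypothetical expression: (x1 and y1) or (x1 and y2).
clauses = [frozenset({"x1", "y1"}), frozenset({"x1", "y2"})]
probs = {"x1": 0.5, "y1": 0.5, "y2": 0.5}
p = dnf_probability(clauses, probs)
print(p)
```

The number of terms grows exponentially with the number of clauses, which is why avoiding terms that cancel matters for the tractable case.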

Citation Context

...plexity of computing a query on a probabilistic database. Our work is motivated by probabilistic databases [Cavallo and Pittarelli 1987; Dalvi and Suciu 2004; Sen and Deshpande 2007; Dalvi and Suciu 2007b; Sarma et al. 2008; Olteanu et al. 2009; Olteanu and Huang 2009; Suciu et al. 2011], the model counting problem, and the problem of computing the probability of propositional formulas [Creignou and Hermann 1996; Darwiche 2000;...

Capturing data uncertainty in high-volume stream processing

by Yanlei Diao, Boduo Li, Liping Peng, Thanh Tran, Michael Zink - In CIDR, 2009
Cited by 14 (2 self)
We present the design and development of a data stream system that captures data uncertainty from data collection to query processing to final result generation. Our system focuses on data that is naturally modeled as continuous random variables such as many types of sensor data. To provide an end-to-end solution, our system employs probabilistic modeling and inference to generate uncertainty description for raw data, and then a suite of statistical techniques to capture changes of uncertainty as data propagates through query operators. To cope with high-volume streams, we explore advanced approximation techniques for both space and time efficiency. We are currently working with a group of scientists to evaluate our system using traces collected from real-world applications for hazardous weather monitoring and for object tracking and monitoring.

Citation Context

...es. When this operator computes result distributions for a set of tuples with overlapping lineage structures, it can apply optimizations to compute for all these tuples in a shared manner (similar to [52]). Furthermore, it may also be possible to find approximate lineage [50] that gives a good approximation of the result distributions and allows more efficient computation. In our immediate future rese...

Bridging the gap between intensional and extensional query evaluation in probabilistic databases

by Abhay Jha, Dan Olteanu, Dan Suciu - In EDBT, 2010
Cited by 13 (6 self)
Abstract not found

Top-k dominating queries in uncertain databases

by Xiang Lian, Lei Chen - In EDBT, 2009
Cited by 12 (0 self)
Due to the existence of uncertain data in a wide spectrum of real applications, uncertain query processing has become increasingly important, which dramatically differs from handling certain data in a traditional database. In this paper, we formulate and tackle an important query, namely probabilistic top-k dominating (PTD) query, in the uncertain database. In particular, a PTD query retrieves k uncertain objects that are expected to dynamically dominate the largest number of uncertain objects. We propose an effective pruning approach to reduce the PTD search space, and present an efficient query procedure to answer PTD queries. Furthermore, approximate PTD query processing and the case where the PTD query is issued from an uncertain query object are also discussed. Extensive experiments have demonstrated the efficiency and effectiveness of our proposed PTD query processing approaches.

Citation Context

...to use). In contrast, users do not need to specify ranking functions in the PTD query, and the size (i.e. k) of the PTD answer set can be controlled by users. In literature of probabilistic databases [25, 28, 12, 27, 24, 26, 11], the probability that an object belongs to the database might be smaller than 1 (in contrast, uncertain objects must exist in the uncertain database). There are some existing works on top-k query pro...

Database Foundations for Scalable RDF Processing

by Katja Hose, Ralf Schenkel, Martin Theobald, Gerhard Weikum - In Reasoning Web
Cited by 9 (2 self)
As more and more data is provided in RDF format, storing huge amounts of RDF data and efficiently processing queries on such data is becoming increasingly important. The first part of the lecture will introduce state-of-the-art techniques for scalably storing and querying RDF with relational systems, including alternatives for storing RDF, efficient index structures, and query optimization techniques. As centralized RDF repositories have limitations in scalability and failure tolerance, decentralized architectures have been proposed. The second part of the lecture will highlight system architectures and strategies for distributed RDF processing. We cover search engines as well as federated query processing, highlight differences to classic federated database systems, and discuss efficient techniques for distributed query processing in general and for RDF data in particular. Moreover, for the last part of this chapter, we argue that extracting knowledge from the Web is an excellent showcase – and potentially one of the biggest challenges – for the scal-

Citation Context

...tions, which may yield significant efficiency benefits for query processing. Later extensions to Trio have investigated in more detail how to exploit lineage for probabilistic confidence computations [124] and data updates [125]. MayBMS. The MayBMS [8,66] system initially developed at Saarland University and then at the Cornell database group is designed as a completely native extension to PostgreSQL. ...
