Results 1 - 10 of 145
ULDBs: Databases with uncertainty and lineage. In VLDB, 2006.
"... This paper introduces ULDBs, an extension of relational databases with simple yet expressive constructs for representing and manipulating both lineage and uncertainty. Uncertain data and data lineage are two important areas of data management that have been considered extensively in isolation, howev ..."
Abstract
-
Cited by 310 (32 self)
- Add to MetaCart
Abstract: This paper introduces ULDBs, an extension of relational databases with simple yet expressive constructs for representing and manipulating both lineage and uncertainty. Uncertain data and data lineage are two important areas of data management that have been considered extensively in isolation; however, many applications require the features in tandem. Fundamentally, lineage enables a simple and consistent representation of uncertain data, correlates uncertainty in query results with uncertainty in the input data, and makes processing queries with lineage and uncertainty together computationally cheaper than treating them separately. We show that the ULDB representation is complete, and that it permits straightforward implementation of many relational operations. We define two notions of ULDB minimality, data-minimal and lineage-minimal, and study minimization of ULDB representations under both notions. With lineage, derived relations are no longer self-contained: their uncertainty depends on uncertainty in the base data. We provide an algorithm for the new operation of extracting a database subset in the presence of interconnected uncertainty. Finally, we show how ULDBs enable a new approach to query processing in probabilistic databases. ULDBs form the basis of the Trio system under development at Stanford.
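The lineage-carries-uncertainty idea is easy to make concrete. Below is a minimal Python sketch (relation names and data are illustrative, and this is not Trio's actual model or API): derived tuples record lineage pointing at the base alternatives they came from, and their confidence is computed from that lineage rather than stored independently, assuming independent x-tuples.

```python
from math import prod

# base x-tuples: id -> mutually exclusive (value, probability) alternatives
saw = {"s1": [(("Amy", "Honda"), 0.6), (("Amy", "Toyota"), 0.4)]}
drives = {"d1": [(("Jimmy", "Toyota"), 0.8), (("Billy", "Honda"), 0.2)]}

def join_on_car(r, s):
    """Join the two relations on car; every derived tuple carries its
    lineage: the base alternatives it was built from."""
    out = []
    for rid, ralts in r.items():
        for i, ((witness, car), _) in enumerate(ralts):
            for sid, salts in s.items():
                for j, ((driver, car2), _) in enumerate(salts):
                    if car == car2:
                        out.append(((witness, driver, car),
                                    ((rid, i), (sid, j))))
    return out

def confidence(lineage, relations):
    """A derived tuple's confidence follows from its lineage alone: the
    probability that every referenced base alternative is chosen
    (x-tuples assumed independent)."""
    return prod(rel[xid][alt][1]
                for xid, alt in lineage
                for rel in relations if xid in rel)

for value, lineage in join_on_car(saw, drives):
    print(value, round(confidence(lineage, [saw, drives]), 2))
# ('Amy', 'Billy', 'Honda') 0.12   ('Amy', 'Jimmy', 'Toyota') 0.32
```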
Complexity of Answering Queries Using Materialized Views. In PODS, 1998.
"... Abstract We study the complexity of the problem of answering queries using materialized views. This problem has attracted a lot of attention recently because of its relevance in data integration. Previous work considered only conjunctive view definitions. We examine the consequences of allowing mor ..."
Abstract
-
Cited by 308 (5 self)
- Add to MetaCart
Abstract: We study the complexity of the problem of answering queries using materialized views. This problem has attracted a lot of attention recently because of its relevance in data integration. Previous work considered only conjunctive view definitions. We examine the consequences of allowing more expressive view definition languages. The languages we consider for view definitions and user queries are: conjunctive queries with inequality, positive queries, datalog, and first-order logic. We show that the complexity of the problem depends on whether views are assumed to store all the tuples that satisfy the view definition, or only a subset of them. Finally, we apply the results to the view consistency and view self-maintainability problems which arise in data warehousing.
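The abstract's central distinction, exact views (storing all tuples satisfying the definition) versus sound views (storing only a subset), can be illustrated with a toy certain-answer computation. The domain, view, and query below are my own, chosen only to make the semantics concrete; note how a query with negation is certain under exact views but not under sound ones.

```python
from itertools import combinations

# all binary relations R over the domain {a, b}
domain = ["a", "b"]
pairs = [(x, y) for x in domain for y in domain]
all_R = [set(R) for r in range(len(pairs) + 1)
         for R in combinations(pairs, r)]

view_def = lambda R: {x for x, _ in R}   # view V = project_1(R)
view_ext = {"a"}                         # the observed view instance

def certain(q, exact):
    """q is certain iff it holds in every database consistent with the
    view: exact views pin the view extension down exactly, while sound
    views only lower-bound it."""
    consistent = [R for R in all_R
                  if (view_def(R) == view_ext if exact
                      else view_ext <= view_def(R))]
    return all(q(R) for R in consistent)

q = lambda R: ("b", "b") not in R        # boolean query with negation
print(certain(q, exact=True), certain(q, exact=False))   # True False
```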
Semantics and Complexity of SPARQL
"... SPARQL is the standard language for querying RDF data. In this article, we address systematically the formal study of the database aspects of SPARQL, concentrating in its graph pattern matching facility. We provide a compositional semantics for the core part of SPARQL, and study the complexity of th ..."
Abstract
-
Cited by 277 (25 self)
- Add to MetaCart
Abstract: SPARQL is the standard language for querying RDF data. In this article, we systematically address the formal study of the database aspects of SPARQL, concentrating on its graph pattern matching facility. We provide a compositional semantics for the core part of SPARQL and study the complexity of evaluating several fragments of the language. Among other complexity results, we show that the evaluation of general SPARQL patterns is PSPACE-complete. We identify a large class of SPARQL patterns, defined by imposing a simple and natural syntactic restriction, for which the query evaluation problem can be solved more efficiently. This restriction gives rise to the class of well-designed patterns. We show that the evaluation problem is coNP-complete for well-designed patterns. Moreover, we provide several rewriting rules for well-designed patterns whose application may have a considerable impact on the cost of evaluating SPARQL queries.
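A rough Python sketch of the well-designed restriction, over a toy pattern AST of my own devising (not any real SPARQL parser's types): a pattern is well-designed if, for every subpattern (P1 OPT P2), each variable of P2 that also occurs elsewhere in the pattern already occurs in P1.

```python
from dataclasses import dataclass

@dataclass
class Triple:            # basic graph pattern: a single triple
    s: str
    p: str
    o: str

@dataclass
class And:               # (left AND right)
    left: object
    right: object

@dataclass
class Opt:               # (left OPT right)
    left: object
    right: object

def variables(pat):
    if isinstance(pat, Triple):
        return {t for t in (pat.s, pat.p, pat.o) if t.startswith("?")}
    return variables(pat.left) | variables(pat.right)

def well_designed(pattern):
    def check(pat, outside):
        if isinstance(pat, Triple):
            return True
        if isinstance(pat, Opt):
            # every variable of the optional side that is also used
            # outside this OPT must occur in the mandatory side
            if not variables(pat.right) & outside <= variables(pat.left):
                return False
        return (check(pat.left, outside | variables(pat.right)) and
                check(pat.right, outside | variables(pat.left)))
    return check(pattern, set())

# (?x name ?n) OPT (?x email ?e)  -> well-designed
p1 = Opt(Triple("?x", "name", "?n"), Triple("?x", "email", "?e"))
# ((?x name ?n) OPT (?y email ?e)) AND (?y phone ?f)  -> not: ?y occurs
# in the optional part and outside it, but not in the mandatory part
p2 = And(Opt(Triple("?x", "name", "?n"), Triple("?y", "email", "?e")),
         Triple("?y", "phone", "?f"))
print(well_designed(p1), well_designed(p2))   # True False
```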
Top-k query processing in uncertain databases. In ICDE, 2007.
"... Top-k processing in uncertain databases is semantically and computationally different from traditional top-k processing. The interplay between score and uncertainty makes traditional techniques inapplicable. We introduce new probabilistic formulations for top-k queries. Our formulations are based on ..."
Abstract
-
Cited by 125 (9 self)
- Add to MetaCart
(Show Context)
Abstract: Top-k processing in uncertain databases is semantically and computationally different from traditional top-k processing. The interplay between score and uncertainty makes traditional techniques inapplicable. We introduce new probabilistic formulations for top-k queries. Our formulations are based on a "marriage" of traditional top-k semantics and possible worlds semantics. In light of these formulations, we construct a framework that encapsulates a state space model and efficient query processing techniques to tackle the challenges of uncertain data settings. We prove that our techniques are optimal in terms of the number of accessed tuples and materialized search states. Our experiments show the efficiency of our techniques under different data distributions, with orders of magnitude improvement over naïve materialization of possible worlds.
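As one illustration of the possible-worlds semantics the abstract builds on, here is a naive Python enumeration of one plausible formulation (the top-k answer vector with the highest aggregate probability across worlds), assuming independent tuples; the data are invented, and the paper's techniques exist precisely to avoid this exponential enumeration.

```python
from itertools import product

# tuples: (id, membership probability, score), assumed independent
tuples = [("t1", 0.9, 100), ("t2", 0.6, 90), ("t3", 0.5, 80)]

def u_topk(tuples, k):
    """Aggregate, over every possible world, the probability of each
    top-k answer vector; return the most probable vector."""
    answers = {}
    for flags in product([True, False], repeat=len(tuples)):
        world_prob = 1.0
        world = []
        for (tid, p, score), present in zip(tuples, flags):
            world_prob *= p if present else 1 - p
            if present:
                world.append((score, tid))
        topk = tuple(tid for _, tid in sorted(world, reverse=True)[:k])
        answers[topk] = answers.get(topk, 0.0) + world_prob
    return max(answers.items(), key=lambda kv: kv[1])

print(u_topk(tuples, 2))   # (('t1', 't2'), 0.54): top-2 in 54% of worlds
```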
Offering a Precision-Performance Tradeoff for Aggregation Queries over Replicated Data, 2000.
"... Strict consistency of replicated data is infeasible or not required by many distributed applications, so current systems often permit stale replication,inwhich cached copies of data values are allowed to become out of date. Queries over cached data return an answer quickly, but the stale answer ..."
Abstract
-
Cited by 104 (8 self)
- Add to MetaCart
(Show Context)
Abstract: Strict consistency of replicated data is infeasible or not required by many distributed applications, so current systems often permit stale replication, in which cached copies of data values are allowed to become out of date. Queries over cached data return an answer quickly, but the stale answer may be unboundedly imprecise. Alternatively, queries over remote master data return a precise answer, but with potentially poor performance. To bridge the gap between these two extremes, we propose a new class of replication systems called TRAPP (Tradeoff in Replication Precision and Performance). TRAPP systems give each user fine-grained control over the tradeoff between precision and performance: caches store ranges that are guaranteed to bound the current data values, instead of storing stale exact values, and users supply a quantitative precision constraint along with each query. To answer a query, TRAPP systems automatically select a combination of locally cached bounds and exact master data stored remotely to deliver a bounded answer consisting of a range that is no wider than the specified precision constraint, that is guaranteed to contain the precise answer, and that is computed as quickly as possible. This paper defines the architecture of TRAPP replication systems and covers some mechanics of caching data ranges. It then focuses on queries with aggregation, presenting optimization algorithms for answering queries with precision constraints, and reporting on performance experiments that demonstrate the fine-grained control of the precision-performance tradeoff offered by TRAPP systems.
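The bounded-answer idea can be sketched in a few lines of Python. The data, the cost model, and the widest-interval-first refresh heuristic below are my own simplifications; the paper develops principled optimization algorithms for choosing which master records to read.

```python
# id -> cached [lo, hi] range guaranteed to bound the current master value
cached = {"a": (10, 20), "b": (5, 25), "c": (40, 45)}
master = {"a": 17, "b": 9, "c": 42}   # exact values, expensive to fetch

def bounded_sum(precision):
    """Return a range no wider than `precision` that is guaranteed to
    contain the exact SUM, fetching as few master values as possible."""
    bounds = dict(cached)
    # greedily refresh the items with the widest cached intervals first
    for key in sorted(bounds, key=lambda k: bounds[k][0] - bounds[k][1]):
        lo = sum(b[0] for b in bounds.values())
        hi = sum(b[1] for b in bounds.values())
        if hi - lo <= precision:
            return lo, hi
        bounds[key] = (master[key], master[key])   # remote fetch
    total = sum(b[0] for b in bounds.values())
    return total, total

print(bounded_sum(precision=15))   # (59, 74): one remote fetch suffices
```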
Probabilistic skylines on uncertain data. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB'07), Vienna, 2007.
"... Uncertain data are inherent in some important applications. Although a considerable amount of research has been dedicated to modeling uncertain data and answering some types of queries on uncertain data, how to conduct advanced analysis on uncertain data remains an open problem at large. In this pap ..."
Abstract
-
Cited by 103 (19 self)
- Add to MetaCart
Abstract: Uncertain data are inherent in some important applications. Although a considerable amount of research has been dedicated to modeling uncertain data and answering some types of queries on uncertain data, how to conduct advanced analysis on uncertain data largely remains an open problem. In this paper, we tackle the problem of skyline analysis on uncertain data. We propose a novel probabilistic skyline model in which an uncertain object has a probability of being in the skyline, and a p-skyline contains all the objects whose skyline probabilities are at least p. Computing probabilistic skylines on large uncertain data sets is challenging. We develop two efficient algorithms. The bottom-up algorithm computes the skyline probabilities of some selected instances of uncertain objects, and uses those instances to prune other instances and uncertain objects effectively. The top-down algorithm recursively partitions the instances of uncertain objects into subsets, and prunes subsets and objects aggressively. Our experimental results on both the real NBA player data set and the benchmark synthetic data sets show that probabilistic skylines are interesting and useful, and that our two algorithms are efficient on large data sets and complementary to each other in performance.
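The probabilistic skyline model lends itself to a direct, if naive, computation. The sketch below (toy data; smaller is better in every dimension) evaluates each object's skyline probability by full enumeration, which is exactly what the paper's bottom-up and top-down algorithms are designed to avoid.

```python
from math import prod

# each uncertain object: list of (instance, probability) pairs whose
# probabilities sum to 1; smaller is better in every dimension
objects = {
    "A": [((1, 4), 0.5), ((3, 3), 0.5)],
    "B": [((2, 2), 0.7), ((5, 5), 0.3)],
}

def dominates(v, u):
    return all(a <= b for a, b in zip(v, u)) and v != u

def skyline_probability(name):
    """Pr(object in skyline) = sum over its instances u of Pr(u) times,
    for every other object, Pr(none of its instances dominates u)."""
    total = 0.0
    for u, pu in objects[name]:
        survive = prod(
            1 - sum(pv for v, pv in insts if dominates(v, u))
            for other, insts in objects.items() if other != name)
        total += pu * survive
    return total

for name in objects:
    print(name, round(skyline_probability(name), 3))   # A 0.65, B 0.7
# a p-skyline keeps every object whose skyline probability is at least p
```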
Fast and Simple Relational Processing of Uncertain Data
"... Abstract — This paper introduces U-relations, a succinct and purely relational representation system for uncertain databases. U-relations support attribute-level uncertainty using vertical partitioning. If we consider positive relational algebra extended by an operation for computing possible answer ..."
Abstract
-
Cited by 91 (4 self)
- Add to MetaCart
(Show Context)
Abstract: This paper introduces U-relations, a succinct and purely relational representation system for uncertain databases. U-relations support attribute-level uncertainty using vertical partitioning. If we consider positive relational algebra extended by an operation for computing possible answers, a query on the logical level can be translated into, and evaluated as, a single relational algebra query on the U-relational representation. The translation scheme essentially preserves the size of the query in terms of the number of operations and, in particular, the number of joins. Standard techniques employed in off-the-shelf relational database management systems are effective for optimizing and processing queries on U-relations. Our experiments show that query evaluation on U-relations scales to large amounts of data with high degrees of uncertainty.
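A toy Python encoding of the U-relation idea (my own simplification, not the schema actually used in the paper's system): each tuple carries a condition column of variable assignments, and positive relational algebra stays relational, with joins unioning condition columns and discarding inconsistent results.

```python
# R(name, city): variable x chooses Alice's city; a tuple is possible in
# exactly the worlds where every variable in its condition takes the
# listed value. An empty condition marks a certain tuple.
R = [({("x", 1)}, ("Alice", "Paris")),
     ({("x", 2)}, ("Alice", "London")),
     (set(), ("Bob", "Paris"))]

def consistent(cond):
    """A condition is consistent iff no variable is given two values."""
    assigned = {}
    for var, val in cond:
        if assigned.setdefault(var, val) != val:
            return False
    return True

def same_city_pairs(r, s):
    """Join data columns; the result's condition column is the union of
    the inputs' conditions, kept only if consistent."""
    out = []
    for c1, (n1, city1) in r:
        for c2, (n2, city2) in s:
            cond = c1 | c2
            if city1 == city2 and n1 != n2 and consistent(cond):
                out.append((cond, (n1, n2, city1)))
    return out

print(same_city_pairs(R, R))
# [({('x', 1)}, ('Alice', 'Bob', 'Paris')),
#  ({('x', 1)}, ('Bob', 'Alice', 'Paris'))]
```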
XML Data Exchange: Consistency and Query Answering, 2005.
"... Data exchange is the problem of finding an instance of a target schema, given an instance of a source schema and a specification of the relationship between the source and the target. Theoretical foundations of data exchange have recently been investigated for relational data. In this paper, we star ..."
Abstract
-
Cited by 88 (12 self)
- Add to MetaCart
Abstract: Data exchange is the problem of finding an instance of a target schema, given an instance of a source schema and a specification of the relationship between the source and the target. Theoretical foundations of data exchange have recently been investigated for relational data. In this paper, we start looking into the basic properties of XML data exchange, that is, restructuring of XML documents that conform to a source DTD under a target DTD, and answering queries written over the target schema. We define XML data exchange settings in which source-to-target dependencies refer to the hierarchical structure of the data. Combining DTDs and dependencies makes some XML data exchange settings inconsistent. We investigate the consistency problem and determine its exact complexity. We then move to query answering, and prove a dichotomy theorem that classifies data exchange settings into those over which query answering is tractable, and those over which it is coNP-complete, depending on the classes of regular expressions used in the DTDs. Furthermore, for all tractable cases we give polynomial-time algorithms that compute target XML documents over which queries can be answered.
Models for incomplete and probabilistic information. IEEE Data Engineering Bulletin, 2006.
"... Abstract. We discuss, compare and relate some old and some new models for incomplete and probabilistic databases. We characterize the expressive power of c-tables over infinite domains and we introduce a new kind of result, algebraic completion, for studying less expressive models. By viewing proba ..."
Abstract
-
Cited by 83 (9 self)
- Add to MetaCart
(Show Context)
Abstract: We discuss, compare and relate some old and some new models for incomplete and probabilistic databases. We characterize the expressive power of c-tables over infinite domains, and we introduce a new kind of result, algebraic completion, for studying less expressive models. By viewing probabilistic models as incompleteness models with additional probability information, we define completeness and closure under query languages for general probabilistic database models, and we introduce a new such model, probabilistic c-tables, which is shown to be complete and closed under the relational algebra.
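c-table semantics are compact enough to sketch directly. The encoding below is a toy of my own: each tuple carries a local condition over variables, and every valuation of the variables yields one possible world. Attaching probability distributions to those variables is, roughly, what turns this into the probabilistic c-tables the abstract introduces.

```python
from itertools import product

domain = [1, 2]
# each entry: (row, condition); row cells are constants or variable names
ctable = [(("a", "x"), lambda v: True),
          (("b", 1), lambda v: v["x"] == 1)]   # present only when x = 1

def possible_worlds(ctable, variables, domain):
    """One world per valuation of the variables over the domain: keep the
    rows whose condition holds, substituting values for variables."""
    worlds = set()
    for values in product(domain, repeat=len(variables)):
        v = dict(zip(variables, values))
        world = frozenset(
            tuple(v.get(cell, cell) for cell in row)
            for row, cond in ctable if cond(v))
        worlds.add(world)
    return worlds

for w in sorted(possible_worlds(ctable, ["x"], domain), key=len):
    print(sorted(w))
# [('a', 2)]   then   [('a', 1), ('b', 1)]
```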
Probabilistic Deductive Databases, 1994.
"... Knowledge-base (KB) systems must typically deal with imperfection in knowledge, e.g. in the form of imcompleteness, inconsistency, uncertainty, to name a few. Currently KB system development is mainly based on the expert system technology. Expert systems, through their support for rule-based program ..."
Abstract
-
Cited by 70 (2 self)
- Add to MetaCart
Abstract: Knowledge-base (KB) systems must typically deal with imperfection in knowledge, e.g. in the form of incompleteness, inconsistency, and uncertainty, to name a few. Currently, KB system development is mainly based on expert system technology. Expert systems, through their support for rule-based programming, uncertainty, etc., offer a convenient framework for KB system development, but they require the user to be well versed in the low-level details of system implementation. The manner in which uncertainty is handled has little mathematical basis, and there is no decent notion of query optimization, forcing the user to take responsibility for an efficient implementation of the KB system. We contend that KB system development can and should take advantage of deductive database technology, which overcomes most of the above limitations. An important problem here is to extend deductive databases into providing a systematic basis for rule-based programming with imperfect knowledge. In this paper, we are interested in an extension handling probabilistic knowledge.