Results 1 - 10 of 340
Why and Where: A Characterization of Data Provenance
In ICDT, 2001
Cited by 430 (18 self)
With the proliferation of database views and curated databases, the issue of data provenance (where a piece of data came from and the process by which it arrived in the database) is becoming increasingly important, especially in scientific databases where understanding provenance is crucial to the accuracy and currency of data. In this paper we describe an approach to computing provenance when the data of interest has been created by a database query. We adopt a syntactic approach and present results for a general data model that applies to relational databases as well as to hierarchical data such as XML. A novel aspect of our work is a distinction between "why" provenance (refers to the source data that had some influence on the existence of the data) and "where" provenance (refers to the location(s) in the source databases from which the data was extracted).
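The why/where distinction in this abstract can be illustrated with a toy join query. The following is only a minimal sketch under stated assumptions: the relations `employees` and `depts`, their tuple identifiers, and the function `names_in_city` are all hypothetical, and the bookkeeping is an informal reading of the abstract, not the paper's formal syntactic definitions.

```python
# Hypothetical source relations; keys are tuple identifiers invented for this sketch.
employees = {
    "e1": {"name": "Ann", "dept": "d1"},
    "e2": {"name": "Bob", "dept": "d2"},
}
depts = {
    "r1": {"dept": "d1", "city": "Edinburgh"},
    "r2": {"dept": "d2", "city": "Philadelphia"},
}


def names_in_city(employees, depts, city):
    """Return names of employees whose department is in `city`,
    annotating each output value with informal why/where provenance."""
    results = []
    for eid, emp in employees.items():
        for rid, dep in depts.items():
            if emp["dept"] == dep["dept"] and dep["city"] == city:
                results.append({
                    "value": emp["name"],
                    # why: every source tuple that influenced the existence
                    # of this output (here, both sides of the join)
                    "why": {eid, rid},
                    # where: the source location the value was copied from
                    "where": {(eid, "name")},
                })
    return results


result = names_in_city(employees, depts, "Edinburgh")
```

In this sketch the department tuple appears in the why-provenance because the output would not exist without it, but not in the where-provenance because no value was copied from it.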
Indexing and Querying XML Data for Regular Path Expressions
In VLDB, 2001
Cited by 343 (9 self)
With the advent of XML as a standard for data representation and exchange on the Internet, storing and querying XML data becomes more and more important. Several XML query languages have been proposed, and the common feature of the languages is the use of regular path expressions to query XML data. This poses a new challenge concerning indexing and searching XML data, because conventional approaches based on tree traversals may not meet the processing requirements under heavy access requests. In this paper, we propose a new system for indexing and storing XML data based on a numbering scheme for elements. This numbering scheme quickly determines the ancestor-descendant relationship between elements in the hierarchy of XML data. We also propose several algorithms for processing regular path expressions, namely, (1) EE-Join for searching paths from an element to another, (2) EA-Join for scanning sorted elements and attributes to find element-attribute pairs, and (3) KC-Join for finding Kleene-Closure on repeated paths or elements. The EE-Join algorithm is highly effective particularly for searching paths that are very long or whose lengths are unknown. Experimental results from our prototype system implementation show that the proposed algorithms can process XML queries with regular path expressions by up to an or-
This work was sponsored in part by National Science Foundation CAREER Award (IIS-9876037) and Research Infrastructure program EIA-0080123. The authors assume all responsibility for the contents of the paper.
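How a numbering scheme can decide the ancestor-descendant relationship without traversing the tree can be sketched as follows. This is an illustrative assumption, not necessarily the scheme used in the paper: each element gets a pre-order position `order` and a descendant count `size`, and the names `Node`, `number`, and `is_ancestor` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    children: list[Node] = field(default_factory=list)
    order: int = 0  # pre-order position, assigned by number()
    size: int = 0   # number of descendants, assigned by number()


def number(node: Node, next_order: int = 1) -> int:
    """Assign (order, size) to every node in one pre-order pass.
    Returns the next unused order value."""
    node.order = next_order
    next_order += 1
    for child in node.children:
        next_order = number(child, next_order)
    node.size = next_order - node.order - 1
    return next_order


def is_ancestor(a: Node, d: Node) -> bool:
    """True if `a` is a proper ancestor of `d`, using only the two labels."""
    return a.order < d.order <= a.order + a.size


# Hypothetical document: <book><chapter><section/></chapter><appendix/></book>
section = Node("section")
chapter = Node("chapter", [section])
appendix = Node("appendix")
book = Node("book", [chapter, appendix])
number(book)
```

The check is two integer comparisons per pair of elements, which is why labels of this kind suit join-style algorithms over sorted element lists.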
Fjording the Stream: An Architecture for Queries over Streaming Sensor Data
2002
Cited by 281 (8 self)
If industry visionaries are correct, our lives will soon be full of sensors, connected together in loose conglomerations via wireless networks, each monitoring and collecting data about the environment at large. These sensors behave very differently from traditional database sources: they have intermittent connectivity, are limited by severe power constraints, and typically sample periodically and push immediately, keeping no record of historical information. These limitations make traditional database systems inappropriate for queries over sensors. We present the Fjords architecture for managing multiple queries over many sensors, and show how it can be used to limit sensor resource demands while maintaining high query throughput. We evaluate our architecture using traces from a network of traffic sensors deployed on Interstate 80 near Berkeley and present performance results that show how query throughput, communication costs, and power consumption are necessarily coupled in sensor environments.
RQL: A Declarative Query Language for RDF
Cited by 247 (27 self)
Real-scale Semantic Web applications, such as Web Portals and E-Marketplaces, require the management of voluminous metadata repositories containing descriptive information (i.e., metadata) about the available Web resources and services. Better knowledge about the meaning, usage, accessibility or quality of these resources and services will considerably facilitate the automated processing of both Web content and services. In this context, the Resource Description Framework (RDF) enables the creation and exchange of metadata as any other Web data. Although large volumes of RDF descriptions are already appearing (e.g., as exported Portal catalogs or service descriptions), sufficiently expressive declarative languages for querying both RDF descriptions and schemas are still missing. In this paper, we propose RQL, a new RDF query language, relying on a formal graph model that permits the interpretation of superimposed resource descriptions. RQL is an OQL-inspired adaptation of XML query languages to the peculiarities of RDF but, foremost, is an extension of this functionality for uniformly querying both descriptions and schemas. We illustrate the syntax, semantics and core functionality of RQL by means of a set of benchmark queries and report on the performance of RSSDB, our persistent RDF Store, for storing and querying voluminous RDF descriptions.
XMill: an Efficient Compressor for XML Data
1999
Cited by 230 (0 self)
We describe a tool for compressing XML data, with applications in data exchange and archiving, which usually achieves about twice the compression ratio of gzip at roughly the same speed. The compressor, called XMill, incorporates and combines existing compressors in order to apply them to heterogeneous XML data: it uses zlib, the library function for gzip, a collection of datatype specific compressors for simple data types, and, possibly, user defined compressors for application specific data types.
1 Introduction
We have implemented a compressor/decompressor for XML data, to be used in data exchange and archiving, that achieves about twice the compression rate of general-purpose compressors (gzip), at about the same speed. The tool can be downloaded from www.research.att.com/sw/tools/xmill/. XML is now being adopted by many organizations and industry groups, like the healthcare, banking, chemical, and telecommunications industries. The attraction in XML is that it is a self-describi...
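One way to apply an existing compressor such as zlib to heterogeneous XML data is to separate the tag structure from the text values and compress values of the same kind together. The sketch below is an illustration under that assumption and is not XMill's actual implementation or file format: the functions `split_containers` and `compress_containers`, the per-tag grouping rule, and the sample document are all hypothetical, and no datatype-specific or user-defined compressors are modelled.

```python
import xml.etree.ElementTree as ET
import zlib
from collections import defaultdict


def split_containers(xml_text):
    """Split a document into a structure stream and per-tag value containers."""
    root = ET.fromstring(xml_text)
    structure = []                  # tag opens/closes with value placeholders
    containers = defaultdict(list)  # tag name -> list of text values

    def walk(elem):
        structure.append("<" + elem.tag)
        text = (elem.text or "").strip()
        if text:
            containers[elem.tag].append(text)
            structure.append("#")   # marks where a value was removed
        for child in elem:
            walk(child)
        structure.append(">")

    walk(root)
    return structure, dict(containers)


def compress_containers(structure, containers):
    """Compress the structure stream and each container separately with zlib."""
    blobs = {"structure": zlib.compress(" ".join(structure).encode("utf-8"))}
    for tag, values in containers.items():
        blobs[tag] = zlib.compress("\n".join(values).encode("utf-8"))
    return blobs


# Hypothetical sample document.
sample = (
    "<log>"
    "<entry><host>a.example</host><bytes>120</bytes></entry>"
    "<entry><host>b.example</host><bytes>98</bytes></entry>"
    "</log>"
)
structure, containers = split_containers(sample)
blobs = compress_containers(structure, containers)
```

Grouping similar values gives a general-purpose compressor more redundancy to exploit than interleaved markup and text; whether that helps on a given document has to be measured.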
Translating Web Data
In VLDB, 2002
Cited by 229 (40 self)
We present a novel framework for mapping between any combination of XML and relational schemas, in which a high-level, user-specified mapping is translated into semantically meaningful queries that transform source data into the target representation. Our approach works in two phases. In the first phase, the high-level mapping, expressed as a set of inter-schema correspondences, is converted into a set of mappings that capture the design choices made in the source and target schemas (including their hierarchical organization as well as their nested referential constraints).
Query Optimization for XML
In Proceedings of VLDB, 1999
Cited by 208 (3 self)
XML is an emerging standard for data representation and exchange on the World-Wide Web. Due to the nature of information on the Web and the inherent flexibility of XML, we expect that much of the data encoded in XML will be semistructured: the data may be irregular or incomplete, and its structure may change rapidly or unpredictably. This paper describes the query processor of Lore, a DBMS for XML-based data supporting an expressive query language. We focus primarily on Lore's cost-based query optimizer. While all of the usual problems associated with cost-based query optimization apply to XML-based query languages, a number of additional problems arise, such as new kinds of indexing, more complicated notions of database statistics, and vastly different query execution strategies for different databases. We define appropriate logical and physical query plans, database statistics, and a cost model, and we describe plan enumeration including heuristics for reducing the large search space. Our optimizer is fully implemented in Lore and preliminary performance results are reported.
XIRQL: A Query Language for Information Retrieval in XML Documents
2001
Cited by 190 (7 self)
Based on the document-centric view of XML, we present the query language XIRQL. Current proposals for XML query languages lack most IR-related features, which are weighting and ranking, relevance-oriented search, datatypes with vague predicates, and semantic relativism. XIRQL integrates these features by using ideas from logic-based probabilistic IR models, in combination with concepts from the database area. For processing XIRQL queries, a path algebra is presented that also serves as a starting point for query optimization.
Updating XML
In SIGMOD, 2001
Cited by 180 (4 self)
As XML has developed over the past few years, its role has expanded beyond its original domain as a semantics-preserving markup language for online documents, and it is now also the de facto format for interchanging data between heterogeneous systems. Data sources export XML "views" over their data, and other systems can directly import or query these views. As a result, there has been great interest in languages and systems for expressing queries over XML data, whether the XML is stored in a repository or generated as a view over some other data storage format.
Typechecking for XML Transformers
In Proceedings of the Nineteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, 2000