Results 1 - 10
of
510
The Lorel Query Language for Semistructured Data
- International Journal on Digital Libraries
, 1997
"... We present the Lorel language, designed for querying semistructured data. Semistructured data is becoming more and more prevalent, e.g., in structured documents such as HTML and when performing simple integration of data from multiple sources. Traditional data models and query languages are inapprop ..."
Abstract
-
Cited by 731 (29 self)
- Add to MetaCart
(Show Context)
We present the Lorel language, designed for querying semistructured data. Semistructured data is becoming more and more prevalent, e.g., in structured documents such as HTML and when performing simple integration of data from multiple sources. Traditional data models and query languages are inappropriate, since semistructured data often is irregular, some data is missing, similar concepts are represented using different types, heterogeneous sets are present, or object structure is not fully known. Lorel is a user-friendly language in the SQL/OQL style for querying such data effectively. For wide applicability, the simple object model underlying Lorel can be viewed as an extension of ODMG and the language as an extension of OQL. The main novelties of the Lorel language are: (i) extensive use of coercion to relieve the user from the strict typing of OQL, which is inappropriate for semistructured data
Similarity Flooding: A Versatile Graph Matching Algorithm and Its Application to Schema Matching
, 2002
"... Matching elements of two data schemas or two data instances plays a key role in data warehousing, e-business, or even biochemical applications. In this paper we present a matching algorithm based on a fixpoint computation that is usable across different scenarios. The algorithm takes two graphs (sch ..."
Abstract
-
Cited by 592 (12 self)
- Add to MetaCart
(Show Context)
Matching elements of two data schemas or two data instances plays a key role in data warehousing, e-business, or even biochemical applications. In this paper we present a matching algorithm based on a fixpoint computation that is usable across different scenarios. The algorithm takes two graphs (schemas, catalogs, or other data structures) as input, and produces as output a mapping between corresponding nodes of the graphs. Depending on the matching goal, a subset of the mapping is chosen using filters. After our algorithm runs, we expect a human to check and if necessary adjust the results. As a matter of fact, we evaluate the ‘accuracy ’ of the algorithm by counting the number of needed adjustments. We conducted a user study, in which our accuracy metric was used to estimate the labor savings that the users could obtain by utilizing our algorithm to obtain an initial matching. Finally, we illustrate how our matching algorithm is deployed as one of several high-level operators in an implemented testbed for managing information models and mappings.
DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases
, 1997
"... In semistructured databases there is no schema fixed in advance. To provide the benefits of a schema in such environments, we introduce DataGuides: concise and accurate structural summaries of semistructured databases. DataGuides serve as dynamic schemas, generated from the database; they are ..."
Abstract
-
Cited by 572 (13 self)
- Add to MetaCart
In semistructured databases there is no schema fixed in advance. To provide the benefits of a schema in such environments, we introduce DataGuides: concise and accurate structural summaries of semistructured databases. DataGuides serve as dynamic schemas, generated from the database; they are useful for browsing database structure, formulating queries, storing information such as statistics and sample values, and enabling query optimization. This paper presents the theoretical foundations of DataGuides along with an algorithm for their creation and an overview of incremental maintenance. We provide performance results based on our implementation of DataGuides in the Lore DBMS for semistructured data. We also describe the use of DataGuides in Lore, both in the user interface to enable structure browsing and query formulation, and as a means of guiding the query processor and optimizing query execution.
The TSIMMIS Project: Integration of Heterogeneous Information Sources
"... The goal of the Tsimmis Project is to develop tools that facilitate the rapid integration of heterogeneous information sources that may include both structured and unstructured data. This paper gives an overview of the project, describing components that extract properties from unstructured objects, ..."
Abstract
-
Cited by 535 (19 self)
- Add to MetaCart
(Show Context)
The goal of the Tsimmis Project is to develop tools that facilitate the rapid integration of heterogeneous information sources that may include both structured and unstructured data. This paper gives an overview of the project, describing components that extract properties from unstructured objects, that translate information into a common object model, that combine information from several sources, that allow browsing of information, and that manage constraints across heterogeneous sites. Tsimmis is a joint project between Stanford and the IBM Almaden Research Center.
Information integration using logical views
, 1997
"... A number of ideas concerning information-integration tools can be thought of as constructing answers to queries using views that represent the capabilities of information sources. We review the formal basis of these techniques, which are closely related to containment algorithms for conjunctive quer ..."
Abstract
-
Cited by 485 (4 self)
- Add to MetaCart
A number of ideas concerning information-integration tools can be thought of as constructing answers to queries using views that represent the capabilities of information sources. We review the formal basis of these techniques, which are closely related to containment algorithms for conjunctive queries and/or Datalog programs. Then we compare the approaches taken by AT&T Labs' "Information Manifold" and the Stanford "Tsimmis" project in these terms.
Why and Where: A Characterization of Data Provenance
- In ICDT
, 2001
"... With the proliferation of database views and curated databases, the issue of data provenance # where a piece of data came from and the process by which it arrived in the database # is becoming increasingly important, especially in scienti#c databases where understanding provenance is crucial to ..."
Abstract
-
Cited by 430 (18 self)
- Add to MetaCart
(Show Context)
With the proliferation of database views and curated databases, the issue of data provenance # where a piece of data came from and the process by which it arrived in the database # is becoming increasingly important, especially in scienti#c databases where understanding provenance is crucial to the accuracy and currency of data. In this paper we describe an approach to computing provenance when the data of interest has been created by a database query.We adopt a syntactic approach and present results for a general data model that applies to relational databases as well as to hierarchical data such as XML. A novel aspect of our work is a distinction between #why" provenance #refers to the source data that had some in#uence on the existence of the data# and #where" provenance #refers to the location#s# in the source databases from which the data was extracted#.
The TSIMMIS Approach to Mediation: Data Models and Languages
- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS
, 1997
"... TSIMMIS -- The Stanford-IBM Manager of Multiple Information Sources -- is a system for integrating information. It o ers a data model and a common query language that are designed to support the combining of information from many different sources. It also o ers tools for generating automatically th ..."
Abstract
-
Cited by 417 (9 self)
- Add to MetaCart
TSIMMIS -- The Stanford-IBM Manager of Multiple Information Sources -- is a system for integrating information. It o ers a data model and a common query language that are designed to support the combining of information from many different sources. It also o ers tools for generating automatically the components that are needed to build systems for integrating information. In this paper we shall discuss the principal architectural features and their rationale.
A Query Language and Optimization Techniques for Unstructured Data
, 1996
"... A new kind of data model has recently emerged in which the database is not constrained by a conventional schema. Systems like ACeDB, which has become very popular with biologists, and the recent Tsimmis proposal for data integration organize data in tree-like structures whose components can be used ..."
Abstract
-
Cited by 407 (35 self)
- Add to MetaCart
A new kind of data model has recently emerged in which the database is not constrained by a conventional schema. Systems like ACeDB, which has become very popular with biologists, and the recent Tsimmis proposal for data integration organize data in tree-like structures whose components can be used equally well to represent sets and tuples. Such structures allow great flexibility in data representation What query language is appropriate for such structures? Here we propose a simple language UnQL for querying data organized as a rooted, edge-labeled graph. In this model, relational data may be represented as fixed-depth trees, and on such trees UnQL is equivalent to the relational algebra. The novelty of UnQL consists in its programming constructs for arbitrarily deep data and for cyclic structures. While strictly more powerful than query languages with path expressions like XSQL, UnQL can still be efficiently evaluated. We describe new optimization techniques for the deep or "vertical" dimension of UnQL queries. Furthermore, we show that known optimization techniques for operators on flat relations apply to the "horizontal" dimension of UnQL.
Lore: A database management system for semistructured data
- SIGMOD Record
, 1997
"... Lore (for Lightweight Object Repository) is a DBMS designed specifically for managing semistructured information. Implementing Lore has required rethinking all aspects of a DBMS, including storage management, indexing, query processing and optimization, and user interfaces. This paper provides an ov ..."
Abstract
-
Cited by 339 (24 self)
- Add to MetaCart
(Show Context)
Lore (for Lightweight Object Repository) is a DBMS designed specifically for managing semistructured information. Implementing Lore has required rethinking all aspects of a DBMS, including storage management, indexing, query processing and optimization, and user interfaces. This paper provides an overview of these aspects of the Lore system, as well as other novel features such as dynamic structural summaries and seamless access to data from external sources.