Results 1 - 10
of
196
Data Integration: A Theoretical Perspective
- Symposium on Principles of Database Systems
, 2002
"... Data integration is the problem of combining data residing at different sources, and providing the user with a unified view of these data. The problem of designing data integration systems is important in current real world applications, and is characterized by a number of issues that are interestin ..."
Abstract
-
Cited by 965 (45 self)
- Add to MetaCart
Data integration is the problem of combining data residing at different sources, and providing the user with a unified view of these data. The problem of designing data integration systems is important in current real world applications, and is characterized by a number of issues that are interesting from a theoretical point of view. This document presents on overview of the material to be presented in a tutorial on data integration. The tutorial is focused on some of the theoretical issues that are relevant for data integration. Special attention will be devoted to the following aspects: modeling a data integration application, processing queries in data integration, dealing with inconsistent data sources, and reasoning on queries.
On the Decidability of Query Containment under Constraints
"... Query containment under constraints is the problem of checking whether for every database satisfying a given set of constraints, the result of one query is a subset of the result of another query. Recent research points out that this is a central problem in several database applications, and we addr ..."
Abstract
-
Cited by 256 (56 self)
- Add to MetaCart
Query containment under constraints is the problem of checking whether for every database satisfying a given set of constraints, the result of one query is a subset of the result of another query. Recent research points out that this is a central problem in several database applications, and we address it within a setting where constraints are specified in the form of special inclusion dependencies over complex expressions, built by using intersection and difference of relations, special forms of quantification, regular expressions over binary relations, and cardinality constraints. These types of constraints capture a great variety of data models, including the relational, the entity-relational, and the object-oriented model. We study the problem of checking whether q is contained in q ′ with respect to the constraints specified in a schema S, where q and q ′ are nonrecursive Datalog programs whose atoms are complex expressions. We present the following results on query containment. For the case where q does not contain regular expressions, we provide a method for deciding query containment, and analyze its computational complexity. We do the same for the case where neither S nor q, q ′ contain number restrictions. To the best of our knowledge, this yields the first decidability result on containment of conjunctive queries with regular expressions. Finally, we prove that the problem is undecidable for the case where we admit inequalities in q′.
Context Interchange: New Features and Formalisms for the Intelligent Integration of Information
- ACM TOIS
, 1999
"... The Context Interchange strategy presents a novel perspective for mediated data access in which semantic conflicts among heterogeneous systems are not identified a priori, but are detected and reconciled by a context mediator through comparison of contexts axioms corresponding to the systems engaged ..."
Abstract
-
Cited by 238 (96 self)
- Add to MetaCart
The Context Interchange strategy presents a novel perspective for mediated data access in which semantic conflicts among heterogeneous systems are not identified a priori, but are detected and reconciled by a context mediator through comparison of contexts axioms corresponding to the systems engaged in data exchange. In this article, we show that queries formulated on shared views, export schema, and shared “ontologies ” can be mediated in the same way using the Context Interchange framework. The proposed framework provides a logic-based object-oriented formalism for representing and reasoning about data semantics in disparate systems, and has been validated in a prototype implementation providing mediated data access to both traditional and web-based information sources. Categories and Subject Descriptors: H.2.4 [Database Management]: Systems—Query processing; H.2.5 [Database Management]: Heterogeneous Databases—Data translation
Catching the Boat with Strudel: Experiences with a Web-Site Management System
, 1998
"... The Strudel system applies concepts from database management systems to the process of building Web sites. Strudel's key idea is separating the management of the site's data, the creation and management of the site's structure, and the visual presentation of the site's pages. Fir ..."
Abstract
-
Cited by 211 (23 self)
- Add to MetaCart
The Strudel system applies concepts from database management systems to the process of building Web sites. Strudel's key idea is separating the management of the site's data, the creation and management of the site's structure, and the visual presentation of the site's pages. First, the site builder creates a uniform model of all data available at the site. Second, the builder uses this model to declaratively define the Web site's structure by applying a "site-definition query" to the underlying data. The result of evaluating this query is a "site graph", which represents both the site's content and structure. Third, the builder specifies the visual presentation of pages in Strudel's HTML-template language. The data model underlying Strudel is a semi-structured model of labeled directed graphs. We describe Strudel's key characteristics, report on our experiences using Strudel, and present the technical problems that arose from our experience. We describe our experience constructing sev...
Semantic Integration of Semistructured and Structured Data Sources
- SIGMOD Record
, 1999
"... this paper is to describe the MOMIS [4, 5] (Mediator envirOnment for Multiple Information Sources) approach to the integration and query of multiple, heterogeneous information sources, containing structured and semistructured data. MOMIS has been conceived as a joint collaboration between University ..."
Abstract
-
Cited by 166 (19 self)
- Add to MetaCart
(Show Context)
this paper is to describe the MOMIS [4, 5] (Mediator envirOnment for Multiple Information Sources) approach to the integration and query of multiple, heterogeneous information sources, containing structured and semistructured data. MOMIS has been conceived as a joint collaboration between University of Milano and Modena in the framework of the INTERDATA national research project, aiming at providing methods and tools for data management in Internet-based information systems. Like other integration projects [1, 10, 14], MOMIS follows a "semantic approach" to information integration based on the conceptual schema, or metadata, of the information sources, and on the following architectural elements: i) a common object-oriented data model, defined according to the ODL I 3 language, to describe source schemas for integration purposes. The data model and ODL I 3 have been defined in MOMIS as subset of the ODMG-93 ones, following the proposal for a standard mediator language developed by the I
Description Logics For Conceptual Data Modeling
, 1998
"... The article aims at establishing a logical approach to class-based data modeling. After a discussion on class-based formalisms for data modeling, we introduce a family of logics, called Description Logics, which stem from research on Knowledge Representation in Arti cial Intelligence. The logics ..."
Abstract
-
Cited by 143 (22 self)
- Add to MetaCart
The article aims at establishing a logical approach to class-based data modeling. After a discussion on class-based formalisms for data modeling, we introduce a family of logics, called Description Logics, which stem from research on Knowledge Representation in Arti cial Intelligence. The logics of this family are particularly well suited for specifying data classes and relationships among classes, and are equipped with both formal semantics and inference mechanisms. We demonstrate that several popular data modeling formalisms, including the Entity-Relationship Model, and the most common variants of object-oriented data models, can be expressed in terms of speci c logics of the family. For this purpose we use a unifying Description Logic, which incorporates all the features needed for the logical reformulation of the data models used in the various contexts. We also discuss the problem of devising reasoning procedures for the unifying formalism, and show that they provide valuable supports for several important data modeling activities.
Composing Mappings among Data Sources
- In VLDB
, 2003
"... Semantic mappings between data sources play a key role in several data sharing architectures. Mappings provide the relationships between data stored in different sources, and therefore enable answering queries that require data from other nodes in a data sharing network. Composing mappings is one of ..."
Abstract
-
Cited by 140 (9 self)
- Add to MetaCart
Semantic mappings between data sources play a key role in several data sharing architectures. Mappings provide the relationships between data stored in different sources, and therefore enable answering queries that require data from other nodes in a data sharing network. Composing mappings is one of the core problems that lies at the heart of several optimization methods in data sharing networks, such as caching frequently traversed paths and redundancy analysis.
The Chatty Web: Emergent Semantics Through Gossiping
, 2003
"... This paper describes a novel approach for obtaining semantic interoperability among data sources in a bottom-up, semi-automatic manner without relying on pre-existing, global semantic models. We assume that large amounts of data exist that have been organized and annotated according to local schemas ..."
Abstract
-
Cited by 122 (12 self)
- Add to MetaCart
This paper describes a novel approach for obtaining semantic interoperability among data sources in a bottom-up, semi-automatic manner without relying on pre-existing, global semantic models. We assume that large amounts of data exist that have been organized and annotated according to local schemas. Seeing semantics as a form of agreement, our approach enables the participating data sources to incrementally develop global agreement in an evolutionary and completely decentralized process that solely relies on pair-wise, local interactions: Participants provide translations between schemas they are interested in and can learn about other translations by routing queries (gossiping). To support the participants in assessing the semantic quality of the achieved agreements we develop a formal framework that takes into account both syntactic and semantic criteria. The assessment process is incremental and the quality ratings are adjusted along with the operation of the system. Ultimately, this process results in global agreement, i.e., the semantics that all participants understand. We discuss strategies to efficiently find translations and provide results from a case study to justify our claims. Our approach applies to any system which provides a communication infrastructure (existing websites or databases, decentralized systems, P2P systems) and offers the opportunity to study semantic interoperability as a global phenomenon in a network of information sharing parties.
Data Integration under Integrity Constraints
- Information Systems
, 2002
"... Data integratio n systemspro vide accessto a seto fhetero - geneo us, auto no mo us data so urces thro ugh a so -called glo bal schema. There are basically two appro aches fo r designing a data integratio n system. In the glo bal-centric appro ach,o ne defines the elementso f the glo bal schema as v ..."
Abstract
-
Cited by 115 (22 self)
- Add to MetaCart
(Show Context)
Data integratio n systemspro vide accessto a seto fhetero - geneo us, auto no mo us data so urces thro ugh a so -called glo bal schema. There are basically two appro aches fo r designing a data integratio n system. In the glo bal-centric appro ach,o ne defines the elementso f the glo bal schema as viewso ver the so urces, whereas in the lo cal-centric appro ach, o e characterizes the so rces as viewso ver theglo al schema. It is well kno wn that pro cessing queries in the latter appro ach is similar to query answering with inc o plete infoC atio , and, therefo9 is a c o plex task. On theo ther hand, it is a co mmo no pinio n that query pro cessing is much easier in the fo rmer appro ach. In this paper we sho w the surprising result that, when theglo al schema is expressed in the relatio al mo del with integrity c o straints, eveno f simple types, the pr o lemo f inco6 plete info rmatio n implicitly arises, making querypro cessing di#cult in the glo al-centric approC h as well. We thenfo cuso n glo al schemas with key andfo eign key co straints, which represents a situat io which is veryco#=W in practice, and we illustrate techniques fo e#ectively answering queries po sed to the data integratio n system in this case. 1