Results 1 - 10 of 229
Data Exchange: Semantics and Query Answering
In ICDT, 2003
"... Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema that reflects the source data as accurately as possible. In this paper, we address foundational and algorithmic issues related to the semantics of data exchange and to query answe ..."
Abstract - Cited by 427 (41 self)
Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema that reflects the source data as accurately as possible. In this paper, we address foundational and algorithmic issues related to the semantics of data exchange and to query answering in the context of data exchange. These issues arise because, given a source instance, there may be many target instances that satisfy the constraints of the data exchange problem. We give an algebraic specification that selects, among all solutions to the data exchange problem, a special class of solutions that we call universal. A universal solution has no more and no less data than required for data exchange and it represents the entire space of possible solutions. We then identify fairly general, and practical, conditions that guarantee the existence of a universal solution and yield algorithms to compute a canonical universal solution efficiently. We adopt the notion of "certain answers" in indefinite databases as the semantics of query answering in data exchange. We investigate the computational complexity of computing the certain answers in this context and also study the problem of computing the certain answers of target queries by simply evaluating them on a canonical universal solution.
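A canonical universal solution of the kind discussed in this abstract is typically obtained by chasing the source instance with the source-to-target dependencies. The Python sketch below illustrates that idea for simple tuple-generating dependencies, inventing a labeled null for each existentially quantified head variable; the Emp/Works/Reports mapping and the representation are hypothetical, and this is only a naive illustration, not the paper's algorithm.

from itertools import count

# An atom is (relation, tuple_of_variables); an instance maps relation names
# to sets of tuples of constants or labeled nulls.

def matches(body, instance, env=None):
    """Enumerate assignments of body variables that embed every body atom
    into the source instance (a naive nested-loop join)."""
    env = env or {}
    if not body:
        yield dict(env)
        return
    rel, args = body[0]
    for row in instance.get(rel, set()):
        bound = dict(env)
        if all(bound.setdefault(var, val) == val for var, val in zip(args, row)):
            yield from matches(body[1:], instance, bound)

def chase(source, tgds):
    """Naive chase with source-to-target tgds: for every match of a tgd body
    in the source, add the head atoms to the target, using a fresh labeled
    null (N1, N2, ...) for each existential head variable of that match."""
    fresh_ids, target = count(1), {}
    for body, head in tgds:
        body_vars = {v for _, args in body for v in args}
        for env in matches(body, source):
            nulls = {}
            for rel, args in head:
                row = tuple(env[a] if a in body_vars
                            else nulls.setdefault(a, "N%d" % next(fresh_ids))
                            for a in args)
                target.setdefault(rel, set()).add(row)
    return target

# Hypothetical mapping: Emp(name, dept) -> exists m: Works(name, dept), Reports(name, m)
source = {"Emp": {("alice", "sales"), ("bob", "it")}}
tgds = [([("Emp", ("n", "d"))],
         [("Works", ("n", "d")), ("Reports", ("n", "m"))])]
print(chase(source, tgds))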
Data Exchange: Getting to the Core
2003
"... Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema that reflects the source data as accurately as possible. Given a source instance, there may be many solutions to the data exchange problem, that is, many target instances that sat ..."
Abstract - Cited by 168 (19 self)
Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema that reflects the source data as accurately as possible. Given a source instance, there may be many solutions to the data exchange problem, that is, many target instances that satisfy the constraints of the data exchange problem. In an earlier paper, we identified a special class of solutions that we call universal. A universal solution has homomorphisms into every possible solution, and hence is a "most general possible" solution. Nonetheless, given a source instance, there may be many universal solutions. This naturally raises the question of whether there is a "best" universal solution, and hence a best solution for data exchange. We answer this question by considering the well-known notion of the core of a structure, a notion that was first studied in graph theory, but has also played a role in conjunctive-query processing. The core of a structure is the smallest substructure that is also a homomorphic image of the structure. All universal solutions have the same core (up to isomorphism); we show that this core is also a universal solution, and hence the smallest universal solution. The uniqueness of the core of a universal solution together with its minimality make the core an ideal solution for data exchange. Furthermore, we show that the core is the best among all universal solutions for answering unions of conjunctive queries with inequalities. After this, we investigate the computational complexity of producing the core. Well-known results by Chandra and Merlin imply that, unless P = NP, there is no polynomial-time algorithm that, given a structure as input, returns the core of that structure as output. In contrast, in the context of data e...
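To make the notion of core concrete, here is a small brute-force Python sketch for structures with a single binary relation (directed graphs): it looks for the smallest induced substructure that the whole structure maps into homomorphically, which by minimality is a core up to isomorphism. The exponential search is only for illustration and is unrelated to the complexity results discussed in the paper; the example graph is hypothetical.

from itertools import combinations, product

def has_hom(nodes, edges, targets):
    """Brute-force test for a homomorphism from the digraph (nodes, edges)
    into its induced substructure on `targets`: every edge must map to an edge."""
    nodes = sorted(nodes)
    for image in product(sorted(targets), repeat=len(nodes)):
        h = dict(zip(nodes, image))
        if all((h[u], h[v]) in edges for (u, v) in edges):
            return True
    return False

def core_nodes(nodes, edges):
    """Smallest subset S such that the structure has a homomorphism into its
    induced substructure on S; that substructure is then a core of the input."""
    for k in range(1, len(nodes) + 1):
        for subset in combinations(sorted(nodes), k):
            if has_hom(nodes, edges, subset):
                return set(subset)
    return set(nodes)

# A directed triangle with a pendant edge retracts onto the triangle.
nodes = {1, 2, 3, 4}
edges = {(1, 2), (2, 3), (3, 1), (4, 1)}
print(core_nodes(nodes, edges))   # {1, 2, 3}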
Composing Schema Mappings: Second-Order Dependencies to the Rescue
In PODS, 2004
"... A schema mapping is a specification that describes how data structured under one schema (the source schema) is to be transformed into data structured under a di#erent schema (the target schema). Schema mappings play a key role in numerous areas of database systems, including database design, informa ..."
Abstract - Cited by 159 (20 self)
A schema mapping is a specification that describes how data structured under one schema (the source schema) is to be transformed into data structured under a different schema (the target schema). Schema mappings play a key role in numerous areas of database systems, including database design, information integration, and model management. A fundamental problem in this context is composing schema mappings: given two successive schema mappings, derive a schema mapping between the source schema of the first and the target schema of the second that has the same effect as applying successively the two schema mappings.
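The need for second-order dependencies can be seen on a small example in the spirit of this abstract (the relation names are illustrative, not necessarily the paper's). Suppose the first mapping assigns each employee some report, and the second mapping copies reports and flags self-managed employees:

M12:  Emp(e) -> exists m: Rep(e, m)
M23:  Rep(e, m) -> Mgr(e, m)
      Rep(e, e) -> SelfMgr(e)

Their composition relates Emp directly to Mgr and SelfMgr, but whether SelfMgr(e) is required depends on which report was chosen for e. That choice is exactly what a Skolem function captures, so the composition is naturally written as a second-order tgd:

exists f:  forall e ( Emp(e) -> Mgr(e, f(e)) )
           forall e ( Emp(e) and f(e) = e -> SelfMgr(e) )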
Rondo: A Programming Platform for Generic Model Management
2003
"... Model management aims at reducing the amount of programming needed for the development of metadata-intensive applications. We present a first complete prototype of a generic modelmanagement system, in which high-level operators are used to manipulate models and mappings between models. We define the ..."
Abstract - Cited by 155 (10 self)
Model management aims at reducing the amount of programming needed for the development of metadata-intensive applications. We present a first complete prototype of a generic model-management system, in which high-level operators are used to manipulate models and mappings between models. We define the key conceptual structures: models, morphisms, and selectors, and describe their use and implementation. We specify the semantics of the known model-management operators applied to these structures, suggest new ones, and develop new algorithms for implementing the individual operators. We examine the solutions for two model-management tasks that involve manipulations of relational schemas, XML schemas, and SQL views.
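In this style of model management, a morphism is essentially a binary relation over the elements of two models, and several operators reduce to simple relational manipulations. The Python sketch below shows that view; the operator names, signatures, and element names are simplified assumptions for illustration, not Rondo's actual API.

def compose(m1, m2):
    """Relational composition of two morphisms (sets of (left, right) pairs)."""
    return {(a, c) for (a, b) in m1 for (b2, c) in m2 if b == b2}

def invert(m):
    """Swap the two sides of every correspondence."""
    return {(b, a) for (a, b) in m}

def restrict_domain(m, elements):
    """Keep only correspondences whose left element belongs to `elements`."""
    return {(a, b) for (a, b) in m if a in elements}

# Hypothetical correspondences between three tiny schemas
s1_to_s2 = {("Emp.name", "Person.fullName"), ("Emp.dept", "Person.unit")}
s2_to_s3 = {("Person.fullName", "Staff.name")}
print(compose(s1_to_s2, s2_to_s3))   # {('Emp.name', 'Staff.name')}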
Composing Mappings among Data Sources
In VLDB, 2003
"... Semantic mappings between data sources play a key role in several data sharing architectures. Mappings provide the relationships between data stored in different sources, and therefore enable answering queries that require data from other nodes in a data sharing network. Composing mappings is one of ..."
Abstract - Cited by 140 (9 self)
Semantic mappings between data sources play a key role in several data sharing architectures. Mappings provide the relationships between data stored in different sources, and therefore enable answering queries that require data from other nodes in a data sharing network. Composing mappings is one of the core problems that lies at the heart of several optimization methods in data sharing networks, such as caching frequently traversed paths and redundancy analysis.
Learning to Match Ontologies on the Semantic Web
2003
"... On the Semantic Web, data will inevitably come from many different ontologies, and information processing across ontologies is not possible without knowing the semantic mappings between them. Manually finding such mappings is tedious, error-prone, and clearly not possible at the Web scale. Hence, th ..."
Abstract - Cited by 130 (2 self)
On the Semantic Web, data will inevitably come from many different ontologies, and information processing across ontologies is not possible without knowing the semantic mappings between them. Manually finding such mappings is tedious, error-prone, and clearly not possible at the Web scale. Hence, the development of tools to assist in the ontology mapping process is crucial to the success of the Semantic Web. We describe GLUE, a system that employs machine learning techniques to find such mappings. Given two ontologies, for each concept in one ontology GLUE finds the most similar concept in the other ontology. We give well-founded probabilistic definitions to several practical similarity measures, and show that GLUE can work with all of them. Another key feature of GLUE is that it uses multiple learning strategies, each of which exploits well a different type of information either in the data instances or in the taxonomic structure of the ontologies. To further improve matching accuracy, we extend GLUE to incorporate commonsense knowledge and domain constraints into the matching process. Our approach is thus distinguished in that it works with a variety of well-defined similarity notions and that it efficiently incorporates multiple types of knowledge. We describe a set of experiments on several real-world domains, and show that GLUE proposes highly accurate semantic mappings. Finally, we extend GLUE to find complex mappings between ontologies, and describe experiments that show the promise of the approach.
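GLUE's well-founded similarity measures are defined over the joint probability distribution of concept membership, estimated with learned classifiers. The Python sketch below shows only the Jaccard-style measure on a shared pool of instances, with keyword predicates standing in for the classifiers; the concepts and data are made up for illustration and are not GLUE's implementation.

def jaccard_similarity(instances, in_a, in_b):
    """Estimate P(A and B) / P(A or B) from a pool of instances, where
    in_a / in_b report (predicted) membership in concepts A and B."""
    both = sum(1 for x in instances if in_a(x) and in_b(x))
    either = sum(1 for x in instances if in_a(x) or in_b(x))
    return both / either if either else 0.0

docs = ["intro to databases", "database systems", "organic chemistry",
        "relational databases", "chemistry lab"]
in_courses_db = lambda d: "database" in d                        # concept from ontology 1
in_classes_info = lambda d: "system" in d or "relational" in d   # concept from ontology 2
print(jaccard_similarity(docs, in_courses_db, in_classes_info))  # 0.666...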
Schema Mappings, Data Exchange, and Metadata Management
2005
"... Schema mappings are high-level specifications that describe the relationship between database schemas. Schema mappings are prominent in several different areas of database management, including database design, information integration, data exchange, metadata management, and peer-topeer data managem ..."
Abstract - Cited by 127 (11 self)
Schema mappings are high-level specifications that describe the relationship between database schemas. Schema mappings are prominent in several different areas of database management, including database design, information integration, data exchange, metadata management, and peer-to-peer data management systems. Our main aim in this paper is to present an overview of recent advances in data exchange and metadata management, where the schema mappings are between relational schemas. In addition, we highlight some research issues and directions for future work.
Mapping Data in Peer-to-Peer Systems: Semantics and Algorithmic Issues
In Proceedings of ACM SIGMOD, 2003
"... We consider the problem of mapping data in peer-topeer data-sharing systems. Such systems often rely on the use of mapping tables listing pairs of corresponding values to search for data residing in different peers. In this paper, we address semantic and algorithmic issues related to the use of mapp ..."
Abstract - Cited by 118 (8 self)
We consider the problem of mapping data in peer-to-peer data-sharing systems. Such systems often rely on the use of mapping tables listing pairs of corresponding values to search for data residing in different peers. In this paper, we address semantic and algorithmic issues related to the use of mapping tables. We begin by arguing why mapping tables are appropriate for data mapping in a peer-to-peer environment. We discuss alternative semantics for these tables and we present a language that allows the user to specify mapping tables under different semantics. Then, we show that by treating mapping tables as constraints (called mapping constraints) on the exchange of information between peers it is possible to reason about them. We motivate why reasoning capabilities are needed to manage mapping tables and show the importance of inferring new mapping tables from existing ones. We study the complexity of this problem and we propose an efficient algorithm for its solution. Finally, we present an implementation along with experimental results that show that mapping tables may be managed efficiently in practice.
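One way to picture the inference of new mapping tables is as a join of existing tables along a shared peer. The Python sketch below does exactly that on made-up value correspondences; it deliberately ignores the open- versus closed-world semantics and the mapping constraints that the paper uses to make such inferences sound.

def infer_table(table_pq, table_qr):
    """Candidate mapping table between peers P and R, obtained by joining
    a P<->Q table with a Q<->R table on the shared Q values."""
    return {(p, r) for (p, q1) in table_pq for (q2, r) in table_qr if q1 == q2}

# Hypothetical identifier correspondences between three peers
p_to_q = {("CG1234", "AF0001"), ("CG5678", "AF0002")}
q_to_r = {("AF0001", "P99999")}
print(infer_table(p_to_q, q_to_r))   # {('CG1234', 'P99999')}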
Clio Grows Up: From Research Prototype to Industrial Tool
In ACM SIGMOD International Conference on Management of Data (SIGMOD), 2005
"... Clio, the IBM Research system for expressing declarative schema mappings, has progressed in the past few years from a research prototype into a technology that is behind some of IBM’s mapping technology. Clio provides a declarative way of specifying schema mappings between either XML or relational s ..."
Abstract - Cited by 112 (11 self)
Clio, the IBM Research system for expressing declarative schema mappings, has progressed in the past few years from a research prototype into a technology that is behind some of IBM's mapping technology. Clio provides a declarative way of specifying schema mappings between either XML or relational schemas. Mappings are compiled into an abstract query graph representation that captures the transformation semantics of the mappings. The query graph can then be serialized into different query languages, depending on the kind of schemas and systems involved in the mapping. Clio currently produces XQuery, XSLT, SQL, and SQL/XML queries. In this paper, we revisit the architecture and algorithms behind Clio. We then discuss some implementation issues, optimizations needed for scalability, and general lessons learned on the road towards creating an industrial-strength tool.
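The compile-then-serialize pipeline described here can be pictured with a toy code generator: a single-table correspondence, plus a rule for inventing values for target columns that have no source counterpart, rendered as SQL text. Everything below (schemas, column names, the CONCAT-based key) is a hypothetical Python sketch, not Clio's query-graph compilation.

def mapping_to_sql(source, target, correspondences, invented):
    """Render one source-to-target correspondence as SQL. `correspondences`
    maps target columns to source columns; `invented` maps target columns
    with no source counterpart to an expression built from source columns."""
    items = []
    for col in sorted(set(correspondences) | set(invented)):
        expr = correspondences.get(col, invented.get(col))
        items.append("%s AS %s" % (expr, col))
    return "INSERT INTO %s\nSELECT %s\nFROM %s;" % (target, ", ".join(items), source)

# Hypothetical schemas: emp(name, dept) -> employee(eid, name, deptname)
print(mapping_to_sql(
    source="emp",
    target="employee",
    correspondences={"name": "name", "deptname": "dept"},
    invented={"eid": "CONCAT('E-', name)"},   # stand-in value for an unmapped key column
))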
Generic Model Management: Concepts and Algorithms
Ph.D. thesis, 2003
"... Many challenging problems facing information systems engineering involve
the manipulation of complex metadata artifacts, or models, such as database
schemas, interface specifications, or object diagrams, and mappings between
models. The applications that solve metadata manipulation problems are
comp ..."
Abstract - Cited by 93 (5 self)
Many challenging problems facing information systems engineering involve the manipulation of complex metadata artifacts, or models, such as database schemas, interface specifications, or object diagrams, and mappings between models. The applications that solve metadata manipulation problems are complex and hard to build. The goal of generic model management is to reduce the amount of programming needed to develop such applications by providing a database infrastructure in which a set of high-level algebraic operators, such as Match, Merge, and Compose, are applied to models and mappings as a whole rather than to their individual building blocks.

This dissertation presents an initial study of the concepts and algorithms for generic model management. We describe the first prototype of a generic model management system, introduce the algebraic operators that are used to manipulate models and mappings, clarify the semantics of the operators, and develop novel algorithms for implementing them. In particular, we present an innovative algorithm based on fixpoint computation that is used for implementing the generic operator Match, which finds correspondences between two models. Using the prototype and the operators presented in the dissertation, we develop solutions for several practically relevant problems, such as change propagation and reintegration.
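The fixpoint-based Match operator mentioned above can be pictured as iterative similarity propagation: a pair of elements becomes more similar when pairs of their neighbouring elements are similar. The Python sketch below is a heavily simplified version of that idea; propagation coefficients, damping, and the real algorithm's update rule are omitted, and the two element graphs and initial similarities are invented for illustration.

def fixpoint_match(edges1, edges2, sigma0, iterations=20):
    """Repeatedly add to each pair's similarity the similarity of pairs of
    successor elements, then normalize by the maximum value."""
    sigma = dict(sigma0)
    for _ in range(iterations):
        updated = {}
        for (a, b), base in sigma0.items():
            boost = sum(sigma.get((a2, b2), 0.0)
                        for (a1, a2) in edges1 if a1 == a
                        for (b1, b2) in edges2 if b1 == b)
            updated[(a, b)] = base + boost
        top = max(updated.values()) or 1.0
        sigma = {pair: value / top for pair, value in updated.items()}
    return sigma

# Hypothetical element graphs (edges point from a schema element to its children)
edges1 = {("Emp", "Emp.name"), ("Emp", "Emp.dept")}
edges2 = {("Person", "Person.fullName"), ("Person", "Person.unit")}
sigma0 = {("Emp", "Person"): 0.5,
          ("Emp.name", "Person.fullName"): 0.4,
          ("Emp.dept", "Person.unit"): 0.3,
          ("Emp.name", "Person.unit"): 0.1}
print(fixpoint_match(edges1, edges2, sigma0))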