Generic Schema Matching with Cupid
In The VLDB Journal, 2001
Cited by 604 (17 self)
Schema matching is a critical step in many applications, such as XML message mapping, data warehouse loading, and schema integration. In this paper, we investigate algorithms for generic schema matching, outside of any particular data model or application. We first present a taxonomy for past solutions, showing that a rich range of techniques is available. We then propose a new algorithm, Cupid, that discovers mappings between schema elements based on their names, data types, constraints, and schema structure, using a broader set of techniques than past approaches. Some of our innovations are the integrated use of linguistic and structural matching, context-dependent matching of shared types, and a bias toward leaf structure where much of the schema content resides. After describing our algorithm, we present experimental results that compare Cupid to two other schema matching systems.
Similarity Flooding: A Versatile Graph Matching Algorithm and Its Application to Schema Matching
2002
Cited by 592 (12 self)
Matching elements of two data schemas or two data instances plays a key role in data warehousing, e-business, or even biochemical applications. In this paper we present a matching algorithm based on a fixpoint computation that is usable across different scenarios. The algorithm takes two graphs (schemas, catalogs, or other data structures) as input, and produces as output a mapping between corresponding nodes of the graphs. Depending on the matching goal, a subset of the mapping is chosen using filters. After our algorithm runs, we expect a human to check and if necessary adjust the results. As a matter of fact, we evaluate the ‘accuracy’ of the algorithm by counting the number of needed adjustments. We conducted a user study, in which our accuracy metric was used to estimate the labor savings that the users could obtain by utilizing our algorithm to obtain an initial matching. Finally, we illustrate how our matching algorithm is deployed as one of several high-level operators in an implemented testbed for managing information models and mappings.
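The fixpoint idea in this abstract can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the paper's algorithm: the function name `flood`, the uniform propagation weights, the max-normalization, and the 0.5 threshold used as a filter are all illustrative choices that do not come from the abstract.

```python
from itertools import product


def flood(edges_a, edges_b, initial, max_iter=100, eps=1e-9):
    """Propagate similarity between node pairs of two labeled graphs.

    edges_a, edges_b: iterables of (source, label, target) triples.
    initial: dict mapping (node_a, node_b) to a starting similarity.
    All names and weights here are hypothetical, for illustration only.
    """
    edges_a, edges_b = list(edges_a), list(edges_b)
    nodes_a = {n for s, _, t in edges_a for n in (s, t)}
    nodes_b = {n for s, _, t in edges_b for n in (s, t)}

    # Two node pairs are neighbours when both graphs contain an edge
    # with the same label connecting their respective members.
    neighbours = {pair: set() for pair in product(nodes_a, nodes_b)}
    for (s1, l1, t1), (s2, l2, t2) in product(edges_a, edges_b):
        if l1 == l2:
            neighbours[(s1, s2)].add((t1, t2))
            neighbours[(t1, t2)].add((s1, s2))

    sim = {pair: initial.get(pair, 0.0) for pair in neighbours}
    for _ in range(max_iter):
        # Each pair keeps its own score and receives its neighbours' scores.
        raw = {p: sim[p] + sum(sim[q] for q in neighbours[p]) for p in sim}
        top = max(raw.values(), default=0.0) or 1.0
        nxt = {p: v / top for p, v in raw.items()}  # normalize by the maximum
        delta = max((abs(nxt[p] - sim[p]) for p in sim), default=0.0)
        sim = nxt
        if delta < eps:  # scores stopped changing: fixpoint reached
            break
    return sim


# Toy schemas: one table with one column each.
schema_a = [("Person", "column", "name")]
schema_b = [("Employee", "column", "ename")]
scores = flood(schema_a, schema_b, {("Person", "Employee"): 1.0})

# A simple threshold filter selects a subset of the mapping.
matches = {pair for pair, s in scores.items() if s >= 0.5}
```

In this toy run, the initial similarity of the two tables spreads to their columns, so the column pair is selected even though it started with no similarity of its own.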
COMA - A system for flexible combination of Schema Matching Approaches
In VLDB, 2002
Cited by 443 (12 self)
Schema matching is the task of finding semantic correspondences between elements of two schemas. It is needed in many database applications, such as integration of web data sources, data warehouse loading and XML message mapping. To reduce the amount of user effort as much as possible, automatic approaches combining several match techniques are required. While such match approaches have found considerable interest recently, the problem of how to best combine different match algorithms still requires further work. We have thus developed the COMA schema matching system as a platform to combine multiple matchers in a flexible way. We provide a large spectrum of individual matchers, in particular a novel approach aiming at reusing results from previous match operations, and several mechanisms to combine the results of matcher executions. We use COMA as a framework to comprehensively evaluate the effectiveness of different matchers and their combinations for real-world schemas. The results obtained so far show the superiority of combined match approaches and indicate the high value of reuse-oriented strategies.
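One way to picture combining several matchers is the Python sketch below. It is not COMA's implementation: the two matchers, the averaging step, the 0.6 threshold, and every name in the code are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def name_matcher(a, b):
    """Hypothetical matcher: string similarity of the element names."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def type_matcher(a, b):
    """Hypothetical matcher: 1.0 when the declared data types agree."""
    return 1.0 if a["type"] == b["type"] else 0.0


def combine(schema_a, schema_b, matchers, threshold=0.6):
    """Average the matcher scores per element pair and keep the pairs
    whose combined score reaches the threshold (an assumed strategy)."""
    selected = []
    for a in schema_a:
        for b in schema_b:
            score = sum(m(a, b) for m in matchers) / len(matchers)
            if score >= threshold:
                selected.append((a["name"], b["name"], score))
    # Highest combined score first.
    return sorted(selected, key=lambda row: row[2], reverse=True)


schema_a = [{"name": "CustomerName", "type": "string"},
            {"name": "ZipCode", "type": "int"}]
schema_b = [{"name": "customer_name", "type": "string"},
            {"name": "zip", "type": "int"}]
result = combine(schema_a, schema_b, [name_matcher, type_matcher])
```

Averaging is only one possible aggregation; taking the maximum or a weighted sum would fit the same structure by changing a single line.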
Applying Model Management to Classical Meta Data Problems
2003
Cited by 259 (21 self)
Model management is a new approach to meta data management that offers a higher level programming interface than current techniques. The main abstractions are models (e.g., schemas, interface definitions) and mappings between models. It treats these abstractions as bulk objects and offers such operators as Match, Merge, Diff, Compose, Apply, and ModelGen. This paper extends earlier treatments of these operators and applies them to three classical meta data management problems: schema integration, schema evolution, and round-trip engineering.
Rondo: A Programming Platform for Generic Model Management
2003
Cited by 155 (10 self)
Model management aims at reducing the amount of programming needed for the development of metadata-intensive applications. We present a first complete prototype of a generic model-management system, in which high-level operators are used to manipulate models and mappings between models. We define the key conceptual structures: models, morphisms, and selectors, and describe their use and implementation. We specify the semantics of the known model-management operators applied to these structures, suggest new ones, and develop new algorithms for implementing the individual operators. We examine the solutions for two model-management tasks that involve manipulations of relational schemas, XML schemas, and SQL views.
The Clio Project: Managing Heterogeneity.
In SIGMOD Record, 2001
Cited by 143 (3 self)
Clio is a system for managing and facilitating the complex tasks of heterogeneous data transformation and integration. In Clio, we have collected together a powerful set of data management techniques that have proven invaluable in tackling these difficult problems. In this paper, we present the underlying themes of our approach and present a brief case study.
Model management 2.0: manipulating richer mappings
In SIGMOD, 2007
Cited by 127 (3 self)
Model management is a generic approach to solving problems of data programmability where precisely engineered mappings are required. Applications include data warehousing, e-commerce, object-to-relational wrappers, enterprise information integration, database portals, and report generators. The goal is to develop a model management engine that can support tools for all of these applications. The engine supports operations to match schemas, compose mappings, diff schemas, merge schemas, translate schemas into different data models, and generate data transformations from mappings. Much has been learned about model management since it was proposed seven years ago. This leads us to a revised vision that differs from the original in two main respects: the operations must handle more expressive mappings, and the runtime that executes mappings should be added as an important model management component. We review what has been learned from recent experience, explain the revised model management vision based on that experience, and identify the research problems that the revised vision opens up.
Merging Models Based on Given Correspondences
2003
Cited by 118 (12 self)
A model is a formal description of a complex application artifact, such as a database schema, an application interface, a UML model, an ontology, or a message format. The problem of merging such models lies at the core of many meta data applications, such as view integration, mediated schema creation for data integration, and ontology merging. This paper examines the problem of merging two models given correspondences between them. It presents requirements for conducting a merge and a specific algorithm that subsumes previous work.
On Schema Matching with Opaque Column Names and Data Values
In SIGMOD, 2003
Cited by 114 (2 self)
Most previous solutions to the schema matching problem rely in some fashion upon identifying "similar" column names in the schemas to be matched, or by recognizing common domains in the data stored in the schemas. While each of these approaches is valuable in many cases, they are not infallible, and there exist instances of the schema matching problem for which they do not even apply. Such problem instances typically arise when the column names in the schemas and the data in the columns are "opaque" or very difficult to interpret. In this paper we propose a two-step technique that works even in the presence of opaque column names and data values. In the first step, we measure the pair-wise attribute correlations in the tables to be matched and construct a dependency graph using mutual information as a measure of the dependency between attributes. In the second stage, we find matching node pairs in the dependency graphs by running a graph matching algorithm. We validate our approach with an experimental study, the results of which suggest that such an approach can be a useful addition to a set of (semi) automatic schema matching techniques.
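The first step described in this abstract, measuring pair-wise dependency with mutual information, can be sketched in Python as follows. This is a minimal illustration and not the authors' code: it assumes discrete column values, estimates probabilities from raw counts, and uses hypothetical names (`mutual_information`, `dependency_graph`, and the toy table). The second step, graph matching, is not shown.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) of two equal-length columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # I(X;Y) = sum over (x, y) of p(x,y) * log2( p(x,y) / (p(x) * p(y)) ),
    # written with counts: p(x,y) = c/n, p(x) = px[x]/n, p(y) = py[y]/n.
    return sum((c / n) * log2((c * n) / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def dependency_graph(table):
    """Map each unordered column pair to its mutual information.

    table: dict from column name to a list of values (all the same length).
    """
    cols = list(table)
    return {(a, b): mutual_information(table[a], table[b])
            for i, a in enumerate(cols) for b in cols[i + 1:]}


# Toy table with opaque column names: c1 determines c2, c3 is unrelated.
table = {
    "c1": [0, 0, 1, 1],
    "c2": ["a", "a", "b", "b"],
    "c3": [0, 1, 0, 1],
}
graph = dependency_graph(table)
```

The edge weights depend only on how values co-occur, not on the column names or on what the values mean, which is why such a graph remains usable when both are opaque.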