The TSIMMIS Project: Integration of Heterogeneous Information Sources
Cited by 451 (16 self)
The goal of the Tsimmis Project is to develop tools that facilitate the rapid integration of heterogeneous information sources that may include both structured and unstructured data. This paper gives an overview of the project, describing components that extract properties from unstructured objects, that translate information into a common object model, that combine information from several sources, that allow browsing of information, and that manage constraints across heterogeneous sites. Tsimmis is a joint project between Stanford and the IBM Almaden Research Center.
Semantic and schematic similarities between database objects: A context-based approach
VLDB Journal, 1996
Cited by 141 (12 self)
In a multidatabase system, schematic conflicts between two objects are usually of interest only when the objects have some semantic similarity. We use the concept of semantic proximity, which is essentially an abstraction/mapping between the domains of the two objects associated with the context of comparison. An explicit though partial context representation is proposed and the specificity relationship between contexts is defined. The contexts are organized as a meet semi-lattice and associated operations like the greatest lower bound (glb) are defined. The context of comparison and the type of abstractions used to relate the two objects form the basis of a semantic taxonomy. At the semantic level, the intensional description of database objects provided by the context is expressed in a description logic language. Schema correspondences are used to store mappings from the semantic level to the data level and are associated with the respective contexts. Inferences about database content at the federation level are modeled as changes in the context and the associated schema correspondences. We try to reconcile the dual (schematic and semantic) perspectives by enumerating possible semantic similarities between objects having schema and data conflicts, and by modeling schema correspondences as the projection of semantic proximity with respect to context.
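The meet semi-lattice of contexts mentioned in this abstract can be illustrated with a minimal sketch. The representation here is an assumption made for illustration only, not the paper's formalism: a context is modeled as a mapping from a coordinate name to the set of values it allows, and the names `Context`, `is_at_least_as_specific`, and `glb` are hypothetical.

```python
from typing import Dict, FrozenSet

# Hypothetical representation: a context maps a coordinate name to the set
# of values it allows. A coordinate that is absent is unconstrained.
Context = Dict[str, FrozenSet[str]]


def is_at_least_as_specific(c1: Context, c2: Context) -> bool:
    """True if c1 constrains every coordinate of c2 at least as tightly."""
    return all(k in c1 and c1[k] <= c2[k] for k in c2)


def glb(c1: Context, c2: Context) -> Context:
    """Greatest lower bound: the most general context below both inputs."""
    result = dict(c1)
    for coord, values in c2.items():
        # Shared coordinates keep only the values both contexts allow.
        result[coord] = (result[coord] & values) if coord in result else values
    return result


# Illustrative data only.
employees: Context = {"employer": frozenset({"university", "industry"})}
researchers: Context = {
    "role": frozenset({"researcher"}),
    "employer": frozenset({"university"}),
}
both = glb(employees, researchers)
```

Under this assumed ordering, the glb takes the union of the constrained coordinates and intersects the allowed values wherever both contexts constrain the same coordinate, so the result is at least as specific as each input.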
So Far (Schematically) yet so Near (Semantically)
1992
Cited by 93 (1 self)
In a multidatabase system, schematic conflicts between two objects are usually of interest only when the objects have some semantic affinity. In this paper we try to reconcile the two perspectives. We first define the concept of semantic proximity and provide a semantic taxonomy. We then enumerate and classify the schematic and data conflicts. We discuss possible semantic similarities between two objects that have various types of schematic and data conflicts. Issues of uncertain information and inconsistent information are also addressed.
Managing semantic heterogeneity with production rules and persistent queues
In Proceedings of the Nineteenth International Conference on Very Large Data Bases, 1993
Cited by 76 (8 self)
We show that production rules and persistent queues together provide a convenient mechanism for maintaining consistency in semantically heterogeneous multi-database environments. We describe a specification language and methods for automatically deriving production rules that maintain (1) existence dependencies, in which the presence of data in one database implies the presence of related data in another, and (2) value dependencies, in which the value of data in one database is based on the value of related data in another. The production rules derived from dependency specifications use persistent queues to monitor and maintain the dependencies automatically, asynchronously, incrementally, and correctly.
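The idea of an existence dependency maintained asynchronously through a queue can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's specification language or derived rules: the two databases are plain dictionaries, the persistent queue is replaced by an in-memory `deque`, and the class `ExistenceDependency` and its methods are hypothetical names.

```python
from collections import deque


class ExistenceDependency:
    """Toy model: every key in `source` must eventually exist in `target`."""

    def __init__(self):
        self.source = {}      # stands in for the first database
        self.target = {}      # stands in for the second database
        self.queue = deque()  # in-memory stand-in for a persistent queue

    def insert_source(self, key, record):
        # Rule action: record the change rather than updating target inline.
        self.source[key] = record
        self.queue.append(key)

    def drain(self):
        """Process queued changes in FIFO order; return how many were handled."""
        processed = 0
        while self.queue:
            key = self.queue.popleft()
            if key in self.source:
                # Create the related record only if it is still missing.
                self.target.setdefault(key, {"derived_from": key})
            processed += 1
        return processed

    def holds(self):
        """Check the dependency: presence in source implies presence in target."""
        return all(key in self.target for key in self.source)


dep = ExistenceDependency()
dep.insert_source("emp-1", {"name": "Ada"})
before = dep.holds()  # dependency is temporarily violated
dep.drain()
after = dep.holds()   # restored once the queue has been processed
```

Separating the insert from the propagation step is what makes the maintenance asynchronous and incremental in this sketch: the dependency may be violated briefly and is restored when the queue is drained.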
Challenges in Integrating Biological Data Sources
Journal of Computational Biology, 1995
Cited by 62 (4 self)
In this report, we examine the technical challenges to integration, critique the available tools and resources, and compare the cost and advantages of various methodologies. We begin by analyzing the basic steps in strict and complete integration: 1) transformation of the various schemas to a common data model; 2) matching of semantically related schema objects; 3) schema integration; 4) transformation of data to the federated database on demand; and 5) matching of semantically equivalent data. Some progress has been made on generic problems such as (1) and (3) within the wider database community, but issues of semantics (steps (2) and (5)) have been handled with any degree of success only by domain experts within the biological community. We then look at the solution space of integration strategies as defined by two axes, the "tightness" of federation and the "degree" of instantiation, discuss where various solutions fall on this plane, and examine their cost and advantages/disadvantages. Finally, we examine technical challenges that are not ...
K2/Kleisli and GUS: Experiments in Integrated Access to Genomic Data Sources
2000
Cited by 52 (4 self)
The integration of heterogeneous data sources and software systems is a major issue in the biomedical community and several approaches have been explored: linking databases, "on-the-fly" integration through views, and integration through warehousing. In this paper we report on our experiences with two systems that were developed at the University of Pennsylvania: an integration system called K2, which has primarily been used to provide views over multiple external data sources and software systems; and a data warehouse called GUS which downloads, cleans, integrates and annotates data from multiple external data sources. Although the view and warehouse approaches each have their advantages, there is no clear "winner". Therefore, users must consider how the data is to be used, what the performance guarantees must be, and how much programmer time and expertise is available to choose the best strategy for a particular application.
Distributed Object Management
1992
Cited by 32 (6 self)
Future information processing environments will consist of a vast network of heterogeneous, autonomous, and distributed computing resources, including computers (from mainframe to personal), information-intensive applications, and data (files and databases). A key challenge in this environment is providing capabilities for combining this varied collection of resources into an integrated distributed system, allowing resources to be flexibly combined, and their activities coordinated, to address challenging new information processing requirements. In this paper, we describe the concept of distributed object management, and identify its role in the development of these open, interoperable systems. We identify the key aspects of system architectures supporting distributed object management, and describe specific elements of a distributed object management system being developed at GTE Laboratories.
Semantics of Database Transformations
In B. Thalheim, L. Libkin, Eds., Semantics in Databases, LNCS 1358, 1998
Cited by 14 (1 self)
Database transformations arise in many different settings including database integration, evolution of database systems, and implementing user views and data-entry tools. This paper surveys approaches that have been taken to problems in these settings, assesses their strengths and weaknesses, and develops requirements on a formal model for specifying and implementing database transformations. We also consider the problem of insuring the correctness of database transformations. In particular, we demonstrate that the usefulness of correctness conditions such as information preservation is hindered by the interactions of transformations and database constraints, and the limited expressive power of established database constraint languages. We conclude that more general notions of correctness are required, and that there is a need for a uniform formalism for expressing both database transformations and constraints, and reasoning about their interactions. Finally we introduce WOL, a declarative language for specifying and implementing database transformations and constraints. We briefly describe the WOL language and its semantics, and argue that it addresses many of the requirements on a formalism for dealing with general database transformations.
Design and implementation of a virtual information system for agile manufacturing
IIE Transactions, 1997
Cited by 14 (0 self)
In the new and emerging Agile Manufacturing paradigm, where multiple firms cooperate under flexible virtual enterprise structures, there exists much need for a mechanism to manage and control information flow among collaborating partners. In response to this pressing need, this paper addresses the design and implementation of an agile manufacturing information system integrating manufacturing databases dispersed at various partner sites. We propose a framework in which: (1) Information is modeled in a hierarchical fashion using Object Oriented Methodology (OOM). (2) Information transactions are specified by the workflow hierarchy consisting of partner workflows. (3) Information flow between partners is controlled by a set of distributed Workflow Managers (WM) interacting with partner knowledge bases, which reflect partner-specific information control rules on internal data exchange, as well as inter-partner mutual protocols for joint partner communications. (4) The prototype system is implemented on the World Wide Web using a client-server architecture. In a dynamic environment where virtual partnerships are synthesized in response to specific business initiatives, the overall approach and system provide a flexible mechanism to support partner information exchange and to keep the dispersed information consistent.

