Susan Davidson and Chris Overton and Peter Buneman. Challenges in Integrating Biological Data Sources. Journal of Computational Biology, 2(4):557--572, Winter 1995.
....(Y, Z) in peer two. Mapping tables represent expert knowledge and are typically created by domain specialists. Indeed, currently the creation of mapping tables is a time-consuming and manual process performed by a set of expert curators. While widely used, especially in the biological domain [7], we are aware of no data management tools currently designed to facilitate the creation, maintenance and management of these tables. 2 Motivating Example Consider an example drawn from the domain of biological databases. Currently, there is an overwhelming number of Genomic data sources ranging ....
....ranging from large public sources, such as GDB [1] to sources that are specific to individual research labs. Integration of these sources to provide uniform access for scientists, although extremely desirable, seems unattainable due to a myriad of political, financial and technical reasons [7]. Among the technical reasons is the inherent heterogeneity of the sources which range from relational databases to formatted files or spreadsheets. In addition, the schemas and formats of the sources evolve rapidly in response to new biological techniques and requirements. To achieve some degree ....
[Article contains additional citation context not shown here]
S. Davidson, G. C. Overton, and P. Buneman. Challenges in Integrating Biological Data Sources. Journal of Computational Biology 2(4):557-572, 1995.
....Contributions: Mapping tables represent ###### ###### #### and are typically created by domain specialists. Indeed, currently the creation of mapping tables is a time-consuming and manual process performed by a set of expert curators. While widely used, especially in the biological domain [15], we are aware of no data management tools currently designed to facilitate the creation, maintenance and management of these tables. In this work, we discuss alternative semantics for these tables and we present a language that allows the specification of mapping tables under different semantics. ....
.... [8] (a protein database) and MIM [6] (a database about genes and genetic disorders related with these genes). Integration of these sources to provide uniform access for scientists, although extremely desirable, seems unattainable due to a myriad of political, financial and technical reasons [15]. Among the technical reasons is the inherent heterogeneity of the sources which range from relational databases to formatted files or spreadsheets. In addition, the schemas and formats of the sources evolve rapidly in response to new biological techniques and requirements. To achieve some degree ....
S. Davidson, G. C. Overton, and P. Buneman. Challenges in Integrating Biological Data Sources. Journal of Computational Biology, 2(4):557-572, 1995.
....graphical user interface (GUI) The Source Model . The Query Transformation Module . The Query Execution Module The Biological Concept Model Some bioinformatics researchers recognise that semantic schema and data matching would be greatly aided by a comprehensive thesaurus of terms (Davidson 1995) or a reference ontology of biological concepts (Karp 1995). In [Footnote: Because the biological knowledge base is a conceptual model of biological terminology, the words concept and term are used interchangeably in this paper.] [Figure 1. TAMBIS three layer, mediator wrapper architecture.] order to ....
.... mediate between the various data sources by exploiting the biological concept hierarchy to assist in the identification and resolution of equivalences or near equivalences; similar approaches have been taken in non-biological projects, for example SIMS [Arens93]. As (Markowitz 1995) and (Davidson 1995) suggest, integration is costly and the quest for an agreed schema futile. However, our biological terminology does not attempt to force a global schema representing a consistent integrated view of all the component databases. Instead it seeks to describe what is in the component databases and, ....
[Article contains additional citation context not shown here]
Davidson S.B., Overton C., Buneman P., Challenges in Integrating Biological Data Sources, Journal of Computational Biology Vol 2, No 4, 1995.
....of the general query system. It is, however, possible to create such databanks on the fly. These data collections perform simple format conversions, but do not map terms appropriately between databases. The issue of semantic heterogeneity within biology databases is large and difficult to resolve [Davidson et al. 1995]. Data collections can only be passed on to subsequent tasks serially. The SRS system does not allow concurrent tasks to be performed, either automatically or conditionally. To do this, it is probable that a scripting language will be needed. Some commercial systems offer such a device, but these ....
....of a workflow that could be encoded in a workflow specification language and enacted by a workflow management system. There was a strong indication from users that the inability to interoperate between tools was a barrier to asking more complex questions. Such a view is supported by others [Davidson et al. 1995, Department of Energy, 1993]. The structural principles set forth in this paper seek to address the basic requirements of interoperating systems, but without describing how it should be implemented. These principles are based on the requirements that users have of such systems. The application to ....
Davidson, S., Overton, C., and Buneman, P. (1995). Challenges in Integrating Biological Data Sources. Journal of Computational Biology, 2(4):557--572.
....to CORBA, but without some of its more elegant aspects. The CORBA standards can be used to smooth out problems of distribution, platform and implementation heterogeneity. Resulting resources can interoperate at a syntactic level, but many bioinformatics resources remain semantically heterogeneous [4, 5]. In essence, this means that concepts can be used in different ways in different resources: Gene can mean one thing in one resource and have a slightly different meaning in another. CORBA IDL will not automatically remove this heterogeneity. Another protocol which deserves a mention is the ....
S.B. Davidson, C. Overton, and P. Buneman. Challenges in Integrating Biological Data Sources. Journal of Computational Biology, 2(4), 1995.
....has been a subject of a lot of work and the objective of various projects, e.g. (Garcia Molina, Papakonstantinou, Quass, Rajaraman, Sagiv, Ullman, Vassalos & Widom 1997). Data integration has been tried and even accomplished successfully in a variety of scientific domains, such as biology (Davidson, Overton & Buneman 1995) etc. as a lot of diverse databases have been already built for these domains. The exploitation of Internet to interconnect a number of existent information sources is the subject of a lot of recent work (Web 1998). Integration of already existent heterogeneous statistical information is the ....
Davidson, S., Overton, G. C. & Buneman, P. (1995), `Challenges in integrating biological data sources', Journal of Computational Biology 2(4), 557-572.
....program that performs further processing, e.g. a spreadsheet. A similar situation arises for many domains where there exist various services and programs for transforming and processing information, yet no integration of these services and programs exists, e.g. in the domain of genomics [2], [3], [4]. Genomic resources with integrated computation exist at many diverse sites. Today, these capabilities are used by an end user invoking computations, cutting and pasting intermediate results into local workspaces, combining and editing results from multiple computations, and iterating ....
S.B. Davidson, C. Overton and P. Buneman: "Challenges in Integrating Biological Data Sources"; Computational Biology 2, 1995, pp 557-572.
.... also become evident that interoperability, integration, and collaboration, i.e. the ability to interpret, share and manipulate data and programs from multiple autonomous sources transparently, is the main requirement towards Global Scientific Data Repositories and Services over these repositories [NOAA97, Karp95, Davinson95, Letosky94, Maier94]. To address the interoperability, information integration and management issues emerging in the next generation of scientific data repositories and services, a hybrid architecture is proposed. It combines elements from the digital library technology, mediator based systems and rapid prototyping ....
S.B. Davidson, C. Overton and P. Buneman: "Challenges in Integrating Biological Data Sources"; Journal of Computational Biology, Vol. 2, No 4, 1995, pp. 557-572.
....evaluation of generalized path expressions [CCM96]. The schema is rapidly evolving: In standard database systems, the schema is viewed as almost immutable, schema updates as rare, and it is well accepted that schema updates are very expensive. Now, in contrast, consider the case of genome data [DOB95]. The schema is expected to change quite rapidly, at the same speed as experimental techniques are improved or novel techniques introduced. As a consequence, expressive formats such as ASN.1 or ACeDB [TMD92] were preferred to a relational or object database system approach. Indeed, the fact that ....
S.B. Davidson, C. Overton, and P. Buneman. Challenges in integrating biological data sources. J. Computational Biology 2, 1995.
No context found.
Susan Davidson and Chris Overton and Peter Buneman. Challenges in Integrating Biological Data Sources. Journal of Computational Biology, 2(4):557--572, Winter 1995.
No context found.
S.B. Davidson et al. Challenges in integrating biological data sources. JCB, 2:557--572, 1995.
No context found.
Susan Davidson, G. Christian Overton, and Peter Buneman, "Challenges in Integrating Biological Data Sources," Journal of Computational Biology, vol. 2., no 4., 1995, pp. 557-572.
No context found.
Susan Davidson, G. Christian Overton, and Peter Buneman, "Challenges in Integrating Biological Data Sources," Journal of Computational Biology, vol. 2., no 4., 1995, pp. 557-572.
No context found.
S. Davidson, C. Overton, and P. Buneman. Challenges in integrating biological data sources. Journal of Computational Biology, 2:557--572, 1995.
No context found.
S. Davidson, C. Overton and P. Buneman. Challenges in Integrating Biological Data Sources. Journal of Computational Biology. Vol 2, No 4, 1995.
No context found.
Davidson, S.B., Overton, C., and Buneman, P.: Challenges in Integrating Biological Data Sources, J. Computational Biology, Vol. 2, No. 4, pp. 557--572 (1995).
No context found.
Susan Davidson, Chris Overton, and Peter Buneman. Challenges in Integrating Biological Data Sources. Journal of Computational Biology, 2(4), 1995. In press.