Results 1 - 10 of 37
TAMBIS: Transparent Access to Multiple Bioinformatics Information Sources
1998
Cited by 69 (13 self)
The TAMBIS project aims to provide transparent access to disparate biological databases and analysis tools, enabling users to utilize a wide range of resources with the minimum of effort. A prototype system has been developed that includes a knowledge base of biological terminology (the biological Concept Model), a model of the underlying data sources (the Source Model) and a 'knowledge-driven' user interface. Biological concepts are captured in the knowledge base using a description logic called GRAIL. The Concept Model provides the user with the concepts necessary to construct a wide range of multiple-source queries, and the user interface provides a flexible means of constructing and manipulating those queries. The Source Model provides a description of the underlying sources and mappings between terms used in the sources and terms in the biological Concept Model. The Concept Model and Source Model provide a level of indirection that shields the user from source details, providing a high level of source transparency. Source-independent, declarative queries formed from terms in the Concept Model are transformed into a set of source-dependent, executable procedures. Query formulation, translation and execution are demonstrated using a working example.
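The translation step described above, from source-independent concept terms to source-dependent procedures, can be pictured with a small Python sketch. It is a deliberately simplified illustration, not TAMBIS's actual mechanism (which relies on the GRAIL description logic and a far richer Source Model): the `SOURCE_MODEL` table, the source names `SeqDB`, `MotifDB` and `StructDB`, and their operation names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    source: str      # hypothetical name of an underlying source
    operation: str   # hypothetical source-specific operation
    concept: str
    attribute: str


# Hypothetical Source Model: maps each Concept Model term
# (concept, attribute) to the source operation that can answer it.
SOURCE_MODEL = {
    ("Protein", "sequence"): ("SeqDB", "fetch_sequence"),
    ("Protein", "motif"): ("MotifDB", "scan_motifs"),
    ("Protein", "structure"): ("StructDB", "fetch_structure"),
}


def translate(concept, attributes):
    """Turn a source-independent request into source-dependent steps."""
    plan = []
    for attribute in attributes:
        key = (concept, attribute)
        if key not in SOURCE_MODEL:
            raise KeyError(f"no source can answer {concept}.{attribute}")
        source, operation = SOURCE_MODEL[key]
        plan.append(Step(source, operation, concept, attribute))
    return plan


plan = translate("Protein", ["sequence", "motif"])
for step in plan:
    print(step.source, step.operation)
# SeqDB fetch_sequence
# MotifDB scan_motifs
```

Keeping the mapping in a separate table mirrors the level of indirection the abstract describes: the query names only `Protein` and its attributes, and only the lookup knows which source answers each one.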
K2/Kleisli and GUS: Experiments in Integrated Access to Genomic Data Sources
2000
Cited by 52 (4 self)
The integration of heterogeneous data sources and software systems is a major issue in the biomedical community and several approaches have been explored: linking databases, "on-the-fly" integration through views, and integration through warehousing. In this paper we report on our experiences with two systems that were developed at the University of Pennsylvania: an integration system called K2, which has primarily been used to provide views over multiple external data sources and software systems; and a data warehouse called GUS, which downloads, cleans, integrates and annotates data from multiple external data sources. Although the view and warehouse approaches each have their advantages, there is no clear "winner". Therefore, to choose the best strategy for a particular application, users must consider how the data is to be used, what the performance guarantees must be, and how much programmer time and expertise is available.
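The trade-off between view-based and warehouse-based integration can be sketched in a few lines of Python. This is a toy model under our own assumptions (the `Source`, `View` and `Warehouse` classes are hypothetical and say nothing about how K2 or GUS are actually built): a view reads the live sources on every query, while a warehouse answers from a local snapshot that is only as fresh as its last refresh.

```python
class Source:
    """Stand-in for an external data source (hypothetical)."""

    def __init__(self, records):
        self.records = dict(records)


class View:
    """On-the-fly integration: every query reads the live sources."""

    def __init__(self, sources):
        self.sources = sources

    def lookup(self, key):
        return [src.records.get(key) for src in self.sources]


class Warehouse:
    """Warehousing: data is copied locally and queries read the copy."""

    def __init__(self, sources):
        self.sources = sources
        self.refresh()

    def refresh(self):
        # Snapshot every source. A real warehouse such as GUS would also
        # clean, integrate and annotate the data at this point.
        self.snapshot = [dict(src.records) for src in self.sources]

    def lookup(self, key):
        return [snap.get(key) for snap in self.snapshot]


sources = [Source({"g1": "seq-v1"}), Source({"g1": "note-A"})]
view = View(sources)
warehouse = Warehouse(sources)

sources[0].records["g1"] = "seq-v2"  # one source is updated
print(view.lookup("g1"))       # ['seq-v2', 'note-A']
print(warehouse.lookup("g1"))  # ['seq-v1', 'note-A'] (stale until refresh)
warehouse.refresh()
print(warehouse.lookup("g1"))  # ['seq-v2', 'note-A']
```

The view always reflects current source data but must contact every source per query; the warehouse answers locally but can be stale, which is one concrete reason neither approach is a clear winner.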
Mondrian: Annotating and querying databases through colors and blocks
- In ICDE ’06: Proceedings of the 22nd International Conference on Data Engineering (ICDE’06), 2006
Cited by 37 (3 self)
Annotations play a central role in the curation of scientific databases. Despite their importance, data formats and schemas are not designed to manage the increasing variety of annotations. Moreover, DBMSs often lack support for storing and querying annotations. Furthermore, annotations and data are only loosely coupled. This paper introduces an annotation-oriented data model for the manipulation and querying of both data and annotations. In particular, the model allows for the specification of annotations on sets of values and for effectively querying the information on their association. We use the concept of block to represent an annotated set of values. Different colors applied to the blocks represent different annotations. We introduce a color query language for our model and prove it to be both complete (it can express all possible queries over the class of annotated databases) and minimal (all the algebra operators are primitive). We present MONDRIAN, a prototype implementation of our annotation mechanism, and we conduct experiments that investigate the set of parameters which influence the evaluation cost for color queries.
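The idea of blocks and colors can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not Mondrian's data model or its color query language: here a block is simply a set of (row, attribute) cells in a table, each block carries one color standing for one annotation, and the helper names `values_with_color` and `colors_of` are hypothetical.

```python
from dataclasses import dataclass

# A cell is identified by (row index, attribute name).
Cell = tuple[int, str]


@dataclass(frozen=True)
class Block:
    """A set of cells annotated together; the color stands for one annotation."""
    color: str
    cells: frozenset[Cell]


def values_with_color(rows, blocks, color):
    """For each block of the given color, map its cells to their values."""
    return [
        {(r, attr): rows[r][attr] for r, attr in block.cells}
        for block in blocks
        if block.color == color
    ]


def colors_of(blocks, cell):
    """All colors (annotations) whose blocks contain `cell`."""
    return {block.color for block in blocks if cell in block.cells}


# Made-up example data.
rows = [
    {"gene": "g1", "organism": "human", "function": "kinase"},
    {"gene": "g2", "organism": "mouse", "function": "unknown"},
]
blocks = [
    # One "red" annotation covering two values of row 0 as a single set.
    Block("red", frozenset({(0, "gene"), (0, "function")})),
    Block("blue", frozenset({(1, "function")})),
    Block("red", frozenset({(1, "organism")})),
]

print(colors_of(blocks, (1, "function")))           # {'blue'}
print(len(values_with_color(rows, blocks, "red")))  # 2
```

Because a block annotates a set of values rather than a single cell, the first red block records that `g1` and `kinase` are annotated together, which a per-cell annotation column could not express directly.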
A Classification of Tasks in Bioinformatics
2001
Cited by 36 (7 self)
Motivation: This paper reports on a survey of bioinformatics tasks currently undertaken by working biologists. The aim was to find the range of tasks that need to be supported and the components needed to do this in a general query system. This enabled a set of evaluation criteria to be used to assess both the biology and mechanical nature of general query systems. Results: A classification of the biological content of the tasks gathered offers a check-list for those tasks (and their specialisations) that should be offered in a general bioinformatics query system. This semantic analysis was contrasted with a syntactic analysis that revealed the small number of components required to describe all bioinformatics questions. Both the range of biological tasks and syntactic task components can be seen to provide a set of bioinformatics requirements for general query systems. These requirements were used to evaluate two bioinformatics query systems. Contact: robert.stevens@cs.man.ac.uk. Sup...
Data provenance: some basic issues
- In Foundations of Software Technology and Theoretical Computer Science, 2000
Cited by 35 (0 self)
The ease with which one can copy and transform data on the Web has made it increasingly difficult to determine the origins of a piece of data. We use the term data provenance to refer to the process of tracing and recording the origins of data and its movement between databases. Provenance is now an acute issue in scientific databases, where it is central to the validation of data. In this paper we discuss some of the technical issues that have emerged in an initial exploration of the topic.
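One simple reading of "recording the origins of data and its movement between databases" is a log of copy operations that can be followed backwards. The Python sketch below assumes that model only; the `ProvenanceLog` class and the (database, key) locations are hypothetical, each item is assumed to have been copied from at most one place, and transformations other than plain copying are not captured.

```python
class ProvenanceLog:
    """Records, for each copied item, the location it was copied from.

    Locations are hypothetical (database, key) pairs.
    """

    def __init__(self):
        self._copied_from = {}

    def record_copy(self, source, target):
        # Assumes one source per target; a later copy overwrites the record.
        self._copied_from[target] = source

    def trace(self, item):
        """Follow copy records back from `item` to its earliest known origin."""
        chain = [item]
        while chain[-1] in self._copied_from:
            previous = self._copied_from[chain[-1]]
            if previous in chain:
                raise ValueError("cycle in provenance records")
            chain.append(previous)
        return chain


log = ProvenanceLog()
log.record_copy(("SourceDB", "entry42"), ("LabDB", "row7"))
log.record_copy(("LabDB", "row7"), ("CuratedDB", "item3"))
print(log.trace(("CuratedDB", "item3")))
# [('CuratedDB', 'item3'), ('LabDB', 'row7'), ('SourceDB', 'entry42')]
```

Even this toy version shows why provenance must be recorded at copy time: once the value sits in `CuratedDB`, nothing in the value itself reveals that it came from `SourceDB`.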
Bio-ontologies: current trends and future directions
- Brief Bioinform, 2006
Cited by 35 (5 self)
In recent years, as a knowledge-based discipline, bioinformatics has moved to make its knowledge more computationally amenable. After its beginnings in the disciplines as a technology advocated by computer scientists to overcome problems of heterogeneity, ontology has been taken up by the biologists themselves as a means to consistently annotate features from genotype to phenotype. In medical informatics, artifacts called ontologies have been used for a longer period of time to produce controlled lexicons for coding schemes. In this article, we review the current position in ontologies and how they have become institutionalized within biomedicine. As the field has matured, the much older philosophical aspects of ontology have come into play. With this and the institutionalization of ontology has come greater formality. We review this trend and what benefits it might bring to ontologies and their use within biomedicine.
Integration of Biological Sources: Current Systems and Challenges Ahead
- SIGMOD Record, 2004
Cited by 27 (0 self)
This paper surveys the area of biological and genomic sources integration, which has recently become a major focus of the data integration research field. The challenges that an integration system for biological sources must face are due to several factors such as the variety and amount of data available, the representational heterogeneity of the data in the different sources, and the autonomy and differing capabilities of the sources.
iProClass: an integrated database of protein family, function and structure information
- Nucleic Acids Res, 2003
Cited by 19 (6 self)
The iProClass database provides comprehensive, value-added descriptions of proteins and serves as a framework for data integration in a distributed networking environment. The protein information in iProClass includes family relationships as well as structural and functional classifications and features. The current version consists of about 830 000 non-redundant PIR-PSD, SWISS-PROT, and TrEMBL proteins organized with more than 36 000 PIR superfamilies, 145 000 families, 4000 domains, 1300 motifs and 550 000 FASTA similarity clusters. It provides rich links to over 50 databases of protein sequences, families, functions and pathways, protein-protein interactions, post-translational modifications, protein expression, structures and structural classifications, genes and genomes, ontologies, literature and taxonomy. Protein and superfamily summary reports present extensive annotation information and include membership statistics and graphical display of domains and motifs. iProClass employs an open and modular architecture for interoperability and scalability. It is implemented in the Oracle object-relational database system and is updated biweekly. The database is freely accessible from the web site at http://pir.georgetown.edu/iproclass/ and searchable by sequence or text string. The data integration in iProClass supports exploration of protein relationships. Such knowledge is fundamental to the understanding of protein evolution, structure and function and crucial to functional genomics and proteomics research.
Managing Data Mappings in the Hyperion Project
- In Proceedings of the 19th International Conference on Data Engineering (ICDE), 2003
Cited by 9 (1 self)
We consider the problem of mapping data in peer-to-peer systems. Such systems rely on simple value searches to locate data of interest. However, different peers may use different values to identify or describe the same data. To accommodate this, peer-to-peer systems often rely on mapping tables that list pairs of corresponding values for search domains that are used in different peers. We illustrate how such tables are used in the Genomics community by expert curators. We then argue why mapping tables are appropriate for data mapping in a peer-to-peer environment and motivate the problem of managing these tables. The work presented here is part of the Hyperion Project [4].
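A mapping table, as described above, is a set of value pairs relating one peer's search domain to another's. The Python sketch below is a minimal illustration under our own assumptions (the class name `MappingTable`, the made-up identifiers, and the many-to-many semantics are hypothetical, not taken from the Hyperion work): it translates a search value from one peer's vocabulary into the corresponding values used by the other peer.

```python
from collections import defaultdict


class MappingTable:
    """Hypothetical mapping table: pairs of corresponding values between
    the search domain of peer A and that of peer B."""

    def __init__(self, pairs):
        # Index the pairs in both directions; a value may correspond to
        # several values on the other side (many-to-many).
        self._a_to_b = defaultdict(set)
        self._b_to_a = defaultdict(set)
        for a_value, b_value in pairs:
            self._a_to_b[a_value].add(b_value)
            self._b_to_a[b_value].add(a_value)

    def translate_a_to_b(self, value):
        """Values peer B uses for the data peer A identifies as `value`."""
        return set(self._a_to_b.get(value, set()))

    def translate_b_to_a(self, value):
        """Values peer A uses for the data peer B identifies as `value`."""
        return set(self._b_to_a.get(value, set()))


# Made-up identifiers: gene symbols at peer A, accession-style IDs at peer B.
table = MappingTable([
    ("GENE1", "ACC001"),
    ("GENE1", "ACC002"),
    ("GENE2", "ACC003"),
])

print(sorted(table.translate_a_to_b("GENE1")))  # ['ACC001', 'ACC002']
print(table.translate_a_to_b("GENE3"))          # set()
```

A value missing from the table translates to the empty set, so a search at the other peer finds nothing rather than guessing; that is one reason such tables need curating and managing, as the abstract argues.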
An Examination of DSLs for Concisely Representing Model Traversals and Transformations
- 36th Annual Hawaii International Conference on System Sciences (HICSS'03), Track 9, p. 325a, January 6-9, 2003
Cited by 7 (3 self)
A key advantage for the use of a Domain-Specific Language (DSL) is the leverage that can be captured from a concise representation of a programmer's intention. This paper reports on three different DSLs that were developed for two different projects. Two of the DSLs assisted in the specification of various modeling tool ontologies, and the integration of models across these tools. On another project, a different DSL has been applied as a language to assist in aspect-oriented modeling. Each of these three languages was converted to C++ using different code generators. These DSLs were concerned with issues of traversing a model and performing transformations. The paper also provides quantitative data on the relative sizes of the intention (as expressed in the DSL) and the generated C++ code. Observations are made regarding the nature of the benefits and the manner in which the conciseness of the DSL is best leveraged.

