Knowledge-Based Integration of Neuroscience Data Sources
2000
Cited by 27 (11 self)
The need for information integration is paramount in many biological disciplines, because of the large heterogeneity both in the types of data involved and in the diversity of approaches (physiological, anatomical, biochemical, etc.) taken by biologists to study the same or correlated phenomena. However, this very heterogeneity makes information integration difficult, since two approaches studying different aspects of the same phenomenon may not even share common attributes in their schema descriptions. This paper develops a wrapper-mediator architecture that extends the conventional data- and view-oriented information mediation approach by incorporating additional knowledge modules that bridge the gap between the heterogeneous data sources. The semantic integration of the disparate local data sources employs F-logic as a data and knowledge representation and reasoning formalism. We show that the rich object-oriented modeling features of F-logic, together with its declarative rule language and its uniform treatment of data and metadata (schema information), make it an ideal candidate for complex integration tasks. We substantiate this claim by elaborating on our integration architecture and illustrating the approach with real-world examples from the neuroscience domain. The complete integration framework is currently under development; a first prototype establishing the viability of the approach is operational.
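The key idea in this abstract is that a knowledge module can relate sources whose schemas share no attribute. The sketch below illustrates that idea in plain Python rather than in F-logic, which is what the paper actually uses; the source records, the `HAS_PART` relation and the `mediate` function are hypothetical placeholders, not the authors' schema or code.

```python
# Minimal sketch (plain Python, NOT F-logic): a mediator answers a query over
# two sources that share no attribute by joining them through a knowledge module.
# All names and records below are hypothetical placeholders.

# Source A (anatomical view): proteins located in cell compartments.
SOURCE_A = [
    {"compartment": "dendritic spine", "protein": "protein-X"},
    {"compartment": "axon terminal", "protein": "protein-Y"},
]

# Source B (physiological view): measurements keyed by cell class.
# Its schema has no attribute in common with SOURCE_A.
SOURCE_B = [
    {"cell_class": "cell-type-1", "measurement": "m1"},
    {"cell_class": "cell-type-2", "measurement": "m2"},
]

# Knowledge module: hypothetical domain facts "cell_class has part compartment".
HAS_PART = {
    ("cell-type-1", "dendritic spine"),
    ("cell-type-2", "axon terminal"),
}


def mediate():
    """Integrated view, roughly the rule
    related(C, P, M) :- b(C, M), has_part(C, Comp), a(Comp, P).
    """
    return [
        (b["cell_class"], a["protein"], b["measurement"])
        for b in SOURCE_B
        for a in SOURCE_A
        if (b["cell_class"], a["compartment"]) in HAS_PART
    ]


print(mediate())
```

Without `HAS_PART` the two sources cannot be joined at all. In the paper's setting, the bridging knowledge would instead be stated as F-logic facts and rules, which the abstract says also allow schema information to be queried uniformly with the data.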
Learning Information Extraction Rules: An Inductive Logic Programming approach
2002
Cited by 22 (0 self)
The objective of this work is to learn information extraction rules by applying Inductive Logic Programming (ILP) techniques to natural language data. The approach is ontology-based, which means that the extraction rules conclude with specific ontology relations that characterise the meaning of sentences in the text. An existing ILP system, FOIL, is used to learn attribute-value relations, which enables instances of these relations to be identified in the text. Specifically, we explore the linguistic preprocessing of the data, the use of background knowledge in the learning process, and the practical considerations of applying a supervised learning approach to rule induction, i.e. the human effort required to create the data set and the inherent biases of using small data sets.
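FOIL, the ILP system named above, grows each clause greedily by adding the literal with the highest information gain. The following is a minimal sketch of Quinlan's gain measure. The candidate literal names and counts in the usage lines are hypothetical. `t` defaults to the positive bindings still covered after specialisation, which is exact when the literal introduces no new variables.

```python
import math
from typing import Dict, Optional, Tuple


def foil_gain(p0: int, n0: int, p1: int, n1: int, t: Optional[int] = None) -> float:
    """FOIL information gain for specialising a clause with one literal.

    p0, n0: positive / negative bindings covered before adding the literal.
    p1, n1: positive / negative bindings covered after adding it.
    t: positive bindings of the old clause still represented afterwards;
       defaults to p1 (exact when the literal introduces no new variables).
    """
    if p0 == 0 or p1 == 0:
        return 0.0  # no positive coverage left: the literal is useless
    if t is None:
        t = p1
    info_before = -math.log2(p0 / (p0 + n0))
    info_after = -math.log2(p1 / (p1 + n1))
    return t * (info_before - info_after)


def best_literal(p0: int, n0: int, candidates: Dict[str, Tuple[int, int]]) -> str:
    """Pick the candidate literal (name -> (p1, n1)) with the highest gain."""
    return max(candidates, key=lambda lit: foil_gain(p0, n0, *candidates[lit]))


# Hypothetical candidates for a clause currently covering 10 positive
# and 10 negative bindings.
candidates = {"prev_word(X, by)": (8, 2), "is_noun(X)": (9, 6)}
print(best_literal(10, 10, candidates))  # prev_word(X, by)
```

The first candidate wins (gain of about 5.42 versus about 2.37). It covers slightly fewer positives but excludes far more negatives. That trade-off is what the gain measure is designed to reward.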
Meta-Data Based Mediator Generation
In Proceedings of the Third International Conference on Cooperative Information Systems, 1998
Cited by 11 (2 self)
Mediators are a critical component of any data warehouse; they transform data from source formats to the warehouse representation while resolving semantic and syntactic conflicts. The close relationship between mediators and databases requires a mediator to be updated whenever an associated schema is modified. Failure to quickly perform these updates significantly reduces the reliability of the warehouse because queries do not have access to the most current data. This may result in incorrect or misleading responses, and reduce user confidence in the warehouse. Unfortunately, this maintenance may be a significant undertaking if a warehouse integrates several dynamic data sources. This paper describes a meta-data framework, and associated software, designed to automate a significant portion of the mediator generation task and thereby reduce the effort involved in adapting to schema changes. By allowing the DBA to concentrate on identifying the modifications at a high level, instead of r...
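The abstract describes generating mediators from meta-data, so that a schema change is handled by editing the meta-data rather than by rewriting transformation code. The sketch below illustrates that idea. The meta-data layout (`target`/`source`/`convert`), the converter table and the field names are hypothetical; they are not the paper's framework.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

# Named conversions the meta-data may refer to (hypothetical set).
CONVERTERS: Dict[str, Callable[[Any], Any]] = {
    "identity": lambda v: v,
    "upper": lambda v: str(v).upper(),
    "int": int,
}


def generate_mediator(metadata: List[Dict[str, str]]) -> Callable[[Record], Record]:
    """Compile declarative mapping meta-data into a source-to-warehouse transform."""
    # Resolve the mapping once, at generation time.
    steps = [(m["target"], m["source"], CONVERTERS[m["convert"]]) for m in metadata]

    def mediator(record: Record) -> Record:
        return {target: convert(record[source]) for target, source, convert in steps}

    return mediator


# Version 1 of a hypothetical source schema.
metadata = [
    {"target": "gene_symbol", "source": "symbol", "convert": "upper"},
    {"target": "length_bp", "source": "len", "convert": "int"},
]
mediator_v1 = generate_mediator(metadata)
print(mediator_v1({"symbol": "abc1", "len": "1200"}))

# The source renames "len" to "length": only the meta-data changes,
# and the mediator is regenerated rather than hand-edited.
metadata[1]["source"] = "length"
mediator_v2 = generate_mediator(metadata)
print(mediator_v2({"symbol": "abc1", "length": "1200"}))
```

Both calls produce the same warehouse record. Only the one-line meta-data edit differs, which is the kind of DBA-level change the abstract aims for.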
Knowledge representation and indexing using the unified medical language system
PSB, 2000
Cited by 10 (0 self)
Ontologies and semantic frameworks can be used to improve the accuracy and expressiveness of natural language processing for the purpose of extracting meaning from technical documents. This is especially true when a rich ontology such as the Unified Medical Language System (UMLS) is available. This paper reports on some tools being developed to make this possible and on some experience with a user interface based on ontologies and semantic networks that allows for interactive knowledge exploration.
PharmGKB: the Pharmacogenetics Knowledge Base
Nucleic Acids Research, 2002
Cited by 5 (1 self)
The Pharmacogenetics Knowledge Base (PharmGKB; http://www.pharmgkb.org/) contains genomic, phenotype and clinical information collected from ongoing pharmacogenetic studies. Tools to browse, query, download, submit, edit and process the information are available to registered research network members. A subset of the tools is publicly available. PharmGKB currently contains over 150 genes under study, 14 Coriell populations and a large ontology of pharmacogenetics concepts. The pharmacogenetic concepts and the experimental data are interconnected by a set of relations to form a knowledge base of information for pharmacogenetic researchers. The information in PharmGKB, and its associated tools for processing that information, are tailored for leading-edge pharmacogenetics research. The PharmGKB project was initiated in April 2000 and the first version of the knowledge base went online in February 2001.
Computational Modeling of Structured Experimental Data
Meth. Enzymol., 1999
Cited by 3 (1 self)
In this paper, we describe the design principles behind the RiboWEB knowledge base, illustrate how we have represented certain key types of experiments, and describe the resulting knowledge base as it is currently publicly available. In order to demonstrate the utility of structured representations, we have written computer programs that use the knowledge base to create summary statistics about the density of information on the RNA and protein components of the 30S ribosome, as a function of the type of experiments performed. The RiboWEB system is also capable of interactively evaluating the consistency of particular data sets with proposed three-dimensional models of the ribosome.
IMPLEMENTATION
We have built our system around the concept of an ontology. For our purposes, the key features of an ontology are (1) a hierarchical classification of concepts (classes) from general to specific; (2) a frame, or list of attributes, for each class with a range of permitted values for each attribute; and (3) a set of relations between classes to link concepts in the ontology in more complicated ways than implied by the underlying hierarchy. The classes and relations are together referred to as the ontology. The leaves of the classification tree are termed instances, and represent concrete examples of the more abstract classes found in the internal part of the tree. Each attribute of an instance may have a corresponding value, whereas classes typically specify only that the attribute exists. Thus, a class might be "protein", but an instance would be ribosomal protein S5 in E. coli. The attribute "amino acid sequence" is assigned to the protein class, but is given a specific value only in the context of an instance. We define a knowledge base as the combination of an ontology and ...
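The ontology features listed above can be sketched as a small data structure: a class hierarchy, a per-class frame of attributes with permitted value types, relations between classes, and leaf instances that carry attribute values. This is an illustrative Python sketch, not RiboWEB's implementation. The intermediate classes ("molecule", "RNA", "ribosomal subunit"), the `component_of` relation name and the placeholder sequence string are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class OntClass:
    """A class in the hierarchy, with its frame (attribute -> permitted type)."""
    name: str
    parent: Optional["OntClass"] = None
    frame: Dict[str, type] = field(default_factory=dict)

    def all_attributes(self) -> Dict[str, type]:
        # Attributes are inherited from more general classes.
        inherited = self.parent.all_attributes() if self.parent else {}
        return {**inherited, **self.frame}

    def is_a(self, other: "OntClass") -> bool:
        node: Optional[OntClass] = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False


@dataclass
class Instance:
    """A leaf of the tree: a concrete example that gives attributes values."""
    name: str
    cls: OntClass
    values: Dict[str, object] = field(default_factory=dict)

    def set(self, attr: str, value: object) -> None:
        attrs = self.cls.all_attributes()
        if attr not in attrs:
            raise KeyError(f"{self.cls.name} has no attribute {attr!r}")
        if not isinstance(value, attrs[attr]):
            raise TypeError(f"{attr!r} expects {attrs[attr].__name__}")
        self.values[attr] = value


# Hierarchy from general to specific (intermediate classes are illustrative).
molecule = OntClass("molecule")
protein = OntClass("protein", molecule, {"amino acid sequence": str})
rna = OntClass("RNA", molecule, {"nucleotide sequence": str})
subunit = OntClass("ribosomal subunit")

# Relations between classes beyond the is-a hierarchy (hypothetical name).
relations: List[Tuple[str, OntClass, OntClass]] = [
    ("component_of", protein, subunit),
    ("component_of", rna, subunit),
]

# The class only declares "amino acid sequence"; the instance supplies a value.
s5 = Instance("ribosomal protein S5 (E. coli)", protein)
s5.set("amino acid sequence", "<sequence placeholder>")
```

The type check in `Instance.set` is one simple reading of "a range of permitted values". A fuller system might use enumerations or numeric intervals instead.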
Representing Genomic Knowledge in the UMLS Semantic Network
Proceedings of the AMIA Symposium, pp. 181–185, 1999
Cited by 2 (0 self)
This paper describes our efforts to integrate concepts important to genomics research with the UMLS semantic network. We found that the UMLS contains over 30 semantic types and most of the semantic relations that are essential for representing the underlying genomic knowledge. In addition, we observed that the organization of the network was appropriate for representing the hierarchical organization of the concepts. Because some of the concepts critical to the genomic domain were found to be missing, we propose to extend the network by adding six new semantic types and sixteen new semantic relations.
Addressing Biological Complexity to Enable Knowledge Sharing
AAAI Workshop on Knowledge Sharing Across Biological and Medical Knowledge Based Systems
Cited by 2 (1 self)
Domain ontologies are now commonly used to enable heterogeneous information resources, such as knowledge-based systems, to communicate with each other. In this article, we present a classification of ontological mismatches, which represent various ways in which knowledge sharing can be impeded by different decisions made by the developers of different resources. We address some of the ways in which the complexity of biological knowledge will inevitably lead to such mismatches arising across different resources. As ontological mismatches can limit the potential for knowledge sharing, we assess the potential for the resolution of these problems.
1. Introduction
It is widely recognised that the principled sharing of knowledge across heterogeneous information systems requires the use of domain ontologies as the basis for achieving a common understanding of the domain. An ontology is the specification of a conceptualisation (Gruber, 1993), usually in the form of a logical theory that forma...
Meta-Data and Biological Sequence Annotation
META-DATA'99: 3rd IEEE META-DATA Conference, National Institutes of Health Natcher, 1999
Cited by 1 (0 self)
One of the challenges of this decade is to find and access relevant information on the World Wide Web (WWW). It is therefore critical to create well-designed information servers. We focus our attention on servers dedicated to experimental science, and more specifically to biology. Applied science requires tools that influence and process captured data. The various genome sequencing projects are a prominent example: experimental data (raw DNA sequences) need to be converted into "interpreted" data (annotations), and this process involves various analytical strategies (e.g. gene finding by pattern recognition and sequence comparison, identification and characterization of regulatory and repetitive elements, etc.). We came to the conclusion that a major advance can be achieved by introducing meta-data into these systems. The function of meta-data descriptions is to make it possible to abstract and capture the essential information in the underlying data independently of representational details [15]. In this paper, we describe an approach that takes meta-data into account in order to obtain more reliable data (a high rate of correct functional annotations) within the public sequence databases. A first implementation of this approach has been completed, and an attempt at evaluating the advantages of such an approach is presented.
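The abstract argues that recording meta-data alongside annotations leads to more reliable sequence databases. The sketch below shows one way to picture this. Each annotation carries a description of how it was produced, and queries can select on that description independently of the annotation's own content. The field names, evidence labels, tool names and filter policy are all hypothetical assumptions, not the authors' design.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class AnnotationMetadata:
    """How an annotation was produced (hypothetical fields)."""
    method: str    # e.g. "sequence comparison" or "pattern recognition"
    tool: str      # program that produced the annotation (hypothetical)
    evidence: str  # e.g. "experimental" or "predicted"


@dataclass(frozen=True)
class Annotation:
    sequence_id: str
    feature: str  # the "interpreted" data, e.g. a predicted function
    meta: AnnotationMetadata


def select_by_evidence(annotations: List[Annotation],
                       accepted: FrozenSet[str] = frozenset({"experimental"})) -> List[Annotation]:
    """Keep only annotations whose recorded evidence is in the accepted set."""
    return [a for a in annotations if a.meta.evidence in accepted]


records = [
    Annotation("seq1", "kinase",
               AnnotationMetadata("sequence comparison", "tool-A", "predicted")),
    Annotation("seq2", "transporter",
               AnnotationMetadata("laboratory assay", "none", "experimental")),
]
print([a.sequence_id for a in select_by_evidence(records)])  # ['seq2']
```

The filter inspects only the meta-data, never the `feature` string. This mirrors the abstract's point that meta-data captures essential information independently of representational details.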
Bayesian Network Development
Cited by 1 (0 self)
Bayesian networks are a popular mechanism for dealing with uncertainty in complex situations. They are a fundamental probabilistic representation mechanism that subsumes a great variety of other stochastic modeling methods, such as hidden Markov models and stochastic dynamic systems. In principle, Bayesian networks make it possible to build large, complex stochastic models from standard components. Development methodologies for Bayesian networks have been introduced based on software engineering methodologies. However, carrying these methodologies over is complicated by the significant differences between the crisp, logical foundations of modern software and the fuzzy, empirical nature of stochastic modeling. Conversely, software engineering would benefit from better integration with Bayesian networks, so that uncertainty and stochastic inference can be introduced in a more systematic and formal manner than they are now. In this paper, Bayesian networks and stochastic inference are briefly introduced, and the development of Bayesian networks is compared with the development of object-oriented software. The challenges involved in Bayesian network development are then discussed.
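A Bayesian network is a directed acyclic graph in which each node carries a conditional probability table (CPT) given its parents. The joint distribution is the product of those tables. The sketch below builds a three-node network from such CPT components and answers a query by exhaustive enumeration. The variables and probabilities are illustrative textbook-style values, not taken from the paper. Enumeration is exponential in the number of hidden variables, so practical tools use more efficient inference algorithms.

```python
from itertools import product
from typing import Dict, List, Tuple

# variable -> (parents, CPT); the CPT maps a tuple of parent values to P(var=True).
Network = Dict[str, Tuple[List[str], Dict[Tuple[bool, ...], float]]]


def joint_probability(net: Network, assignment: Dict[str, bool]) -> float:
    """Product of each node's CPT entry under a full assignment."""
    p = 1.0
    for var, (parents, cpt) in net.items():
        p_true = cpt[tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p


def query(net: Network, var: str, evidence: Dict[str, bool]) -> float:
    """P(var=True | evidence), summing the joint over all hidden variables."""
    hidden = [v for v in net if v != var and v not in evidence]
    totals = {True: 0.0, False: 0.0}
    for value in (True, False):
        for combo in product((True, False), repeat=len(hidden)):
            assignment = {**evidence, var: value, **dict(zip(hidden, combo))}
            totals[value] += joint_probability(net, assignment)
    return totals[True] / (totals[True] + totals[False])


# Illustrative network: Rain -> WetGrass <- Sprinkler.
NET: Network = {
    "Rain": ([], {(): 0.2}),
    "Sprinkler": ([], {(): 0.1}),
    "WetGrass": (["Rain", "Sprinkler"], {
        (True, True): 0.99, (True, False): 0.9,
        (False, True): 0.8, (False, False): 0.0,
    }),
}

print(round(query(NET, "Rain", {"WetGrass": True}), 4))  # 0.7396
```

Each node's CPT is a self-contained component, and the network is assembled simply by listing them. This loosely parallels the "standard components" view the abstract compares with object-oriented software.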

