Ontology-based semantic annotations for biochip domain
- In EKAW
, 2004
Cited by 5 (2 self)
Abstract: We propose a semi-automatic method that uses information extraction (IE) techniques to facilitate the generation of ontology-based annotations for scientific articles. Furthermore, we evaluate and discuss our method by applying it to the annotation of a textual corpus provided by biologists working in the biochip domain. Finally, we argue that ontology-based semantic annotation can improve information retrieval.
ADVANCES IN AUTOMATIC TERMINOLOGY PROCESSING: METHODOLOGY AND APPLICATION IN FOCUS
, 2007
This work or any part thereof has not previously been presented in any form to the University or to any other institutional body whether for assessment, publication, or for other purposes. Save for any express acknowledgements, references and/or bibliographies cited in the work, I confirm that the intellectual content of the work is the result of my own efforts and of no other person. The right of Le An Ha to be identified as author of this work is asserted in accordance with ss.77 and 78 of the Copyright, Designs and Patents Act 1988. At this date, copyright is owned by the author. The information and knowledge era, in which we are living, creates challenges in many fields, and terminology is not an exception. The challenges include an exponential growth in the number of specialised documents that are available, in which terms are presented, and the number of newly introduced concepts and terms, which are already beyond our (manual) capacity. A promising solution to this ‘information overload’ would be to employ automatic or semi-automatic procedures
Towards Incorporating Scientific Literature into Biological Algorithms: A Tutorial
, 2001
marizing a group of documents. 3. Identifying specific facts from individual documents or sentences. 4. Incorporating literature to refine biological algorithms. The Tutors Jeffrey T. Chang is a Ph.D. candidate in the Russ Altman lab in the Biomedical Informatics program at Stanford University. He applies natural language processing techniques to biological problems ranging from pharmaco-genomics to sequence homology searches. Soumya Raychaudhuri is also an M.D./Ph.D. candidate in Russ Altman's lab. His biological interests range from structural biology to interpretation of microarray data. His methodological approaches encompass machine learning methods and natural language processing. Introduction Bioinformatics combines the algorithms and approaches employed in computer science and statistics to analyze, understand, and hypothesize about the large repositories of collected biological data and knowledge. While much of the available biological data, such as sequenced genomes and micro
Tuning Support Vector Machines for Biomedical Named Entity Recognition
- In Proceedings of the ACL-02 Workshop on Natural Language Processing in the Biomedical Domain
, 2002
We explore the use of Support Vector Machines (SVMs) for biomedical named entity recognition. To make SVM training tractable on the largest available corpus, the GENIA corpus, we propose to split the non-entity class into sub-classes using part-of-speech information. In addition, we explore new features such as word cache and the states of an HMM trained by unsupervised learning. Experiments on the GENIA corpus show that our class splitting technique not only enables training with the GENIA corpus but also improves accuracy. The proposed new features also contribute to improving accuracy. We compare our SVM-based recognition system with a system using a Maximum Entropy tagging method.
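The class-splitting idea in the abstract above can be illustrated with a minimal sketch. It assumes BIO-style labels in which `O` marks non-entity tokens; the function names, the label scheme, and the toy sentence are illustrative assumptions and are not taken from the paper. Only the relabeling step is shown, not the SVM training itself.

```python
def split_non_entity(labels, pos_tags, outside="O"):
    """Replace each non-entity label with a POS-specific sub-class, e.g. O -> O-DT."""
    if len(labels) != len(pos_tags):
        raise ValueError("labels and pos_tags must have the same length")
    return [f"{outside}-{pos}" if label == outside else label
            for label, pos in zip(labels, pos_tags)]


def merge_non_entity(labels, outside="O"):
    """Map POS-specific sub-classes back to the single non-entity label."""
    prefix = outside + "-"
    return [outside if label.startswith(prefix) else label for label in labels]


# Toy sentence, invented for illustration: "the kinase binds DNA"
pos_tags = ["DT", "NN", "VBZ", "NN"]
labels = ["O", "B-protein", "O", "B-DNA"]
split = split_non_entity(labels, pos_tags)
print(split)  # ['O-DT', 'B-protein', 'O-VBZ', 'B-DNA']
```

A classifier would be trained on the split labels, and `merge_non_entity` would be applied to its predictions so that evaluation still uses a single non-entity class.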
Learning Anchor Verbs for Biological Interaction Patterns From Published Text Articles
, 2002
Much of knowledge modeling in the molecular biology domain involves interactions between proteins, genes, various forms of RNA, small molecules, etc. Interactions between these substances are typically extracted and codified manually, increasing the cost and time for modeling and substantially limiting the coverage of the resulting knowledge base. In this paper, we describe an automatic system that learns interaction verbs from text; these verbs can then form the core of automatically retrieved patterns which model classes of biological interactions. We investigate text features relating verbs with genes and proteins, and apply statistical tests and a logistic regression statistical model to determine whether a given verb belongs to the class of interaction verbs. Our system, AVAD, achieves over 87% precision and 82% recall when tested on an 11 million word corpus of journal articles. In addition, we compare the automatically obtained results with a manually constructed database of interaction verbs and show that the automatic approach can significantly enrich the manual list by detecting rarer interaction verbs that were omitted from the database.
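To make the logistic regression step in the abstract above concrete, here is a minimal sketch of a binary classifier that decides whether a verb is an interaction verb. The two features (the fraction of a verb's occurrences with a gene or protein mention on both sides, and on at least one side), the toy values, and all function names are hypothetical; they are not AVAD's actual features or data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability that a verb with feature vector x is an interaction verb."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression with plain batch gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(weights, bias, xi) - yi
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical per-verb features, invented for illustration:
# [fraction with entities on both sides, fraction with an entity on one side]
X = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
y = [1, 1, 0, 0]  # 1 = interaction verb, 0 = other verb
weights, bias = train_logistic(X, y)
print(predict_proba(weights, bias, [0.85, 0.85]) > 0.5)  # True
```

In practice a library implementation would likely replace the hand-written training loop; it is spelled out here only to keep the sketch self-contained.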
Methods in Biomedical Text Mining
, 2008
Methods to improve text mining of molecular biology interactions are needed to capture a richer information space and qualify the quality of extraction. Simple interaction models fail to describe contextual and confidence information that would help with more fine-grained analyses. Herein a method is presented to streamline curation of text-mined data and a way to improve text mining of biomedical terms that can be adapted to other domains using different machine learning techniques. These advances can be integrated into more powerful text-mining systems to meet user demand and to further promote the adoption of text-mining tools. Additionally, three studies on the nature of biomedical publications are presented: their novelty hinges on the fact that each asks questions that had not been posed before. They cover the phenomena of retraction, ways to improve the impact of research, and the writing style used in biomedical literature. Retraction is a hot topic in recent times
Literature Mining using Bayesian Networks
In biomedical domains, free text electronic literature is an important resource for knowledge discovery and acquisition, particularly to provide a priori components for evaluating or learning domain models. Aiming at the automated extraction of this prior knowledge, we discuss the types of uncertainties in a domain with respect to causal mechanisms, formulate assumptions about their report in scientific papers and derive generative probabilistic models for the occurrences of biomedical concepts in papers. These results allow the discovery and extraction of latent causal dependency relations from the domain literature using minimal linguistic support. Contrary to the currently prevailing methods, which assume that relations are sufficiently formulated for linguistic methods, our approach assumes only the report of causally associated entities without their tentative status or relations, and can discover new relations and prune redundancies by providing a domain-wide model. Therefore, the proposed Bayesian-network-based text mining is an important complement to the linguistic approaches.
Information Extraction, Risk Patterns Detection in documents
Abstract: Hospital Acquired Infections (HAI) are a real burden for doctors and risk surveillance experts. The impact on patients' health and related healthcare cost is very significant and a major concern even for rich countries. Furthermore, the data required to evaluate the threat is generally not available to experts, which prevents a fast reaction. However, recent advances in Computational Intelligence Techniques such as
A Comparison of Parsing . . .
- NATURAL LANGUAGE ENGINEERING
, 2005
This paper reports on a number of experiments which are designed to investigate the extent to which current NLP resources are able to syntactically and semantically analyse biomedical text. We address two tasks: parsing a real corpus with a hand-built wide-coverage grammar, producing both syntactic analyses and logical forms; and automatically computing the interpretation of compound nouns where the head is a nominalisation (e.g., hospital arrival means an arrival at hospital, while patient arrival means an arrival of a patient). For the former task we demonstrate that flexible and yet constrained 'preprocessing' techniques are crucial to success: these enable us to use part-of-speech tags to overcome inadequate lexical coverage, and to 'package up' complex technical expressions prior to parsing so that they are blocked from creating misleading amounts of syntactic complexity. We argue that the XML-processing paradigm is ideally suited for automatically preparing the corpus for parsing. For the latter task, we compute interpretations of the compounds by exploiting surface cues and meaning paraphrases, which in turn are extracted from the parsed corpus. This provides an empirical setting in which we can compare the utility of a comparatively deep parser vs. a shallow one, exploring the trade-off between resolving attachment ambiguities on the one hand and generating errors in the parses on the other. We demonstrate that a model of the meaning of compound nominalisations is achievable with the aid of current broad-coverage parsers.
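The paraphrase-based interpretation of compound nominalisations described above can be sketched roughly as follows. The sketch assumes that (head, preposition, argument) triples have already been extracted from a parsed corpus; the triples below are invented toy data, and choosing the most frequent preposition is a simplifying assumption, not the method reported in the paper.

```python
from collections import Counter


def interpret_compound(modifier, head, paraphrases):
    """Pick the most frequent prepositional paraphrase for 'modifier head'.

    paraphrases is an iterable of (head, preposition, argument) triples,
    assumed here to come from a parsed corpus.
    """
    counts = Counter(prep for h, prep, arg in paraphrases
                     if h == head and arg == modifier)
    if not counts:
        return None  # no paraphrase evidence for this compound
    prep, _ = counts.most_common(1)[0]
    return f"{head} {prep} {modifier}"


# Toy triples, invented for illustration
paraphrases = [
    ("arrival", "at", "hospital"),
    ("arrival", "at", "hospital"),
    ("arrival", "of", "hospital"),
    ("arrival", "of", "patient"),
    ("arrival", "of", "patient"),
]
print(interpret_compound("hospital", "arrival", paraphrases))  # arrival at hospital
print(interpret_compound("patient", "arrival", paraphrases))   # arrival of patient
```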
A Comparison of Parsing Technologies for the Biomedical Domain
This paper reports on a number of experiments which are designed to investigate the extent to which current NLP resources are able to syntactically and semantically analyse biomedical text. We address two tasks: (a) parsing a real corpus with a hand-built wide-coverage grammar, producing both syntactic analyses and logical forms and (b) automatically computing the interpretation of compound nouns where the head is a nominalisation (e.g., hospital arrival means an arrival at hospital, while patient arrival means an arrival of a patient). For the former task we demonstrate that flexible and yet constrained preprocessing techniques are crucial to success: these enable us to use part-of-speech tags to overcome inadequate lexical coverage, and to package up complex technical expressions prior to parsing so that they are blocked from creating misleading amounts of syntactic complexity. We argue that the XML-processing paradigm is ideally suited for automatically preparing the corpus for parsing. For the latter task, we compute interpretations of the compounds by exploiting surface cues and meaning paraphrases, which in turn are extracted from the parsed corpus. This provides an empirical setting in which we can compare the utility of a comparatively deep parser vs. a shallow one, exploring the trade-off between resolving attachment ambiguities on the one hand and generating errors in the parses on the other. We demonstrate that a model of the meaning of compound nominalisations is achievable with the aid of current broad-coverage parsers.

