Results 1 - 10
of
147
Accomplishments and Challenges in Literature Data Mining for Biology
, 2002
"... We review recent results in literature data mining for biology and discuss the need and the steps for a challenge evaluation for this field. Literature data mining has progressed from simple recognition of terms to extraction of interaction relationships from complex sentences, and has broadened fro ..."
Abstract
-
Cited by 161 (12 self)
- Add to MetaCart
(Show Context)
We review recent results in literature data mining for biology and discuss the need and the steps for a challenge evaluation for this field. Literature data mining has progressed from simple recognition of terms to extraction of interaction relationships from complex sentences, and has broadened from recognition of protein interactions to arange of problems such as improving homology search, identifying cellular location, and so on. To encourage participation and accelerate progress in this expanding field, we propose creating challenge evaluations, and we describe two specific applications in this context.
Mining the Biomedical Literature in the Genomic Era: An Overview
- JOURNAL OF COMPUTATIONAL BIOLOGY
, 2003
"... The past decade has seen a tremendous growth in the amount of experimental and computational biomedical data, specifically in the areas of Genomics and Proteomics. This growth is accompanied by an accelerated increase in the number of biomedical publications discussing the findings. In the last f ..."
Abstract
-
Cited by 132 (5 self)
- Add to MetaCart
The past decade has seen a tremendous growth in the amount of experimental and computational biomedical data, specifically in the areas of Genomics and Proteomics. This growth is accompanied by an accelerated increase in the number of biomedical publications discussing the findings. In the last few years there is a lot of interest within the scientific community in literature-mining tools to help sort through this abundance of literature, and find the nuggets of information most relevant and useful for specific analysis tasks. This paper
Comparative Experiments on Learning Information Extractors for Proteins and their Interactions
, 2004
"... Automatically extracting information from biomedical text holds the promise of easily consolidating large amounts of biological knowledge in computer-accessible form. This strategy is particularly attractive for extracting data relevant to genes of the human genome from the 11 million abstracts in M ..."
Abstract
-
Cited by 106 (7 self)
- Add to MetaCart
Automatically extracting information from biomedical text holds the promise of easily consolidating large amounts of biological knowledge in computer-accessible form. This strategy is particularly attractive for extracting data relevant to genes of the human genome from the 11 million abstracts in Medline. However, extraction eorts have been frustrated by the lack of conventions for describing human genes and proteins. We have developed and evaluated a variety of learned information extraction systems for identifying human protein names in Medline abstracts and subsequently extracting information on interactions between the proteins. We demonstrate that machine learning approaches using support vector machines and maximum entropy are able to identify human proteins with higher accuracy than several previous approaches. We also demonstrate that various rule induction methods are able to identify protein interactions with higher precision than manually-developed rules.
Discovering patterns to extract protein-protein interactions from full texts
- BIOINFORMATICS
, 2004
"... Motivation: Although there are several databases storing protein–protein interactions, most such data still exist only in the scientific literature. They are scattered in scientific literature written in natural languages, defying data mining efforts. Much time and labor have to be spent on extrac ..."
Abstract
-
Cited by 97 (6 self)
- Add to MetaCart
Motivation: Although there are several databases storing protein–protein interactions, most such data still exist only in the scientific literature. They are scattered in scientific literature written in natural languages, defying data mining efforts. Much time and labor have to be spent on extracting protein pathways from literature. Our aim is to develop a robust and powerful methodology to mine protein–protein interactions from biomedical texts. Results: We present a novel and robust approach for extracting protein–protein interactions from literature. Our method uses a dynamic programming algorithm to compute distinguishing patterns by aligning relevant sentences and key verbs that describe protein interactions. A matching algorithm is designed to extract the interactions between proteins. Equipped only with a dictionary of protein names, our system achieves a recall rate of 80.0 % and precision rate of 80.5%.
Protein structures and information extraction from biological texts: The PASTA system
- Journal of Bioinformatics
, 2003
"... Motivation: The rapid increase in volume of protein structure literature means useful information may be hidden or lost in the published literature and the process of finding relevant material, sometimes the rate-determining factor in new research, may be arduous and slow. Results: We describe the P ..."
Abstract
-
Cited by 79 (6 self)
- Add to MetaCart
(Show Context)
Motivation: The rapid increase in volume of protein structure literature means useful information may be hidden or lost in the published literature and the process of finding relevant material, sometimes the rate-determining factor in new research, may be arduous and slow. Results: We describe the Protein Active Site Template Acquisition (PASTA) system, which addresses these problems by performing automatic extraction of informa-tion relating to the roles of specific amino acid residues in protein molecules from online scientific articles and abstracts. Both the terminology recognition and extraction capabilities of the system have been extensively eval-uated against manually annotated data and the results compare favourably with state-of-the-art results obtained in less challenging domains. PASTA is the first information extraction (IE) system developed for the protein structure domain and one of the most thoroughly evaluated IE system operating on biological scientific text to date. Availability: PASTA makes its extraction results available via a browser-based front end:
Classifying semantic relations in bioscience texts
, 2004
"... A crucial step toward the goal of automatic extraction of propositional information from natural language text is the identification of semantic relations between constituents in sentences. We examine the problem of distinguishing among seven relation types that can occur between the entities “treat ..."
Abstract
-
Cited by 71 (1 self)
- Add to MetaCart
A crucial step toward the goal of automatic extraction of propositional information from natural language text is the identification of semantic relations between constituents in sentences. We examine the problem of distinguishing among seven relation types that can occur between the entities “treatment” and “disease” in bioscience text, and the problem of identifying such entities. We compare five generative graphical models and a neural network, using lexical, syntactic, and semantic features, finding that the latter help achieve high classification accuracy.
GeneWays: a system for extracting, analyzing, visualizing, and integrating molecular pathway data
- Journal of Biomedical Informatics
, 2004
"... The immense growth in the volume of research literature and experimental data in the field of molecular biology calls for e#cient automatic methods to capture and store information. In recent years, several groups have worked on specific problems in this area, such as automated selection of articles ..."
Abstract
-
Cited by 65 (4 self)
- Add to MetaCart
(Show Context)
The immense growth in the volume of research literature and experimental data in the field of molecular biology calls for e#cient automatic methods to capture and store information. In recent years, several groups have worked on specific problems in this area, such as automated selection of articles pertinent to molecular biology, or automated extraction of information using natural-language processing, information visualization, and generation of specialized knowledge bases for molecular biology. GeneWays is an integrated system that combines several such subtasks. It analyzes interactions between molecular substances, drawing on multiple sources of information to infer a consensus view of molecular networks. GeneWays is designed as an open platform, allowing researchers to query, review, and critique stored information.
Getting to the (C)Ore of Knowledge: Mining Biomedical Literature
- Int J Med Inform
"... Literature mining is the process of extracting and combining facts from scientific publications. In recent years, many computer programs have been designed to extract various molecular biology findings from Medline abstracts or full-text articles. The present article describes the range of text mini ..."
Abstract
-
Cited by 56 (1 self)
- Add to MetaCart
(Show Context)
Literature mining is the process of extracting and combining facts from scientific publications. In recent years, many computer programs have been designed to extract various molecular biology findings from Medline abstracts or full-text articles. The present article describes the range of text mining techniques that have been applied to scientific documents. It divides ‘automated reading ’ into four general subtasks: text categorization, named entity tagging, fact extraction, and collection-wide analysis. Literature mining offers powerful methods to support knowledge discovery and the construction of topic maps and ontologies. An overview is given of recent developments in medical language processing. Special attention is given to the domain particularities of molecular biology, and the emerging synergy between literature mining and molecular databases accessible through Internet.
A Shallow Parser Based on Closed-Class Words to Capture Relations in Biomedical Text
, 2003
"... Natural language processing for biomedical text currently focuses mostly on entity and relation extraction. These entities and relations are usually pre-specified entities, e.g., proteins, and pre-specified relations, e.g., inhibit relations. A shallow parser that captures the relations between noun ..."
Abstract
-
Cited by 51 (6 self)
- Add to MetaCart
Natural language processing for biomedical text currently focuses mostly on entity and relation extraction. These entities and relations are usually pre-specified entities, e.g., proteins, and pre-specified relations, e.g., inhibit relations. A shallow parser that captures the relations between noun phrases automatically from free text has been developed and evaluated. It uses heuristics and a noun phraser to capture entities of interest in the text. Cascaded finite state automata structure the relations between individual entities. The automata are based on closed-class English words and model generic relations not limited to specific words. The parser also recognizes coordinating conjunctions and captures negation in text, a feature usually ignored by others. Three cancer researchers evaluated 330 relations extracted from 26 abstracts of interest to them. There were 296 relations correctly extracted from the abstracts resulting in 90% precision of the relations and an average of 11 correct relations per abstract.
Ontology Learning from Text: An Overview
- In Paul Buitelaar, P., Cimiano, P., Magnini B. (Eds.), Ontology Learning from Text: Methods, Applications and Evaluation
, 2005
"... ..."
(Show Context)