A fast rule-based approach for biomedical event extraction
Cited by 5 (0 self)
In this paper we present a biomedical event extraction system for the BioNLP 2013 event extraction task. Our system consists of two phases. In the learning phase, a dictionary and patterns are generated automatically from annotated events. In the extraction phase, the dictionary and obtained patterns are applied to extract events from input text. When evaluated on the GENIA event extraction task of the BioNLP 2013 shared task, the system obtained the best results on strict matching and the third best on approximate span and recursive matching, with F-scores of 48.92 and 50.68, respectively. Moreover, it has excellent performance in terms of speed.
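The two-phase design described in this abstract can be illustrated with a minimal Python sketch. This is not the authors' system: the toy annotations, the function names `learn_dictionary` and `extract_events`, and the "most frequent event type per trigger word" rule are all assumptions made for illustration, and the pattern-learning step is left out entirely.

```python
from collections import defaultdict

# Hypothetical toy annotations: (sentence tokens, trigger token index, event type).
TRAINING = [
    ("IL-2 expression was induced".split(), 1, "Gene_expression"),
    ("phosphorylation of STAT3 increased".split(), 0, "Phosphorylation"),
    ("expression of CD4 was low".split(), 0, "Gene_expression"),
]


def learn_dictionary(annotated):
    """Learning phase: map each lowercased trigger word to its most frequent event type."""
    counts = defaultdict(lambda: defaultdict(int))
    for tokens, trigger_idx, event_type in annotated:
        counts[tokens[trigger_idx].lower()][event_type] += 1
    return {word: max(types, key=types.get) for word, types in counts.items()}


def extract_events(tokens, dictionary):
    """Extraction phase: return (index, token, event type) for every dictionary hit."""
    return [
        (i, tok, dictionary[tok.lower()])
        for i, tok in enumerate(tokens)
        if tok.lower() in dictionary
    ]


dictionary = learn_dictionary(TRAINING)
events = extract_events("Expression of IL-4 requires phosphorylation".split(), dictionary)
```

A lookup of this kind needs no model inference at extraction time, which is one plausible reason a rule-based system can be fast; a real system would also need learned patterns to attach arguments to each trigger.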
A Hybrid Approach for Biomedical Event Extraction
Cited by 2 (0 self)
In this paper we propose a system that uses hybrid methods, combining rule-based and machine learning (ML)-based approaches, to solve the GENIA Event Extraction task of the BioNLP Shared Task 2013. We use the UIMA framework to support implementation. The model has three main stages: pre-processing, trigger detection, and biomedical event detection. We use a dictionary and a support vector machine classifier to detect event triggers. Event detection is based on syntactic patterns, which are combined with features extracted for classification.
A Relation Extraction Framework for Biomedical Text Using Hybrid Feature Set
Extracting information from unstructured text segments is a complex task. Although manual information extraction often produces the best results, extracting biomedical data manually is becoming harder to manage because of the exponential increase in data size. Thus, there is a need for automatic tools and techniques for information extraction in biomedical text mining. Relation extraction is a significant area within biomedical information extraction that has gained much importance in the last two decades. Much work has been done on biomedical relation extraction using rule-based and machine learning techniques. In the last decade, the focus has shifted to hybrid approaches, which show better results. This research presents a hybrid feature set for classifying relations between biomedical entities. The main contribution lies in the semantic feature set, where verb phrases are ranked using the Unified Medical Language System (UMLS) and a ranking algorithm. Support Vector Machine and Naïve Bayes, two effective machine learning techniques, are used to classify these relations. Our approach has been validated on the standard biomedical text corpus obtained from MEDLINE 2001. In conclusion, our framework outperforms all state-of-the-art approaches used for relation extraction on the same corpus.
An Overview of Biomolecular Event Extraction from Scientific Documents
This paper presents a review of state-of-the-art approaches to automatic extraction of biomolecular events from scientific texts. Events involving biomolecules such as genes, transcription factors, or enzymes, for example, have a central role in biological processes and functions and provide valuable information for describing physiological and pathogenesis mechanisms. Event extraction from biomedical literature has a broad range of applications, including support for information retrieval, knowledge summarization, and information extraction and discovery. However, automatic event extraction is a challenging task due to the ambiguity and diversity of natural language and higher-level linguistic phenomena, such as speculations and negations, which occur in biological texts and can lead to misunderstanding or incorrect interpretation. Many strategies have been proposed in the last decade, originating from different research areas such as natural language processing, machine learning, and statistics. This review summarizes the most representative approaches in biomolecular event extraction and presents an analysis of the current state of the art and of commonly used methods, features, and tools. Finally, current research trends and future perspectives are also discussed.
Extracting semantically enriched events from biomedical literature (Research Article, Open Access)
A generalizable NLP framework for fast development of pattern-based biomedical relation extraction systems (Research Article, Open Access)
Yifan Peng, Manabu Torii, Cathy H Wu and K Vijay-Shanker
Background: Text mining is increasingly used in the biomedical domain because of its ability to automatically gather information from a large number of scientific articles. One important task in biomedical text mining is relation extraction, which aims to identify designated relations among biological entities reported in the literature. A relation extraction system achieving high performance is expensive to develop because of the substantial time and effort required for its design and implementation. Here, we report a novel framework to facilitate the development of a pattern-based biomedical relation extraction system. It has several unique design features: (1) leveraging syntactic variations possible in a language and automatically generating extraction patterns in a systematic manner, (2) applying sentence simplification to improve the coverage of extraction patterns, and (3) identifying referential relations between a syntactic argument of a predicate and the actual target expected in the relation extraction task. Results: A relation extraction system derived using the proposed framework achieved overall F-scores of 72.66 % for
University of Turku in the BioNLP’11 Shared Task (Proceedings, Open Access)
Background: We present a system for extracting biomedical events (detailed descriptions of biomolecular interactions) from research articles, developed for the BioNLP’11 Shared Task. Our goal is to develop a system easily adaptable to different event schemes, following the theme of the BioNLP’11 Shared Task: generalization, the extension of event extraction to varied biomedical domains. Our system extends our BioNLP’09 Shared Task winning Turku Event Extraction System, which uses support vector machines to first detect event-defining words, followed by detection of their relationships. Results: Our current system successfully predicts events for every domain case introduced in the BioNLP’11 Shared Task, being the only system to participate in all eight tasks and all of their subtasks, with best performance in four tasks. Following the Shared Task, we improve the system on the Infectious Diseases task from 42.57 % to 53.87 % F-score, bringing performance into line with the similar GENIA Event Extraction and Epigenetics and Post-translational Modifications tasks. We evaluate the machine learning performance of the system by calculating learning curves for all tasks, detecting areas where additional annotated data could be used to improve performance. Finally, we evaluate the use of system output on external articles as additional training data in a form of self-training. Conclusions: We show that the updated Turku Event Extraction System can easily be adapted to all presently
The Genia Event and Protein Coreference tasks of the BioNLP Shared Task 2011 (Proceedings, Open Access)
Background: The Genia task, when it was introduced in 2009, was the first community-wide effort to address fine-grained, structural information extraction from biomedical literature. Arranged for the second time as one of the main tasks of BioNLP Shared Task 2011, it aimed to measure the progress of the community since 2009, and to evaluate generalization of the technology to full text papers. The Protein Coreference task was arranged as one of the supporting tasks, motivated by one of the lessons of the 2009 task: the abundance of coreference structures in natural language text hinders further improvement on the Genia task. Results: The Genia task received final submissions from 15 teams. The results show that the community has made significant progress, with a best F-score of 74 % in extracting bio-molecular events of simple structure, e.g., gene expressions, and 45 % to 48 % in extracting those of complex structure, e.g., regulations. The Protein Coreference task received 6 final submissions. The results show that coreference resolution performance in the biomedical domain is lagging behind that in the newswire domain, cf. 50 % vs. 66 % in MUC score. In particular, for protein coreference resolution the best system achieved an F-score of 34 %. Conclusions: Detailed analysis performed on the results improves our insight into the problem and suggests directions for further improvements.
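The F-scores quoted throughout these abstracts summarize precision and recall in a single number. As a minimal sketch, assuming the balanced F1 measure (the harmonic mean of precision and recall) and a hypothetical helper name `f_score`, the computation from raw match counts looks like this in Python; the shared tasks' own matching criteria (strict, approximate span, recursive) determine what counts as a true positive and are not modeled here.

```python
def f_score(true_positives, false_positives, false_negatives):
    """Balanced F1: harmonic mean of precision and recall, or 0.0 if undefined."""
    predicted = true_positives + false_positives  # events the system output
    gold = true_positives + false_negatives       # events in the reference annotation
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / gold if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only, not taken from any of the listed papers.
example = f_score(true_positives=3, false_positives=1, false_negatives=1)
```

With these illustrative counts, precision and recall are both 3/4, so the F-score is also 0.75.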
Overview of the ID, EPI and REL tasks of BioNLP Shared Task 2011 (Proceedings, Open Access)
We present the preparation, resources, results and analysis of three tasks of the BioNLP Shared Task 2011: the main tasks on Infectious Diseases (ID) and Epigenetics and Post-translational Modifications (EPI), and the supporting task on Entity Relations (REL). The two main tasks represent extensions of the event extraction model introduced in the BioNLP Shared Task 2009 (ST’09) to two new areas of biomedical scientific literature, each motivated by the needs of specific biocuration tasks. The ID task concerns the molecular mechanisms of infection, virulence and resistance, focusing in particular on the functions of a class of signaling systems that are ubiquitous in bacteria. The EPI task is dedicated to the extraction of statements regarding chemical modifications of DNA and proteins, with particular emphasis on changes relating to the epigenetic control of gene expression. By contrast to these two application-oriented main tasks, the REL task seeks to support extraction in general by separating challenges relating to part-of relations into a subproblem that can be addressed by independent systems. Seven groups participated in each of the two main tasks and four groups in the supporting task. The participating systems indicated advances in the capability of event extraction methods and demonstrated generalization in many aspects: from abstracts to full texts, from previously considered subdomains to new ones, and from the ST’09 extraction targets to other entities and events. The highest performance achieved in the supporting task REL, 58 % F-score, is broadly comparable with