Overview of the Cancer Genetics (CG) task of BioNLP Shared Task 2013
Cited by 8 (1 self)
We present the design, preparation, results and analysis of the Cancer Genetics (CG) event extraction task, a main task of the BioNLP Shared Task (ST) 2013. The CG task is an information extraction task targeting the recognition of events in text, represented as structured n-ary associations of given physical entities. In addition to addressing the cancer domain, the CG task is differentiated from previous event extraction tasks in the BioNLP ST series in addressing a wide range of pathological processes and multiple levels of biological organization, ranging from the molecular through the cellular and organ levels up to whole organisms. Final test set submissions were accepted from six teams. The highest-performing system achieved an F-score of 55.4%. This level of performance is broadly comparable with the state of the art for established molecular-level extraction tasks, demonstrating that event extraction resources and methods generalize well to higher levels of biological organization and are applicable to the analysis of scientific texts on cancer. The CG task continues as an open challenge to all interested parties, with tools and resources available from
Identification of Genia Events using Multiple Classifiers
Cited by 3 (1 self)
We describe our system to extract Genia events that was developed for the BioNLP 2013 Shared Task. Our system uses a supervised information extraction platform based on Support Vector Machines (SVM) and separates the process of event classification into multiple stages. For each event type, the SVM parameters are adjusted and feature selection is carried out. We find that this optimisation improves the performance of our approach. Overall, our system achieved the highest precision score of all systems and was ranked 6th of 10 participating systems on F-measure (strict matching).
Relieving the Computational Bottleneck: Joint Inference for Event Extraction with High-Dimensional Features
Cited by 2 (1 self)
Several state-of-the-art event extraction systems employ models based on Support Vector Machines (SVMs) in a pipeline architecture, which fails to exploit the joint dependencies that typically exist among events and arguments. While there have been attempts to overcome this limitation using Markov Logic Networks (MLNs), it remains challenging to perform joint inference in MLNs when the model encodes many high-dimensional sophisticated features such as those essential for event extraction. In this paper, we propose a new model for event extraction that combines the power of MLNs and SVMs, dwarfing their limitations. The key idea is to reliably learn and process high-dimensional features using SVMs; encode the output of SVMs as low-dimensional, soft formulas in MLNs; and use the superior joint inferencing power of MLNs to enforce joint consistency constraints over the soft formulas. We evaluate our approach for the task of extracting biomedical events on the BioNLP 2013, 2011 and 2009 Genia shared task datasets. Our approach yields the best F1 score to date on the BioNLP’13 (53.61) and BioNLP’11 (58.07) datasets and the second-best F1 score to date on the BioNLP’09 dataset (58.16).
morphological processing of biomedical text
Background: The wide variety of morphological variants of domain-specific technical terms contributes to the complexity of performing natural language processing of the scientific literature related to molecular biology. For morphological analysis of these texts, lemmatization has been actively applied in recent biomedical research. Results: In this work, we developed a domain-specific lemmatization tool, BioLemmatizer, for the morphological analysis of biomedical literature. The tool focuses on the inflectional morphology of English and is based on the general English lemmatization tool MorphAdorner. The BioLemmatizer is further tailored to the biological domain through incorporation of several published lexical resources. It retrieves lemmas based on the use of a word lexicon, and defines a set of rules that transform a word to a lemma if it is not encountered in the lexicon. An innovative aspect of the BioLemmatizer is the use of a hierarchical strategy for searching the lexicon, which enables the discovery of the correct lemma even if the input Part-of-Speech information is inaccurate. The BioLemmatizer achieves an accuracy of 97.5% in lemmatizing an evaluation set prepared from the CRAFT corpus, a collection of full-text biomedical articles, and an accuracy of 97.6% on the LLL05 corpus. The contribution of the BioLemmatizer to accuracy improvement of a practical information extraction task is further demonstrated when it is used as a component in a biomedical text mining system. Conclusions: The BioLemmatizer outperforms other tools when compared with eight existing lemmatizers. The BioLemmatizer is released as open source software and can be downloaded from
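The lookup strategy described in this abstract (lexicon first, hierarchical back-off when the part-of-speech tag may be wrong, rules last) can be illustrated with a minimal sketch. This is not the BioLemmatizer implementation: the toy lexicon, the coarse tag classes, the three back-off levels and the suffix rules are all assumptions made up for illustration.

```python
# Toy lexicon keyed by (word, POS tag). Entries are illustrative assumptions.
LEXICON = {
    ("phosphorylates", "VBZ"): "phosphorylate",
    ("kinases", "NNS"): "kinase",
    ("bound", "VBN"): "bind",
}

# Hypothetical mapping from fine-grained tags to coarse classes.
COARSE = {"VBZ": "V", "VBN": "V", "VBD": "V", "NNS": "N", "NN": "N"}

# Hypothetical suffix rules used only when the lexicon has no entry.
SUFFIX_RULES = [("ies", "y"), ("sses", "ss"), ("s", "")]


def lemmatize(word: str, pos: str) -> str:
    """Return a lemma via hierarchical lexicon search, then rule fallback."""
    w = word.lower()

    # Level 1: exact (word, tag) match.
    if (w, pos) in LEXICON:
        return LEXICON[(w, pos)]

    # Level 2: same word under any tag in the same coarse class.
    coarse = COARSE.get(pos)
    if coarse is not None:
        for (lex_word, lex_pos), lemma in LEXICON.items():
            if lex_word == w and COARSE.get(lex_pos) == coarse:
                return lemma

    # Level 3: same word under any tag at all (tolerates a wrong input tag).
    for (lex_word, _), lemma in LEXICON.items():
        if lex_word == w:
            return lemma

    # Fallback: first matching suffix rule, skipping very short words.
    for suffix, replacement in SUFFIX_RULES:
        if w.endswith(suffix) and len(w) > len(suffix) + 2:
            return w[: -len(suffix)] + replacement

    return w
```

In this sketch, `lemmatize("bound", "NN")` still finds the lexicon entry despite the incorrect noun tag because the search relaxes the tag constraint level by level, while an out-of-lexicon word such as `"pathways"` falls through to the suffix rules.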
BioNLP Shared Task 2013: Supporting Resources
This paper describes the technical contribution of the supporting resources provided for the BioNLP Shared Task 2013. Following the tradition of the previous two BioNLP Shared Task events, the task organisers and several external groups sought to make system development easier for the task participants by providing automatically generated analyses using a variety of automated tools. Providing analyses created by different tools that address the same task also enables extrinsic evaluation of the tools through the evaluation of their contributions to the event extraction task. Such evaluation can improve understanding of the applicability and benefits of specific tools and representations. The supporting resources described in this paper will continue to be publicly available from the shared task homepage
Review Article Biomedical Relation Extraction: From Binary to Complex
Biomedical relation extraction aims to uncover high-quality relations from life science literature with high accuracy and efficiency. Early biomedical relation extraction tasks focused on capturing binary relations, such as protein-protein interactions, which are crucial for virtually every process in a living cell. Information about these interactions provides the foundations for new therapeutic approaches. In recent years, interest has shifted to the extraction of complex relations such as biomolecular events. Complex relations go beyond binary relations and involve more than two arguments, and they may also take another relation as an argument. In this paper, we conduct a thorough survey of the research in biomedical relation extraction. We first present a general framework for biomedical relation extraction and then discuss the approaches proposed for binary and complex relation extraction, with a focus on the latter since it is a much more difficult task than binary relation extraction. Finally, we discuss the challenges we face in complex relation extraction and outline possible solutions and future directions.
PROCEEDINGS Open Access University of Turku in the BioNLP’11 Shared Task
Background: We present a system for extracting biomedical events (detailed descriptions of biomolecular interactions) from research articles, developed for the BioNLP’11 Shared Task. Our goal is to develop a system easily adaptable to different event schemes, following the theme of the BioNLP’11 Shared Task: generalization, the extension of event extraction to varied biomedical domains. Our system extends our BioNLP’09 Shared Task winning Turku Event Extraction System, which uses support vector machines to first detect event-defining words, followed by detection of their relationships. Results: Our current system successfully predicts events for every domain case introduced in the BioNLP’11 Shared Task, being the only system to participate in all eight tasks and all of their subtasks, with best performance in four tasks. Following the Shared Task, we improve the system on the Infectious Diseases task from 42.57 % to 53.87 % F-score, bringing performance into line with the similar GENIA Event Extraction and Epigenetics and Post-translational Modifications tasks. We evaluate the machine learning performance of the system by calculating learning curves for all tasks, detecting areas where additional annotated data could be used to improve performance. Finally, we evaluate the use of system output on external articles as additional training data in a form of self-training. Conclusions: We show that the updated Turku Event Extraction System can easily be adapted to all presently
PROCEEDINGS Open Access The Genia Event and Protein Coreference tasks of the BioNLP Shared Task 2011
Background: The Genia task, when it was introduced in 2009, was the first community-wide effort to address fine-grained, structural information extraction from biomedical literature. Arranged for the second time as one of the main tasks of BioNLP Shared Task 2011, it aimed to measure the progress of the community since 2009, and to evaluate generalization of the technology to full text papers. The Protein Coreference task was arranged as one of the supporting tasks, motivated by one of the lessons of the 2009 task: that the abundance of coreference structures in natural language text hinders further improvement on the Genia task. Results: The Genia task received final submissions from 15 teams. The results show that the community has made significant progress, with the best F-score reaching 74 % in extracting bio-molecular events of simple structure, e.g., gene expressions, and 45 % ~ 48 % in extracting those of complex structure, e.g., regulations. The Protein Coreference task received 6 final submissions. The results show that coreference resolution performance in the biomedical domain is lagging behind that in the newswire domain, cf. 50 % vs. 66 % in MUC score. In particular, in terms of protein coreference resolution the best system achieved 34 % in F-score. Conclusions: Detailed analysis performed on the results improves our insight into the problem and suggests the directions for further improvements.
PROCEEDINGS Open Access Overview of the ID, EPI and REL tasks of BioNLP Shared Task 2011
We present the preparation, resources, results and analysis of three tasks of the BioNLP Shared Task 2011: the main tasks on Infectious Diseases (ID) and Epigenetics and Post-translational Modifications (EPI), and the supporting task on Entity Relations (REL). The two main tasks represent extensions of the event extraction model introduced in the BioNLP Shared Task 2009 (ST’09) to two new areas of biomedical scientific literature, each motivated by the needs of specific biocuration tasks. The ID task concerns the molecular mechanisms of infection, virulence and resistance, focusing in particular on the functions of a class of signaling systems that are ubiquitous in bacteria. The EPI task is dedicated to the extraction of statements regarding chemical modifications of DNA and proteins, with particular emphasis on changes relating to the epigenetic control of gene expression. By contrast to these two application-oriented main tasks, the REL task seeks to support extraction in general by separating challenges relating to part-of relations into a subproblem that can be addressed by independent systems. Seven groups participated in each of the two main tasks and four groups in the supporting task. The participating systems indicated advances in the capability of event extraction methods and demonstrated generalization in many aspects: from abstracts to full texts, from previously considered subdomains to new ones, and from the ST’09 extraction targets to other entities and events. The highest performance achieved in the supporting task REL, 58 % F-score, is broadly comparable with