A Framework for Schema-Driven Relationship Discovery from Unstructured Text
Proc. 5th International Semantic Web Conference, 2006
Cited by 27 (9 self)
Abstract. We address the issue of extracting implicit and explicit relationships between entities in biomedical text. We argue that entities seldom occur in text in their simple form and that relationships in text relate the modified, complex forms of entities with each other. We present a rule-based method for (1) extraction of such complex entities, (2) extraction of relationships between them, and (3) conversion of such relationships into RDF. Furthermore, we present results that clearly demonstrate the utility of the generated RDF in discovering knowledge from text corpora by means of locating paths composed of the extracted relationships.
Keywords: Relationship Extraction, Knowledge-Driven Text Mining
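The idea of locating paths composed of extracted relationships can be illustrated with a minimal sketch. It assumes the relationships are already available as (subject, predicate, object) triples; the triples, the entity names, and the function `find_path` are hypothetical placeholders, and breadth-first search is one plausible way to locate a path, not necessarily the method the paper uses.

```python
from collections import deque

# Hypothetical triples (subject, predicate, object); illustrative only,
# not taken from the paper or from any real corpus.
TRIPLES = [
    ("entity_A", "affects", "entity_B"),
    ("entity_B", "causes", "entity_C"),
    ("entity_C", "inhibits", "entity_D"),
]


def find_path(triples, start, goal):
    """Breadth-first search over directed triples.

    Returns the list of triples linking start to goal, or None if the
    goal cannot be reached by following subject -> object edges.
    """
    # Index the triples by subject so each node's outgoing edges are quick to find.
    outgoing = {}
    for s, p, o in triples:
        outgoing.setdefault(s, []).append((s, p, o))

    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for triple in outgoing.get(node, []):
            nxt = triple[2]
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [triple]))
    return None
```

Calling `find_path(TRIPLES, "entity_A", "entity_C")` returns the two triples that chain the entities together, which is the kind of multi-step connection that no single extracted sentence states on its own.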
The AMTEx Approach in the Medical Document Indexing and Retrieval Application
Cited by 11 (5 self)
AMTEx is a medical document indexing method, specifically designed for the automatic indexing of documents in large medical collections, such as MEDLINE, the premier bibliographic database of the U.S. National Library of Medicine (NLM). AMTEx combines MeSH, the terminological thesaurus resource of NLM, with a well-established method for extraction of terminology, the C/NC-value method. The performance of two AMTEx configurations is measured against the current state of the art, the MetaMap Transfer (MMTx) method, in four experiments using two types of corpora: a subset of the MEDLINE (PMC) full-document corpus and a subset of MEDLINE (OHSUMED) abstracts, for each of the indexing and retrieval tasks respectively. The experimental results demonstrate that AMTEx performs better in indexing in 20-50% of the processing time compared to MMTx, while for the retrieval task, AMTEx performs better in the full-text (PMC) corpus.
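For readers unfamiliar with the C/NC-value method, the C-value half can be sketched as follows. This is a simplified illustration of the commonly cited C-value formula, not AMTEx's implementation: it omits the linguistic filter and the NC-value context weighting, and the function names and the example terms in the usage note are assumptions made for illustration.

```python
import math


def is_nested(short, long):
    """True if the word sequence `short` occurs contiguously inside the longer `long`."""
    n, m = len(short), len(long)
    return n < m and any(long[i:i + n] == short for i in range(m - n + 1))


def c_values(freq):
    """Score candidate terms with the C-value measure.

    `freq` maps each candidate term (a tuple of words) to its corpus frequency.
    A term that appears nested inside longer candidates has the mean frequency
    of those longer candidates subtracted from its own frequency; the result
    is then weighted by log2 of the term length in words.
    """
    scores = {}
    for term, f in freq.items():
        containers = [f_b for other, f_b in freq.items() if is_nested(term, other)]
        if containers:
            f = f - sum(containers) / len(containers)
        scores[term] = math.log2(len(term)) * f
    return scores
```

With hypothetical counts such as `{("basal", "cell", "carcinoma"): 5, ("cell", "carcinoma"): 8}`, the shorter term is penalized for the five occurrences that belong to the longer one. Note that under this formula single-word candidates always score zero, because log2(1) = 0.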
Medical Document Indexing and Retrieval: AMTEx vs. NLM MMTx
Cited by 4 (2 self)
AMTEx is a medical document indexing method, specifically designed for the automatic indexing of documents in large medical collections, such as MEDLINE, the premier bibliographic database of the U.S. National Library of Medicine (NLM). AMTEx combines MeSH, the terminological thesaurus resource of NLM, with a well-established method for term extraction, the C/NC-value method. The performance of two AMTEx configurations is measured against the current state of the art, the MMTx method, in indexing and retrieval tasks in three experiments. In the first, a subset of the MEDLINE (PMC) full-document corpus was used for the indexing task. In the second and third, a subset of MEDLINE (OHSUMED) abstracts was used for indexing and retrieval respectively. The experimental results demonstrate that AMTEx achieves better precision in all tasks, in 50-20% of the processing time compared to MMTx.
Knowledge Management, Data Mining, and Text Mining in Medical Informatics
Cited by 3 (0 self)
In this chapter we provide a broad overview of selected knowledge management, data mining, and text mining techniques and their use in various emerging biomedical applications. It aims to set the context for subsequent chapters. We first introduce five major paradigms for machine learning and data analysis including: probabilistic and statistical models, symbolic learning and rule induction, neural networks, evolution-based algorithms, and analytic learning and fuzzy logic. We also discuss their relevance and potential for biomedical research. Example applications of relevant knowledge management, data mining, and text mining research are then reviewed in order including: ontologies; knowledge management for health care, biomedical literature, heterogeneous databases, information visualization, and multimedia databases; and data and text mining for health care, literature, and biological data. We conclude the paper with discussions of privacy and confidentiality issues of relevance to biomedical data mining.
Using MEDLINE as a Knowledge Source for Disambiguating Abbreviations in Full-Text Biomedical Journal Articles
Biomedical abbreviations and acronyms are widely used in biomedical literature. Since many abbreviations represent important content in biomedical literature, information retrieval and extraction benefit from identifying the meanings of biomedical abbreviations. Since many abbreviations are ambiguous, it is important to map abbreviations to their full forms, which ultimately represent the meanings of the abbreviations. In this study, we present a novel unsupervised method that applies MEDLINE records as a knowledge source for disambiguating abbreviations in full-text biomedical journal articles. We first automatically generated from MEDLINE records a knowledge source, or dictionary, of abbreviation-full form pairs. We then trained on MEDLINE records and predicted the full forms of abbreviations in full-text journal articles by applying supervised machine-learning algorithms in an unsupervised fashion. We report up to 92% prediction precision and up to 91% coverage.
Keywords: Disambiguation, terminology, information retrieval, information extraction, machine learning
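The abstract does not say how the dictionary of abbreviation-full form pairs was generated. One common heuristic, shown here purely as an assumption, is to look for a parenthesized uppercase abbreviation whose letters match the initials of the words immediately before it. The function name `extract_pairs` and the sample sentence in the usage note are hypothetical.

```python
import re

# A parenthesized run of 2 to 10 capital letters, e.g. "(PCR)".
PAREN_ABBR = re.compile(r"\(([A-Z]{2,10})\)")


def extract_pairs(text):
    """Collect abbreviation -> set of candidate full forms from one text.

    Heuristic (an assumption, not the paper's stated rule): accept the N words
    preceding "(ABBR)" as the full form when their initials spell ABBR.
    """
    pairs = {}
    for match in PAREN_ABBR.finditer(text):
        abbr = match.group(1)
        words = text[:match.start()].split()
        candidate = words[-len(abbr):]
        initials_match = len(candidate) == len(abbr) and all(
            word[0].upper() == letter for word, letter in zip(candidate, abbr)
        )
        if initials_match:
            pairs.setdefault(abbr, set()).add(" ".join(candidate).lower())
    return pairs
```

Run over a sentence such as "Patients with acute myocardial infarction (AMI) were enrolled.", the sketch maps `AMI` to "acute myocardial infarction". Storing a set of full forms per abbreviation keeps ambiguous abbreviations visible, which is the case a disambiguation step then has to resolve.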
From Biomedical Literature to Knowledge: Mining Protein-Protein Interactions
Summary. To date, more than 16 million citations of published articles in the biomedical domain are available in the MEDLINE database. These articles describe the new discoveries that have accompanied the tremendous development in biomedicine during the last decade. It is crucial for biomedical researchers to retrieve and mine specific knowledge from this huge quantity of published articles with high efficiency. Researchers have been engaged in the development of text mining tools to find knowledge, such as protein-protein interactions, that is most relevant and useful for specific analysis tasks. This chapter provides a road map to the various information extraction methods in the biomedical domain, such as protein name recognition and discovery of protein-protein interactions. Disciplines involved in analyzing and processing unstructured text are summarized. Current work in biomedical information extraction is categorized. Current challenges in the field are also presented and possible solutions are discussed.
Methodological Review: Term Identification in the Biomedical Literature
Sophisticated information technologies are needed for effective data acquisition and integration from a growing body of the biomedical literature. Successful term identification is key to getting access to the stored literature information, as it is the terms (and their relationships) that convey knowledge across scientific articles. Due to the complexities of a dynamically changing biomedical terminology, term identification has been recognized as the current bottleneck in text mining and, as a consequence, has become an important research topic in both the natural language processing and biomedical communities. This article overviews state-of-the-art approaches in term identification. The process of identifying terms is analysed through three steps: term recognition, term classification and term mapping. For each step, main approaches and general trends, along with the major problems, are discussed. By assessing previous work in the context of the overall term identification process, the review also tries to delineate needs for future work in the field.
H.3.1 [Content Analysis and Indexing
In this paper we define Semantic Trails as a series of simple and complex relationships connecting entities in documents across a corpus. We propose a novel algorithm that uses a dependency parse of sentences in documents to extract these simple and complex relationships. The extracted relationships are represented in RDF. We evaluate the extracted RDF and present an analysis of the errors, their causes and possible future enhancements. Following extraction, we describe how this extracted RDF is superimposed back onto the original text to realize what we term Semantic Trails through text. Using the TREC 2006 Genomics Track data as a use case, we demonstrate how the result of our improved extraction mechanism can be used to build a Semantic Browser that supports the creation and use of Semantic Trails.