Results 1 - 10 of 14
Mining knowledge from text using information extraction
- SIGKDD Explorations, 2005
Cited by 24 (0 self)
An important approach to text mining involves the use of natural-language information extraction. Information extraction (IE) distills structured data or knowledge from unstructured text by identifying references to named entities as well as stated relationships between such entities. IE systems can be used to directly extricate abstract knowledge from a text corpus, or to extract concrete data from a set of documents which can then be further analyzed with traditional data-mining techniques to discover more general patterns. We discuss methods and implemented systems for both of these approaches and summarize results on mining real text corpora of biomedical abstracts, job announcements, and product descriptions. We also discuss challenges that arise when employing current information extraction technology to discover knowledge in text.
Multi-field information extraction and cross-document fusion
- In ACL, 2005
Cited by 21 (1 self)
In this paper, we examine the task of extracting a set of biographic facts about target individuals from a collection of Web pages. We automatically annotate training text with positive and negative examples of fact extractions and train Rote, Naïve Bayes, and Conditional Random Field extraction models for fact extraction from individual Web pages. We then propose and evaluate methods for fusing the extracted information across documents to return a consensus answer. A novel cross-field bootstrapping method leverages data interdependencies to yield improved performance.
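To make the idea of cross-document fusion concrete, here is a minimal sketch. The abstract does not say which fusion methods the paper evaluates, so plain majority voting is assumed here; the function name `fuse_by_vote` and the sample extractions are hypothetical.

```python
from collections import Counter

def fuse_by_vote(candidates):
    """Return the most frequent non-empty candidate value, or None.

    Assumption: simple majority voting, not the paper's actual fusion method.
    """
    # Normalize whitespace and case so trivially different strings vote together.
    normalized = [" ".join(c.split()).lower() for c in candidates if c and c.strip()]
    if not normalized:
        return None
    # most_common(1) returns a one-element list holding (value, count).
    return Counter(normalized).most_common(1)[0][0]

# Hypothetical birth-year extractions for one person, one per Web page.
extractions = ["1947", "1947 ", "1974", "", "1947"]
consensus = fuse_by_vote(extractions)
```

A real system would likely weight each vote by extractor confidence rather than count all pages equally.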
Text Mining through Semi Automatic Semantic Annotation
- Proc. of PAKM’2006, 2006
Cited by 7 (5 self)
The Web is the greatest information source in human history.
Mining Knowledge from Text Collections Using Automatically Generated Metadata
- Proceedings of the Fourth International Conference on Practical Aspects of Knowledge Management (PAKM-2002), 2002
Cited by 4 (0 self)
Data mining is typically applied to large databases of highly structured information in order to discover new knowledge. In businesses and institutions, the amount of information existing in repositories of text documents usually rivals or surpasses the amount found in relational databases. Though the amount of potentially valuable knowledge contained in document collections can be great, they are often difficult to analyze. Therefore, it is important to develop methods to efficiently discover knowledge embedded in these document repositories. In this paper we describe an approach for mining knowledge from text collections by applying data mining techniques to metadata records generated via automated text categorization. By controlling the set of metadata fields as well as the set of assigned categories we can customize the knowledge discovery task to address specific questions. As an example, we apply the approach to a large collection of product reviews and evaluate the performance of the knowledge discovery.
Gene function prediction by mining biomedical literature
- 2004
Cited by 2 (0 self)
The files are stored in PDF, with the report number as filename. Alternatively, reports are available by post from the above address.
Statistical software
- Encyclopedia of Statistical Sciences, 1988
"... an agile way to enhance ..."
Text Mining in a Digital Library
- International Journal on Digital Libraries
Cited by 1 (0 self)
metrics, entity extraction, hidden Markov models, hubs and authorities, information extraction, information retrieval, key-phrase assignment, key-phrase extraction, knowledge engineering, language identification, link analysis, machine learning, metadata, natural language processing, ngrams, rule learning, syntactic analysis, term frequency, text categorization, text mining, text
A Comparison of Two Document Clustering Approaches for Clustering Medical Documents
Cited by 1 (0 self)
Medical data is often presented as free text in the form of medical reports. Such documents contain important information about patients, disease progression and management, but are difficult to analyse with conventional data mining techniques due to their unstructured nature. Clustering the medical documents into a small number of meaningful clusters may facilitate discovering patterns by allowing us to extract a number of relevant features from each cluster, thus introducing structure into the data and facilitating the application of conventional data mining techniques. For this approach to work, it is essential to produce high-quality clustering. Thus, the main goals of this paper are (1) to experimentally evaluate the performance of six criterion functions in the context of the partitional clustering approach, (2) to compare the clustering results of the agglomerative approach and the partitional approach for each of the criterion functions using real-world medical documents, and (3) to establish the right clustering algorithm to produce high-quality clustering of real-world medical documents in order to discover hidden knowledge by analyzing the produced clusters. Our experimental results show that the clustering solutions produced by the agglomerative approach are consistently better than those produced by the partitional approach for all the criterion functions. Moreover, the results show that different criterion functions lead to substantially different results. In addition, we examine the quality of the features produced for each cluster for a classification task. The task involves discriminating between successful and unsuccessful procedures. The features extracted are used to produce an accurate classification of the data.
Automatic Thai Keyword Extraction from Categorized Text Corpus
Information Extraction (IE) is a process of discovering implicit and potentially important keywords underlying an unstructured natural-language text corpus. Most previously proposed solutions to IE were accomplished by constructing a set of words from a given text corpus during the preprocessing step. Due to the inherent characteristic of the Thai written language, which does not explicitly use any word-delimiting characters, identifying individual words, i.e., word segmentation, is a challenging task and has become one of the important research topics in Natural Language Processing (NLP). In this paper, an alternative method to word segmentation for extracting important keywords from a categorized text corpus is proposed. The approach is based on the analysis of frequent substring-sets; as a result, this method is language-independent, i.e., it does not rely on the use of any dictionary or grammatical knowledge of the language. We refer to this method as Automatic Categorized Keyword Extraction (ACKE). Applying the ACKE algorithm to a text corpus yields sets of keywords which are highly distinct between different categories from the given text corpus.
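The general idea of finding category-specific keywords from frequent character substrings, without any word segmentation, can be sketched as follows. This is not the ACKE algorithm, whose details the abstract does not give; the function names, the length bounds, the `min_support` threshold, and the toy English corpus are all assumptions made for illustration.

```python
from collections import Counter

def substring_doc_freq(docs, min_len=2, max_len=4):
    """Count, for each character substring, how many documents contain it."""
    freq = Counter()
    for doc in docs:
        seen = set()
        for n in range(min_len, max_len + 1):
            for i in range(len(doc) - n + 1):
                seen.add(doc[i:i + n])
        freq.update(seen)  # each document counts a given substring at most once
    return freq

def distinctive_substrings(corpus, min_support=2, **kw):
    """Per category, keep substrings found in at least min_support of its
    documents and in no document of any other category."""
    freqs = {cat: substring_doc_freq(docs, **kw) for cat, docs in corpus.items()}
    result = {}
    for cat, freq in freqs.items():
        others = set()
        for other_cat, other_freq in freqs.items():
            if other_cat != cat:
                others.update(other_freq)
        result[cat] = {s for s, c in freq.items()
                       if c >= min_support and s not in others}
    return result

# Hypothetical toy corpus: category name -> list of documents.
corpus = {
    "sports": ["goalkeeper saves", "late goal wins"],
    "finance": ["stock market falls", "stock prices rise"],
}
keywords = distinctive_substrings(corpus)
```

Because the sketch works on raw characters, it needs no dictionary or tokenizer, which is the property the abstract emphasizes for unsegmented scripts such as Thai.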
Named Entity Learning and Verification:
- 2002
The regularity of named entities is used to learn names and to extract named entities. Given only a few name elements and a set of patterns, the algorithm learns new names and their elements. A verification step assures quality using a large background corpus. Further improvement is reached through classifying the newly learnt elements at the character level. Moreover, unsupervised rule learning is discussed.

