A Comparison of Two Learning Algorithms for Text Categorization (1994)
| Venue: | Third Annual Symposium on Document Analysis and Information Retrieval |
| Citations: | 239 (1 self-citation) |
BibTeX
@INPROCEEDINGS{Lewis94acomparison,
author = {David D. Lewis and Marc Ringuette},
title = {A Comparison of Two Learning Algorithms for Text Categorization},
booktitle = {Third Annual Symposium on Document Analysis and Information Retrieval},
year = {1994},
pages = {81--93}
}
Abstract
This paper examines the use of inductive learning to categorize natural language documents into predefined content categories. Categorization of text is of increasing importance in information retrieval and natural language processing systems. Previous research on automated text categorization has mixed machine learning and knowledge engineering methods, making it difficult to draw conclusions about the performance of particular methods. In this paper we present empirical results on the performance of a Bayesian classifier and a decision tree learning algorithm on two text categorization data sets. We find that both algorithms achieve reasonable performance and allow controlled tradeoffs between false positives and false negatives. The stepwise feature selection in the decision tree algorithm is particularly effective in dealing with the large feature sets common in text categorization. However, even this algorithm is aided by an initial prefiltering of features, confirming the results...
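The abstract does not spell out either learner, so the following is only a rough illustration of two ideas it mentions: prefiltering a large term vocabulary before learning, and using a decision threshold to trade false positives against false negatives. It is a minimal Python sketch under stated assumptions: a Bernoulli (binary-feature) naive Bayes model stands in for the Bayesian classifier, mutual information is assumed as the prefiltering score, and the names `mutual_information`, `prefilter`, `BernoulliNB`, and `error_counts`, as well as the toy data, are hypothetical. It is not the authors' implementation, and the decision tree learner is not sketched.

```python
import math


def mutual_information(docs, labels, term):
    """Mutual information (in nats) between presence of `term` and the binary label."""
    n = len(docs)
    mi = 0.0
    for has_term in (True, False):
        for in_cat in (True, False):
            # Count documents in this cell of the 2x2 contingency table.
            n_tc = sum(1 for d, y in zip(docs, labels)
                       if (term in d) == has_term and y == in_cat)
            if n_tc == 0:
                continue  # 0 * log(0) is taken as 0
            n_t = sum(1 for d in docs if (term in d) == has_term)
            n_c = sum(1 for y in labels if y == in_cat)
            mi += (n_tc / n) * math.log(n * n_tc / (n_t * n_c))
    return mi


def prefilter(docs, labels, k):
    """Hypothetical prefilter: keep the k terms with the highest mutual information."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda w: (-mutual_information(docs, labels, w), w))
    return ranked[:k]


class BernoulliNB:
    """Binary-feature naive Bayes that scores documents by log odds of membership."""

    def fit(self, docs, labels, features):
        self.features = list(features)
        pos = [d for d, y in zip(docs, labels) if y]
        neg = [d for d, y in zip(docs, labels) if not y]
        # Laplace-smoothed prior log odds and per-term presence probabilities.
        self.prior = math.log((len(pos) + 1) / (len(neg) + 1))
        self.p = {w: (sum(w in d for d in pos) + 1) / (len(pos) + 2) for w in self.features}
        self.q = {w: (sum(w in d for d in neg) + 1) / (len(neg) + 2) for w in self.features}
        return self

    def log_odds(self, doc):
        score = self.prior
        for w in self.features:
            p, q = self.p[w], self.q[w]
            # Absent terms also contribute evidence under the Bernoulli event model.
            score += math.log(p / q) if w in doc else math.log((1 - p) / (1 - q))
        return score

    def predict(self, doc, threshold=0.0):
        # A higher threshold means fewer false positives but more false negatives.
        return self.log_odds(doc) > threshold


def error_counts(model, docs, labels, threshold=0.0):
    """Count (false positives, false negatives) at a given decision threshold."""
    fp = sum(1 for d, y in zip(docs, labels) if model.predict(d, threshold) and not y)
    fn = sum(1 for d, y in zip(docs, labels) if y and not model.predict(d, threshold))
    return fp, fn


# Toy data (invented for illustration): each document is a set of terms.
docs = [{"wheat", "grain", "export"}, {"wheat", "crop"}, {"grain", "harvest", "wheat"},
        {"stock", "market"}, {"market", "shares", "price"}, {"price", "oil"}]
labels = [True, True, True, False, False, False]
features = prefilter(docs, labels, k=1)
model = BernoulliNB().fit(docs, labels, features)
print(features, error_counts(model, docs, labels))  # prints ['wheat'] (0, 0)
```

On this toy data, raising the threshold to 2.0 makes the model reject every document, which removes false positives entirely at the cost of three false negatives. Sweeping the threshold in this way is one simple means of obtaining the kind of controlled tradeoff the abstract describes.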