Download:
pdf
unknown authors
http://ranger.uta.edu/~alp/ix/readings/YangSigir99CategorizationBenchmark.pdf

Abstract:

A re-examination of text categorization methods

This paper reports a controlled study with statistical significance tests on five text categorization methods: the Support Vector Machines (SVM), a k-Nearest Neighbor (kNN) classifier, a neural network (NNet) approach, the Linear Least-squares Fit (LLSF) mapping and a Naive Bayes (NB) classifier. We focus on the robustness of these methods in dealing with a skewed category distribution, and their performance as a function of the training-set category frequency. Our results show that SVM, kNN and LLSF significantly outperform NNet and NB when the number of positive training instances per category is small (less than ten), and that all the methods perform comparably when the categories are sufficiently common (over 300 instances).
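The abstract does not say which significance tests the study uses. As an illustration only, the sketch below shows one common way to compare two classifiers on the same test decisions: a one-sided paired sign test. The function name `sign_test_p_value` and the boolean input layout are assumptions made for this example, not taken from the paper.

```python
from math import comb


def sign_test_p_value(correct_a, correct_b):
    """One-sided paired sign test: is system A better than system B?

    correct_a, correct_b: equal-length sequences of booleans, one entry per
    (document, category) decision, True if that system decided correctly.
    Decisions where both systems agree in correctness are ties and are dropped.
    (Hypothetical helper for illustration; not the paper's own procedure.)
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("inputs must be paired and of equal length")

    # Count the decisions on which exactly one system was correct.
    a_wins = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    b_wins = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    n = a_wins + b_wins
    if n == 0:
        return 1.0  # no disagreements, so no evidence either way

    # Under the null hypothesis each disagreement favours A with probability 0.5,
    # so the p-value is P(X >= a_wins) for X ~ Binomial(n, 0.5).
    return sum(comb(n, k) for k in range(a_wins, n + 1)) / 2 ** n
```

With five disagreements that all favour system A, the function returns 1/32, which would be significant at the 0.05 level; rare categories yield few disagreements, which is one reason small training and test counts make differences hard to establish.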

Citations

982 Support-vector networks – Cortes, Vapnik - 1995
961 Text Categorization with Support Vector Machines – Joachims - 1997
328 An evaluation of statistical approaches to text categorization – Yang - 1999
289 Hierarchically classifying documents using very few words – Koller, Sahami - 1997
212 Training algorithms for linear text classifiers – Lewis, Schapire, et al. - 1996
211 A comparison of two learning algorithms for text categorization – Lewis, Ringuette - 1994
194 Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques – Dasarathy - 1991
188 Context-sensitive learning methods for text categorization – Cohen, Singer - 1999
129 Sequential minimal optimization: A fast algorithm for training support vector machines – Platt - 1998
115 Support vector machines: Training and applications – Osuna, Freund, et al. - 1997
111 A neural network approach to topic spotting – Wiener, Pedersen, Weigend - 1995
86 Feature selection, perceptron learning, and a usability case study for text categorization – Ng, Goh, et al. - 1997
76 An example-based mapping method for text categorization and retrieval – Yang, Chute - 1994
75 Classifying News Stories Using Memory Based Reasoning – Masand, Linoff, et al. - 1992
74 Towards Language Independent Automated Learning of Text Categorization Models – Apté, Damerau, Weiss - 1994
58 Feature selection in statistical learning of text categorization – Yang, Pedersen - 1997
57 Text categorization and relational learning – Cohen - 1995
56 CONSTRUE: A System for Content-Based Indexing of a Database of News Stories – Hayes, Weinstein - 1990
45 Using a generalized instance set for automatic text categorization – Lam, Ho - 1998
44 AIR/X - A rule-based multi-stage indexing system for large subject fields – Fuhr, Hartmann, et al. - 1991
44 Cluster-Based Text Categorization: A Comparison of Category Search Strategies – Iwayama, Tokunaga - 1995
38 Automatic indexing based on bayesian inference networks – Tzeras, Hartmann - 1993
37 Text categorization: a symbolic approach – Moulinier, Raškinis, et al. - 1996
21 Text mining with decision rules and decision trees – Apté, Damerau, et al. - 1998
13 The Nature of Statistical Learning Theory – Vapnik - 1995
12 Is learning bias an issue on the text categorization problem – Moulinier - 1997
10 Distributional clustering of words for text categorisation – Baker, McCallum - 1998
4 Expert network: Effective and efficient learning from human decisions in text categorization and retrieval – Yang - 1994
3 A comparison of event models for naive Bayes text classification – McCallum, Nigam - 1998
1 Statistics: Theory and Methods. Brooks/Cole, Pacific – Berry, Lindgren - 1990
1 Sampling strategies and learning efficiency in text categorization – Yang - 1996