Abstract: This paper is a comparative study of feature selection methods in statistical learning of text categorization. The focus is on aggressive dimensionality reduction. Five methods were evaluated, including term selection based on document frequency (DF), information gain (IG), mutual information (MI), a χ²-test (CHI), and term strength (TS). We found IG and CHI most effective in our experiments. Using IG thresholding with a k-nearest neighbor classifier on the Reuters corpus, removal of up to...
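Three of the criteria named in the abstract (DF, IG, and CHI) can be computed from a 2x2 term/category contingency table. The following is a minimal sketch under stated assumptions, not the authors' implementation: it handles a single binary category (the paper works with many categories), documents are assumed to be pre-tokenized, and the names `score_terms` and `select_top` are hypothetical helpers introduced only for illustration.

```python
import math
from collections import Counter


def _entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def score_terms(docs, labels):
    """Score every term for one binary category (illustrative helper).

    docs:   list of token lists, one per document
    labels: list of bools, True if the document belongs to the category
    Returns {term: {"df": ..., "ig": ..., "chi2": ...}}.
    """
    n = len(docs)
    n_pos = sum(labels)
    df = Counter()      # documents containing the term
    df_pos = Counter()  # in-category documents containing the term
    for tokens, in_category in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            if in_category:
                df_pos[term] += 1

    h_c = _entropy([n_pos, n - n_pos])  # category entropy H(C)
    scores = {}
    for term, n_t in df.items():
        a = df_pos[term]   # term present, in category
        b = n_t - a        # term present, not in category
        c = n_pos - a      # term absent, in category
        d = n - n_t - c    # term absent, not in category
        # Information gain: H(C) - P(t) H(C|t) - P(not t) H(C|not t)
        ig = h_c - (n_t / n) * _entropy([a, b]) - ((n - n_t) / n) * _entropy([c, d])
        # Chi-square statistic for the 2x2 table
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        chi2 = n * (a * d - c * b) ** 2 / denom if denom else 0.0
        scores[term] = {"df": n_t, "ig": ig, "chi2": chi2}
    return scores


def select_top(scores, key, k):
    """Keep the k highest-scoring terms under one criterion (ties broken by name)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1][key], kv[0]))
    return [term for term, _ in ranked[:k]]
```

With a toy corpus such as `[["wheat", "price"], ["wheat", "farm"], ["oil", "price"], ["oil", "rig"]]` and labels `[True, True, False, False]`, a term like "price" that is spread evenly across both classes scores zero under IG and CHI, while its DF is as high as that of the class-specific terms. Thresholding then amounts to calling `select_top` with the chosen criterion and vocabulary size.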
Cited by:
Journal of Machine Learning Research 3 (2003) 1265-1287.. - Algorithm For Text
The Maximum-Margin Approach to Learning Text Classifiers -.. - Joachims (2000)
An Improved Boosting Algorithm and Its Application to Text.. - Sebastiani, al. (2000)
Active bibliography (related documents):
0.4: Learning approaches for Detecting and Tracking News.. - Yang, Carbonell, Brown, .. (1999)
0.3: An Evaluation of Statistical Approaches to Text Categorization - Yang (1997)
0.3: A Comparison of Two Learning Algorithms for Text Categorization - Lewis, Ringuette (1994)
Similar documents based on text:
0.3: A Loss Function Analysis for Classification Methods in Text.. - Li, Yang (2003)
0.3: Modified Logistic Regression: An Approximation to SVM.. - Zhang, Jin, Yang.. (2003)
0.3: A Scalability Analysis of Classifiers in Text Categorization - Yiming Yang Carnegie (2003)
Related documents from co-citation:
37: Text categorization with Support Vector Machines: Learning with many relevant fe.. - Joachims - 1998
32: An evaluation of statistical approaches to text categorization - Yang - 1999
24: Machine Learning - Mitchell - 1997
BibTeX entry:
Yang, Y., Pedersen, J.O., A Comparative Study on Feature Selection in Text Categorization, Proc. of the 14th International Conference on Machine Learning ICML97, pp. 412-420, 1997. http://citeseer.ist.psu.edu/yang97comparative.html
@inproceedings{ yang97comparative,
author = "Yiming Yang and Jan O. Pedersen",
title = "A comparative study on feature selection in text categorization",
booktitle = "Proceedings of {ICML}-97, 14th International Conference on Machine Learning",
publisher = "Morgan Kaufmann Publishers, San Francisco, US",
address = "Nashville, US",
editor = "Douglas H. Fisher",
pages = "412--420",
year = "1997",
url = "citeseer.ist.psu.edu/yang97comparative.html" }
Citations (may not include all citations):
568: Indexing by latent semantic analysis - Deerwester, Dumais et al. - 1990
137: Accurate methods for the statistics of surprise and coincide.. - Dunning - 1993
31: Transmission of Information - Fano - 1961
27: Towards language independent automated learning of text cate.. - Apte, Damerau et al. - 1994
10: Trading mips and memory for knowledge engineering: classifyi.. - Creecy, Masand et al. - 1992
5: mutual information and lexicography - Church, Hanks et al. - 1989
4: Context-sensitive learning methods for text categorization - Cohen, Singer - 1996
3: a rule-based multistage - Fuhr, Hartmann et al.
Documents on the same site (http://www.cs.cmu.edu/~yiming/publications.html):
Topic Detection and Tracking Pilot Study - Allan, Carbonell, Doddington.. (1998)
Report on the CONALD Workshop on Learning from Text.. - Carbonell, Craven.. (1998)
Noise Reduction in a Statistical Approach to Text Categorization - Yang (1995)