A Comparative Study on Feature Selection in Text Categorization (1997) (215 citations)
Yiming Yang, Jan O. Pedersen
Proceedings of ICML-97, 14th International Conference on Machine Learning

View or download:
cmu.edu/~yiming/papers.yy/ml97.ps
cmu.edu/~yiming/papers.y...icml97.ps.gz

From:  cmu.edu/~yiming/publications

Abstract: This paper is a comparative study of feature selection methods in statistical learning of text categorization. The focus is on aggressive dimensionality reduction. Five methods were evaluated, including term selection based on document frequency (DF), information gain (IG), mutual information (MI), a χ² test (CHI), and term strength (TS). We found IG and CHI most effective in our experiments. Using IG thresholding with a k-nearest neighbor classifier on the Reuters corpus, removal of up to...
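
As an illustration of the scoring criteria named in the abstract, here is a minimal Python sketch of DF, IG, MI and CHI, followed by IG-based top-k term selection. It assumes single-label documents represented as (set of terms, label) pairs and uses the usual two-way contingency counts: A (term and category co-occur), B (term without category), C (category without term), D (neither). The function names and the toy corpus are hypothetical and are not code from the paper. TS is omitted because it needs a separate document-similarity step.

```python
import math
from collections import Counter

# Hypothetical representation: each document is (set_of_terms, label),
# assuming exactly one category label per document.

def document_frequency(docs, term):
    """DF: number of documents that contain the term."""
    return sum(1 for terms, _ in docs if term in terms)

def contingency(docs, term, category):
    """Return the two-way counts (A, B, C, D) for a term/category pair."""
    a = b = c = d = 0
    for terms, label in docs:
        has_t, has_c = term in terms, label == category
        if has_t and has_c:
            a += 1
        elif has_t:
            b += 1
        elif has_c:
            c += 1
        else:
            d += 1
    return a, b, c, d

def _sum_p_log_p(probs):
    """Sum of p * log(p), skipping zero probabilities (0 * log 0 := 0)."""
    return sum(p * math.log(p) for p in probs if p > 0)

def information_gain(docs, term):
    """IG: category entropy minus expected entropy given term presence/absence."""
    n = len(docs)
    labels = sorted({label for _, label in docs})
    with_t = [label for terms, label in docs if term in terms]
    without_t = [label for terms, label in docs if term not in terms]
    prior = Counter(label for _, label in docs)
    g = -_sum_p_log_p(prior[c] / n for c in labels)
    for subset in (with_t, without_t):
        if subset:
            cnt = Counter(subset)
            weight = len(subset) / n  # P(t) or P(not t)
            g += weight * _sum_p_log_p(cnt[c] / len(subset) for c in labels)
    return g

def mutual_information(a, b, c, d):
    """MI estimate log(A*N / ((A+C)(A+B))); -inf when the pair never co-occurs."""
    n = a + b + c + d
    if a == 0:
        return float("-inf")
    return math.log(a * n / ((a + c) * (a + b)))

def chi_square(a, b, c, d):
    """CHI: N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)); 0 if a margin is empty."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom

def select_top_k(docs, k):
    """Keep the k terms with the highest IG (ties broken alphabetically)."""
    vocab = {t for terms, _ in docs for t in terms}
    return sorted(vocab, key=lambda t: (-information_gain(docs, t), t))[:k]

# Hypothetical toy corpus, for illustration only.
docs = [
    ({"wheat", "grain"}, "grain"),
    ({"wheat", "price"}, "grain"),
    ({"oil", "price"}, "crude"),
    ({"oil", "barrel"}, "crude"),
]
print(select_top_k(docs, 2))
```

In this toy corpus "wheat" and "oil" each perfectly separate the two categories, so they get the maximal IG (log 2). "price" occurs equally often in both categories and scores zero under both IG and CHI. Ranking terms by a criterion and keeping only the top k is one simple way to realize the thresholding the abstract describes.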

Cited by:
Journal of Machine Learning Research 3 (2003) 1265-1287.. - Algorithm For Text
The Maximum-Margin Approach to Learning Text Classifiers -.. - Joachims (2000)
An Improved Boosting Algorithm and Its Application to Text.. - Sebastiani, al. (2000)

Active bibliography (related documents):
0.4:   Learning approaches for Detecting and Tracking News.. - Yang, Carbonell, Brown, .. (1999)
0.3:   An Evaluation of Statistical Approaches to Text Categorization - Yang (1997)
0.3:   A Comparison of Two Learning Algorithms for Text Categorization - Lewis, Ringuette (1994)

Similar documents based on text:
0.3:   A Loss Function Analysis for Classification Methods in Text.. - Li, Yang (2003)
0.3:   Modified Logistic Regression: An Approximation to SVM.. - Zhang, Jin, Yang.. (2003)
0.3:   A Scalability Analysis of Classifiers in Text Categorization - Yiming Yang Carnegie (2003)

Related documents from co-citation:
37:   Text categorization with Support Vector Machines: Learning with many relevant fe.. - Joachims - 1998
32:   An evaluation of statistical approaches to text categorization - Yang - 1999
24:   Machine Learning - Mitchell - 1997

BibTeX entry:

Yang, Y., Pedersen, J.O., A Comparative Study on Feature Selection in Text Categorization, Proc. of the 14th International Conference on Machine Learning ICML97, pp. 412-420, 1997. http://citeseer.ist.psu.edu/yang97comparative.html

@inproceedings{ yang97comparative,
    author = "Yiming Yang and Jan O. Pedersen",
    title = "A comparative study on feature selection in text categorization",
    booktitle = "Proceedings of {ICML}-97, 14th International Conference on Machine Learning",
    publisher = "Morgan Kaufmann Publishers, San Francisco, US",
    address = "Nashville, US",
    editor = "Douglas H. Fisher",
    pages = "412--420",
    year = "1997",
    url = "citeseer.ist.psu.edu/yang97comparative.html" }

Citations (may not include all citations):
568   Indexing by latent semantic analysis - Deerwester, Dumais et al. - 1990
137   Accurate methods for the statistics of surprise and coincide.. - Dunning - 1993
31   Transmission of Information - Fano - 1961
27   Towards language independent automated learning of text cate.. - Apte, Damerau et al. - 1994
10   Trading mips and memory for knowledge engineering: classifyi.. - Creecy, Masand et al. - 1992
5   mutual information and lexicography - Church, Hanks et al. - 1989
4   Context-sensitive learning methods for text categorization - Cohen, Singer - 1996
3   a rule-based multistage - Fuhr, Hartmann et al.



Documents on the same site (http://www.cs.cmu.edu/~yiming/publications.html):
Topic Detection and Tracking Pilot Study - Allan, Carbonell, Doddington.. (1998)
Report on the CONALD Workshop on Learning from Text.. - Carbonell, Craven.. (1998)
Noise Reduction in a Statistical Approach to Text Categorization - Yang (1995)


CiteSeer.IST - Copyright Penn State and NEC