A Divisive Information-Theoretic Feature Clustering Algorithm for Text Classification
Inderjit S. Dhillon
Journal of Machine Learning Research 3 (2003) 1265-1287. Submitted 5/02; Published 3/03

Abstract: High dimensionality of text can be a deterrent in applying complex learners such as Support Vector Machines to the task of text classification. Feature clustering is a powerful alternative to feature selection for reducing the dimensionality of text data. In this paper we propose a new information-theoretic divisive algorithm for feature/word clustering and apply it to text classification. Existing techniques for such "distributional clustering" of words are agglomerative in nature and...

Similar documents (at the sentence level):
20.8%:   A Divisive Information-Theoretic Feature Clustering.. - Dhillon, Mallela, Kumar (2003)
15.2%:   Information Theoretic Feature Clustering for Text.. - Dhillon, Mallela, Kumar
12.2%:   Enhanced Word Clustering for Hierarchical Text Classification - Dhillon, Mallela, Kumar (2002)

Active bibliography (related documents):
0.3:   On the Use of Sparse Signal Decomposition in the Analysis of.. - Theis, Garcia
0.2:   Survey Of Clustering Data Mining Techniques - Berkhin (2002)
0.1:   Information Theoretic Clustering of Sparse Co-Occurrence Data - Inderjit Dhillon And (2003)

BibTeX entry:

@article{ text-journal,
  author = "Inderjit S. Dhillon",
  title = "A Divisive Information-Theoretic Feature Clustering Algorithm for Text Classification",
  journal = "Journal of Machine Learning Research",
  volume = "3",
  pages = "1265--1287",
  year = "2003",
  url = "citeseer.ist.psu.edu/766551.html" }

Citations (may not include all citations):
2319   Elements of Information Theory - Cover, Thomas - 1991
1447   A mathematical theory of communication - Shannon - 1948
1291   The Nature of Statistical Learning Theory - Vapnik - 1995
976   Machine Learning - Mitchell - 1997
568   Indexing by Latent Semantic Analysis - Deerwester, Dumais et al. - 1990
417   Stochastic Complexity in Statistical Inquiry - Rissanen - 1989
376   Text categorization with support vector machines: learning w.. - Joachims - 1998
255   A training algorithm for optimal margin classifiers - Boser, Guyon et al. - 1992
215   A comparative study on feature selection in text categorizat.. - Yang, Pedersen - 1997
166   A re-examination of text categorization methods - Yang, Liu - 1999
140   A comparison of event models for naive bayes text classifica.. - McCallum, Nigam - 1998
135   Hierarchically classifying documents using very few words - Koller, Sahami - 1997
128   On the optimality of the simple Bayesian classifier under zero-.. - Domingos, Pazzani - 1997
123   Probabilistic latent semantic indexing - Hofmann - 1999
119   What every computer scientist should know about floating poi.. - Goldberg - 1991
116   On information and sufficiency - Kullback, Leibler - 1951
106   The information bottleneck method - Tishby, Pereira et al. - 1999
72   Bow: A toolkit for statistical language modeling - McCallum - 1996
65   Divergence measures based on the Shannon entropy - Lin - 1991
58   Distributional clustering of words for text classification - Baker, McCallum - 1998
27   The complexity of the generalized Lloyd-Max problem - Garey, Johnson et al. - 1982
23   and signatures for navigating in text databases - Chakrabarti, Dom et al. - 1997
22   Introduction to Modern Information Retrieval - Salton, McGill - 1983
21   NewsWeeder: Learning to filter netnews - Lang - 1995
15   On feature distributional clustering for text categorization - Bekkerman, El-Yaniv et al. - 2001
13   Model selection in unsupervised learning with applications t.. - Vaithyanathan, Dom - 1999
12   The power of word clusters for text classification - Slonim, Tishby - 2001
9   Learning simple relations: Theory and applications - Berkhin, Becher - 2002
7   Iterative clustering of high dimensional text data augmented.. - Dhillon, Guan et al. - 2001
7   Conditions for the equivalence of hierarchical and non-hiera.. - Mitchell - 1998
6   Std 754-1985 edition - for, Point et al. - 1985
3   Journal of Global Optimization - Bradley, Mangasarian - 2000
2   An information theoretic approach to finding word groups for.. - Verbeek - 2000
2   to appear - Modha, Spangler et al. - 1993

Documents on the same site (http://www-ai.informatik.uni-dortmund.de/DOKUMENTE):
Efficient Kernel Calculation for Multirelational Data - Rüping (2002)
Domain Knowledge and Data Mining Process Decisions - Knobbe, Schipper, Brockhausen (2000)
Text Categorization with Support Vector Machines: Learning with.. - Joachims (1998)