  Enhanced word clustering for hierarchical text classification (2002) [29 citations — 0 self]

by Inderjit S. Dhillon, Subramanyam Mallela, Rahul Kumar
http://www.cs.utexas.edu/ftp/pub/techreports/tr02-17.ps

Abstract:

In this paper we propose a new information-theoretic divisive algorithm for word clustering applied to text classification. In previous work, such "distributional clustering" of features has been found to achieve significant improvements over feature selection in terms of classification accuracy, especially at lower number of features [2, 29]. However, the existing clustering techniques are agglomerative in nature, resulting in (i) sub-optimal word clusters and (ii) high computational cost. In order to explicitly capture the optimality of word clusters in an information-theoretic framework, we first derive a global criterion for feature clustering. We then present a fast, divisive algorithm that monotonically decreases this objective function value, thus converging to a local minimum. We show that our algorithm minimizes the "within-cluster Jensen-Shannon divergence" while simultaneously maximizing the "between-cluster Jensen-Shannon divergence". In comparison to the previously proposed agglomerative strategies, our divisive algorithm achieves higher classification accuracy, especially at lower number of features. We further show that feature clustering is an effective technique for building smaller class models in hierarchical classification. We present detailed experimental results on the 20 Newsgroups data set and a 3-level hierarchy of HTML documents collected from the Dmoz Open Directory.
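
The abstract describes an iterative procedure that monotonically decreases a within-cluster Jensen-Shannon objective. The following is a minimal sketch of one way such a loop could look, not the authors' implementation. It assumes each word is represented by a class distribution p(C|w) and a prior weight, that a cluster is summarized by the prior-weighted mean of its members, and that the objective is the prior-weighted sum of KL divergences from each word to its cluster mean. The function names (`kl`, `weighted_mean`, `within_cluster_js`, `cluster_words`) and the toy data are hypothetical, and the initialization and splitting steps are not specified in the abstract, so they are left to the caller.

```python
import math


def kl(p, q):
    """KL(p || q) in bits. Zero-mass terms of p contribute nothing;
    a zero in q where p is positive gives infinity."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log2(pi / qi)
    return total


def weighted_mean(dists, weights):
    """Prior-weighted mean of a list of distributions."""
    total = sum(weights)
    size = len(dists[0])
    return [sum(w * d[i] for d, w in zip(dists, weights)) / total
            for i in range(size)]


def cluster_means(dists, priors, assign, k):
    """Mean distribution of every non-empty cluster, keyed by cluster id."""
    means = {}
    for j in range(k):
        members = [t for t, a in enumerate(assign) if a == j]
        if members:
            means[j] = weighted_mean([dists[t] for t in members],
                                     [priors[t] for t in members])
    return means


def within_cluster_js(dists, priors, assign, k):
    """Assumed objective: sum over words of prior * KL(word || its cluster mean)."""
    means = cluster_means(dists, priors, assign, k)
    return sum(priors[t] * kl(dists[t], means[a]) for t, a in enumerate(assign))


def cluster_words(dists, priors, assign, k, max_iter=50):
    """Alternate between recomputing cluster means and reassigning each word
    to the cluster whose mean is closest in KL divergence."""
    assign = list(assign)
    history = [within_cluster_js(dists, priors, assign, k)]
    for _ in range(max_iter):
        means = cluster_means(dists, priors, assign, k)
        new_assign = [min(means, key=lambda j: kl(d, means[j])) for d in dists]
        if new_assign == assign:
            break
        assign = new_assign
        history.append(within_cluster_js(dists, priors, assign, k))
    return assign, history


# Toy data (made up): four words, two classes, uniform word priors.
word_dists = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
word_priors = [0.25, 0.25, 0.25, 0.25]
final_assign, objective_history = cluster_words(
    word_dists, word_priors, assign=[0, 0, 0, 1], k=2)
print(final_assign, objective_history)
```

In this sketch the objective cannot increase from one iteration to the next: reassignment never moves a word to a farther mean, and the weighted mean is the distribution that minimizes the weighted sum of KL divergences from the cluster's members. That mirrors the convergence-to-a-local-minimum behaviour the abstract claims, under the assumptions stated above.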

Citations

4595 Statistical Learning Theory – Vapnik - 1998
4433 Elements of Information Theory – Cover, Thomas - 1991
2789 A mathematical theory of communication – Shannon - 1948
1486 Indexing By Latent Semantic Analysis – Deerwester, Dumais, et al. - 1990
949 Pattern Classification – Duda, Hart, et al. - 2001
518 A comparative study on feature selection in text categorization – Yang, Pedersen - 1997
482 A Comparison of Event Models for Naive Bayes Text Classification – McCallum, Nigam - 1998
421 A re-examination of text categorization methods – Yang, Liu - 1999
418 On information and sufficiency – Kullback, Leibler - 1951
405 Distributional Clustering of English Words – Pereira, Tishby, et al. - 1993
337 On the optimality of the simple Bayesian classifier under zero-one loss – Domingos, Pazzani - 1997
293 Hierarchically classifying documents using very few words – Koller, Sahami - 1997
263 Probabilistic Latent Semantic Indexing – Hofmann - 1999
203 What Every Computer Scientist should know about Floating Point – Goldberg - 1991
180 Divergence measures based on the Shannon entropy – Lin - 1991
177 Concept decompositions for large sparse text data using clustering – Dhillon, Modha
154 On bias, variance, 0/1-loss, and the curse-of-dimensionality – Friedman - 1997
153 Distributional clustering of words for text classification – Baker, McCallum - 1998
120 Measures of distributional similarity – Sapporo, Lee, et al. - 1999
112 A toolkit for statistical language modeling, text retrieval, classification and clustering. http://www.cs.cmu.edu/~mccallum/bow – Bow - 1996
97 A mathematical theory of communication, The Bell System – Shannon - 1948
93 K-Means-Type Algorithms: A Generalized Convergence Theorem and Characterization of Local Optimality – Selim, Ismail - 1984
92 Cluster Analysis of Multivariate Data: Efficiency versus Interpretability of Classifications (abstract) – Forgy - 1965
59 Using taxonomy, discriminants, and signatures for navigating in text databases – Chakrabarti, Dom, et al. - 1997
41 On information and sufficiency – Kullback, Leibler - 1951
38 Pattern classification – Duda, Hart, et al. - 2001
35 IEEE Standard for Binary Floating Point Arithmetic, Std 754-1985 edition – ANSI/IEEE - 1985
35 On feature distributional clustering for text categorization – Bekkerman, El-Yaniv, et al. - 2001
33 Introduction to Modern Information Retrieval – Salton, McGill - 1983
32 The complexity of the generalized Lloyd–Max problem – Garey, Johnson, et al. - 1982
27 On the optimality of the simple Bayesian classifier under zero-one loss – Domingos, Pazzani - 1997
9 Cluster analysis of multivariate data: Efficiency vs. interpretability of classifications – Forgy - 1965
7 The power of word clusters for text classification – Slonim, Tishby - 2001
6 Conditions for the equivalence of hierarchical and non-hierarchical bayesian classifiers – Mitchell - 1998
5 Hierarchical Classification of Web Content – Dumais, Chen - 2000
4 Introduction to Modern Information Retrieval. McGraw-Hill Book Company – Salton, McGill - 1983
2 Distributional clustering of English words – McGraw-Hill - 1997
1 Probabilistic latent semantic indexing – Theory - 1998