Lightweight Document Clustering

by Sholom M. Weiss, Brian F. White, Chid Apte
http://www.research.ibm.com/dar/papers/pdf/weiss_ldc_with_cover.pdf

Abstract:

A lightweight document clustering method is described that operates in high dimensions, processes tens of thousands of documents, and groups them into several thousand clusters or, by varying a single parameter, into a few dozen clusters. The method uses a reduced indexing view of the original documents, where only the k best keywords of each document are indexed. An efficient procedure for clustering is specified in two parts: (a) compute the k most similar documents for each document in the collection, and (b) group the documents into clusters using these similarity scores. The method has been evaluated on a database of over 50,000 customer service problem reports that are reduced to 3,000 clusters and 5,000 exemplar documents. Results demonstrate efficient clustering performance with excellent group similarity measures.
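The two-part procedure in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the "k best keywords" are chosen, how similarity is scored, or how groups are formed from the scores, so everything specific here is an assumption made for illustration: keywords are the most frequent tokens, similarity is the count of shared keywords, and grouping links any two documents whose score reaches a threshold. The function names and the `min_score` parameter are hypothetical and do not come from the paper.

```python
from collections import Counter, defaultdict
import heapq


def top_keywords(text, k):
    """Reduced view of a document: its k most frequent tokens.
    Assumption: raw frequency stands in for the paper's 'k best keywords'."""
    counts = Counter(text.lower().split())
    # Sort by count (descending), then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word for word, _ in ranked[:k]}


def nearest_neighbors(keyword_sets, k):
    """Part (a): for each document, find up to k most similar documents.
    Assumption: similarity is the number of shared keywords."""
    # Inverted index from keyword to the documents that contain it.
    index = defaultdict(list)
    for doc_id, words in enumerate(keyword_sets):
        for word in words:
            index[word].append(doc_id)

    neighbors = []
    for doc_id, words in enumerate(keyword_sets):
        shared = Counter()
        for word in words:
            for other in index[word]:
                if other != doc_id:
                    shared[other] += 1
        # Negate the id so that score ties prefer the smaller document id.
        best = heapq.nlargest(k, ((s, -o) for o, s in shared.items()))
        neighbors.append([(s, -o) for s, o in best])
    return neighbors


def group(neighbors, min_score):
    """Part (b): group documents using the similarity scores.
    Assumption: union-find over neighbor links with score >= min_score."""
    parent = list(range(len(neighbors)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for doc_id, nbrs in enumerate(neighbors):
        for score, other in nbrs:
            if score >= min_score:
                parent[find(doc_id)] = find(other)

    clusters = defaultdict(list)
    for doc_id in range(len(neighbors)):
        clusters[find(doc_id)].append(doc_id)
    return sorted(clusters.values())


docs = [
    "printer paper jam error",
    "printer jam paper tray",
    "password reset login failed",
    "login password expired reset",
]
keyword_sets = [top_keywords(d, k=4) for d in docs]
neighbors = nearest_neighbors(keyword_sets, k=2)
print(group(neighbors, min_score=2))
```

In this sketch `min_score` plays the role of a single tuning parameter: raising it yields more, smaller clusters, and lowering it yields fewer, larger ones. Whether the paper's parameter works this way is not stated in the abstract.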
