Learning and representing topic: A hierarchical mixture model for word occurrence in document databases (1998)

by Thomas Hofmann
Proc. Workshop on learning from text and the web, CMU
http://www.ai.mit.edu/people/hofmann/HofmannCONALD98.ps

Abstract:

This paper presents a novel statistical mixture model for natural language learning in information retrieval. The described learning architecture is based on word occurrence statistics and extracts hierarchical relations between groups of documents as well as an abstractive organization of keywords. To train the model we derive a generalized, annealed version of the Expectation-Maximization (EM) algorithm for maximum likelihood estimation. The benefits of the model for interactive information retrieval and automated cluster summarization are experimentally investigated.
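
To illustrate what an annealed EM update can look like, the following is a minimal sketch and not the paper's algorithm: it fits a flat (non-hierarchical) mixture of multinomials to word counts and tempers the E-step by raising the unnormalized posterior to a power beta that increases toward 1. The function name `annealed_em`, the beta schedule, the smoothing constant `eps`, and the toy data in the usage note are all assumptions made for illustration.

```python
import math
import random


def annealed_em(counts, n_clusters, betas=(0.5, 0.75, 1.0), iters=20,
                seed=0, eps=1e-2):
    """Tempered EM for a flat mixture of multinomials (illustrative only).

    counts: list of per-document word-count lists, all of the same length.
    Returns (prior, word_p, post): mixing weights, per-cluster word
    distributions, and per-document cluster posteriors.
    """
    rng = random.Random(seed)
    n_docs = len(counts)
    n_words = len(counts[0])

    # Uniform mixing weights; word distributions start close to uniform,
    # with random jitter so the clusters are not identical.
    prior = [1.0 / n_clusters] * n_clusters
    word_p = []
    for _ in range(n_clusters):
        row = [1.0 + rng.random() for _ in range(n_words)]
        total = sum(row)
        word_p.append([x / total for x in row])

    post = []
    for beta in betas:  # assumed schedule: soft assignments first, beta = 1 last
        for _ in range(iters):
            # Tempered E-step: posterior proportional to (prior * likelihood) ** beta.
            post = []
            for doc in counts:
                logs = []
                for k in range(n_clusters):
                    log_lik = sum(n * math.log(word_p[k][w])
                                  for w, n in enumerate(doc) if n)
                    logs.append(beta * (math.log(prior[k]) + log_lik))
                top = max(logs)  # subtract the max for numerical stability
                weights = [math.exp(v - top) for v in logs]
                norm = sum(weights)
                post.append([v / norm for v in weights])

            # M-step: re-estimate parameters from the soft assignments,
            # with a small additive constant so no probability is zero.
            for k in range(n_clusters):
                resp = sum(p[k] for p in post)
                prior[k] = (resp + eps) / (n_docs + n_clusters * eps)
                row = [eps + sum(post[d][k] * counts[d][w]
                                 for d in range(n_docs))
                       for w in range(n_words)]
                total = sum(row)
                word_p[k] = [x / total for x in row]
    return prior, word_p, post
```

As a usage example, calling `annealed_em([[5, 4, 0, 0], [4, 6, 0, 0], [0, 0, 5, 5], [0, 1, 4, 6]], 2)` on four toy documents over a four-word vocabulary returns posteriors in which the first two documents share one cluster and the last two share the other. Starting at a low beta keeps early assignments soft, which is the usual motivation for annealing: it reduces sensitivity to the random initialization.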
