Learning and representing topic: a hierarchical mixture model for word occurrence in document databases (1998) [19 citations — 1 self]
Abstract:
This paper presents a novel statistical mixture model for natural language learning in information retrieval. The described learning architecture is based on word occurrence statistics and extracts hierarchical relations between groups of documents as well as an abstractive organization of keywords. To train the model we derive a generalized, annealed version of the Expectation-Maximization (EM) algorithm for maximum likelihood estimation. The benefits of the model for interactive information retrieval and automated cluster summarization are experimentally investigated.
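As a rough illustration of what an annealed EM procedure can look like, the sketch below fits a flat (non-hierarchical) mixture of multinomials to a document-by-word count matrix. It is not the paper's model: the hierarchy is omitted, and the function name `annealed_em`, the inverse-temperature schedule `betas`, the iteration count, and the smoothing constant are all assumptions chosen for illustration. The only change from standard EM is that the log-posterior in the E-step is scaled by `beta` before normalization, so small `beta` gives soft, nearly uniform assignments and `beta = 1` recovers ordinary EM.

```python
import numpy as np

def annealed_em(counts, n_clusters, betas=(0.25, 0.5, 0.75, 1.0), n_iter=30, seed=0):
    """Tempered EM for a flat mixture of multinomials (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    n_docs, n_words = counts.shape

    # Uniform cluster priors and random word distributions as a starting point.
    prior = np.full(n_clusters, 1.0 / n_clusters)
    word_probs = rng.dirichlet(np.ones(n_words), size=n_clusters)  # shape (K, W)

    for beta in betas:  # assumed schedule: inverse temperature rises towards 1
        for _ in range(n_iter):
            # E-step: tempered posterior, proportional to (prior * likelihood) ** beta.
            log_joint = np.log(prior) + counts @ np.log(word_probs).T  # shape (D, K)
            tempered = beta * log_joint
            tempered -= tempered.max(axis=1, keepdims=True)  # numerical stability
            resp = np.exp(tempered)
            resp /= resp.sum(axis=1, keepdims=True)

            # M-step: re-estimate priors and word distributions from soft counts.
            prior = np.clip(resp.mean(axis=0), 1e-12, None)
            prior /= prior.sum()
            word_probs = resp.T @ counts + 1e-3  # small smoothing term (assumption)
            word_probs /= word_probs.sum(axis=1, keepdims=True)

    return prior, word_probs, resp

# Toy usage: four documents over a four-word vocabulary.
toy_counts = np.array([[5, 5, 0, 0],
                       [4, 6, 0, 0],
                       [0, 0, 5, 5],
                       [0, 0, 6, 4]])
prior, word_probs, resp = annealed_em(toy_counts, n_clusters=2)
```

Each row of `resp` holds one document's soft cluster memberships, and each row of `word_probs` is a cluster's distribution over the vocabulary. Starting at a low `beta` is intended to make the result less sensitive to the random initialization than plain EM, at the cost of extra iterations.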
Citations
| 4344 | Maximum likelihood from incomplete data via the EM algorithm – Dempster, Laird, et al. - 1977 |
| 593 | Hierarchical Mixtures of Experts and the EM algorithm – Jordan, Jacobs - 1993 |
| 402 | Distributional clustering of English words – Pereira, Tishby, et al. - 1993 |
| 347 | Finite Mixture Models – McLachlan, Peel - 2000 |
| 154 | Recent trends in hierarchic document clustering: A critical review – Willett - 1988 |
| 115 | Statistical mechanics and phase transitions in clustering – Rose, Fox - 1990 |
| 89 | Statistical models for co-occurrence data – Hofmann, Puzicha - 1998 |
| 65 | The use of hierarchic clustering in information retrieval – Jardine, van Rijsbergen - 1971 |
| 20 | Clustering large files of documents using the single-link method – Croft - 1977 |

