A new unsupervised algorithm for learning a finite mixture model from multivariate data is proposed. The adjective "unsupervised" is justified by two properties of the algorithm: (i) it is capable of selecting the number of components, and (ii) unlike the standard expectation-maximization (EM) algorithm, it does not require careful initialization of the parameters. The proposed method also avoids another well-known drawback of EM for mixture fitting: the possibility of convergence towards a singular estimate at the boundary of the parameter space. The novelty of our approach is that we do not use a model selection criterion to choose one among a set of pre-estimated candidate models; instead, we seamlessly integrate estimation and model selection in a single algorithm. Our technique can be applied to any type of parametric mixture model for which it is possible to write an EM algorithm; in this paper, we illustrate it with experiments involving Gaussian mixtures and mixtures of factor analyzers. These experiments testify to the good performance of our approach, which is simpler and faster than other methods that have been proposed for fitting a mixture model with an unknown number of components.

Index Terms: Unsupervised learning, finite mixtures, model selection, minimum message length criterion, Bayesian methods, expectation-maximization algorithm, clustering.
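
The idea of integrating estimation and model selection in a single algorithm can be illustrated with a small sketch. The code below is not the paper's implementation; it is a simplified illustration that assumes a one-dimensional Gaussian mixture, starts from deliberately too many components, and uses a penalized weight update in the M-step so that weakly supported components are driven to zero weight and dropped during fitting. The function name `fit_mixture`, its parameters, the random initialization, and the variance floor are all assumptions made for this example.

```python
import math
import random


def fit_mixture(data, k_max=8, n_params=2, iters=200, seed=0):
    """Illustrative EM for a 1-D Gaussian mixture with component pruning.

    n_params is the number of free parameters per component (mean and
    variance here). Components whose expected point count does not exceed
    n_params / 2 receive zero weight and are removed.
    """
    rng = random.Random(seed)
    n = len(data)
    grand_mean = sum(data) / n
    grand_var = sum((x - grand_mean) ** 2 for x in data) / n

    # Assumption: start with k_max components centred on random data points.
    means = rng.sample(data, k_max)
    variances = [grand_var] * k_max
    weights = [1.0 / k_max] * k_max

    for _ in range(iters):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            p = [
                w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            s = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pj / s for pj in p])

        # M-step for the weights, with the penalty that prunes components.
        counts = [sum(r[j] for r in resp) for j in range(len(weights))]
        raw = [max(0.0, c - n_params / 2.0) for c in counts]
        total = sum(raw)
        if total == 0.0:
            break  # every component would vanish; keep the previous estimate

        keep = [j for j, r in enumerate(raw) if r > 0.0]
        weights = [raw[j] / total for j in keep]

        # M-step for means and variances of the surviving components only.
        new_means, new_vars = [], []
        for j in keep:
            m = sum(r[j] * x for r, x in zip(resp, data)) / counts[j]
            v = sum(r[j] * (x - m) ** 2 for r, x in zip(resp, data)) / counts[j]
            new_means.append(m)
            new_vars.append(max(v, 1e-6))  # numerical floor, an assumption
        means, variances = new_means, new_vars

    return weights, means, variances
```

Because a component that collapses onto a single data point has an expected count of about one, the penalty removes it instead of letting its variance shrink towards zero, which is one way to picture how pruning can sidestep singular estimates. A full treatment would also need a stopping rule and a criterion for comparing the models visited along the way, which this sketch omits.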