Results 1 -
9 of
9
Model-Based Clustering, Discriminant Analysis, and Density Estimation
- JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
, 2000
"... Cluster analysis is the automated search for groups of related observations in a data set. Most clustering done in practice is based largely on heuristic but intuitively reasonable procedures and most clustering methods available in commercial software are also of this type. However, there is little ..."
Abstract
-
Cited by 171 (23 self)
- Add to MetaCart
Cluster analysis is the automated search for groups of related observations in a data set. Most clustering done in practice is based largely on heuristic but intuitively reasonable procedures and most clustering methods available in commercial software are also of this type. However, there is little systematic guidance associated with these methods for solving important practical questions that arise in cluster analysis, such as \How many clusters are there?", "Which clustering method should be used?" and \How should outliers be handled?". We outline a general methodology for model-based clustering that provides a principled statistical approach to these issues. We also show that this can be useful for other problems in multivariate analysis, such as discriminant analysis and multivariate density estimation. We give examples from medical diagnosis, mineeld detection, cluster recovery from noisy data, and spatial density estimation. Finally, we mention limitations of the methodology, a...
AE: MCLUST Version 3 for R: Normal Mixture Modeling and Model-Based Clustering
- Department of Statistics, University of Washington
, 2006
"... MCLUST is a contributed R package for normal mixture modeling and model-based clustering. It provides functions for parameter estimation via the EM algorithm for normal mixture models with a variety of covariance structures, and functions for simulation from these models. Also included are functions ..."
Abstract
-
Cited by 25 (1 self)
- Add to MetaCart
MCLUST is a contributed R package for normal mixture modeling and model-based clustering. It provides functions for parameter estimation via the EM algorithm for normal mixture models with a variety of covariance structures, and functions for simulation from these models. Also included are functions that combine model-based hierarchical clustering, EM for mixture estimation and the Bayesian Information Criterion (BIC) in comprehensive strategies for clustering, density estimation and discriminant analysis. There is additional functionality for displaying and visualizing the models along with clustering and classification results. A number of features of the software have been changed in this version, and the functionality has been expanded to include regularization for normal mixture models via a Bayesian prior. MCLUST is licensed by the University of Washington and distributed through
MCLUST: Software for Model-Based Clustering, Density Estimation and Discriminant Analysis
- Journal of Classification
, 2002
"... Contents 1 Models 4 2 Obtaining and Installing MCLUST 5 2.1 Using MCLUST with S-PLUS 6 for UNIX/Linux . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.2 Using MCLUST with S-PLUS 6 for Windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3 Hierarchical Clustering 6 4 EM for Mix ..."
Abstract
-
Cited by 18 (6 self)
- Add to MetaCart
Contents 1 Models 4 2 Obtaining and Installing MCLUST 5 2.1 Using MCLUST with S-PLUS 6 for UNIX/Linux . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.2 Using MCLUST with S-PLUS 6 for Windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3 Hierarchical Clustering 6 4 EM for Mixture Models 8 4.1 Uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 4.2 Individual E and M Steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 5 Bayesian Information Criterion 10 6 Cluster Analysis 11 6.1 Mclust . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 6.2 EMclust . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 6.3 Clustering with Noise and Outliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 7 Simulation from Mixture Densities 19 8 Density Estimation 21 9 Displays
Bayesian regularization for normal mixture estimation and model-based clustering
, 2005
"... Normal mixture models are widely used for statistical modeling of data, including cluster analysis. However maximum likelihood estimation (MLE) for normal mixtures using the EM algorithm may fail as the result of singularities or degeneracies. To avoid this, we propose replacing the MLE by a maximum ..."
Abstract
-
Cited by 13 (3 self)
- Add to MetaCart
Normal mixture models are widely used for statistical modeling of data, including cluster analysis. However maximum likelihood estimation (MLE) for normal mixtures using the EM algorithm may fail as the result of singularities or degeneracies. To avoid this, we propose replacing the MLE by a maximum a posteriori (MAP) estimator, also found by the EM algorithm. For choosing the number of components and the model parameterization, we propose a modified version of BIC, where the likelihood is evaluated at the MAP instead of the MLE. We use a highly dispersed proper conjugate prior, containing a small fraction of one observation’s worth of information. The resulting method avoids degeneracies and singularities, but when these are not present it gives similar results to the standard method using MLE, EM and BIC. Key words: BIC; EM algorithm; mixture models; model-based clustering; conjugate prior; posterior mode. 1
Incremental Model-Based Clustering for Large Datasets with Small Clusters
- Journal of Computational and Graphical Statistics
, 2003
"... Clustering is often useful for analyzing and summarizing information within large datasets. Model-based clustering methods have been found to be e#ective for determining the number of clusters, dealing with outliers, and selecting the best clustering method in datasets that are small to moderate ..."
Abstract
-
Cited by 10 (5 self)
- Add to MetaCart
Clustering is often useful for analyzing and summarizing information within large datasets. Model-based clustering methods have been found to be e#ective for determining the number of clusters, dealing with outliers, and selecting the best clustering method in datasets that are small to moderate in size. For large datasets, current model-based clustering methods tend to be limited by memory and time requirements and the increasing di#culty of maximum likelihood estimation. They may fit too many clusters in some portions of the data and/or miss clusters containing relatively few observations.
Model-Based Clustering, Discriminant Analysis, and Density Estimation
- Journal of the American Statistical Association
, 2000
"... Cluster analysis is the automated search for groups of related observations in a data set. Most clustering done in practice is based largely on heuristic but intuitively reasonable procedures and most clustering methods available in commercial software are also of this type. However, there is little ..."
Abstract
- Add to MetaCart
Cluster analysis is the automated search for groups of related observations in a data set. Most clustering done in practice is based largely on heuristic but intuitively reasonable procedures and most clustering methods available in commercial software are also of this type. However, there is little systematic guidance associated with these methods for solving important practical questions that arise in cluster analysis, such as \How many clusters are there?", \Which clustering method should be used?" and \How should outliers be handled?". We outline a general methodology for model-based clustering that provides a principled statistical approach to these issues. We also show that this can be useful for other problems in multivariate analysis, such as discriminant analysis and multivariate density estimation. We give examples from medical diagnosis, mineeld detection, cluster recovery from noisy data, and spatial density estimation. Finally, we mention limitations of the methodology, a...
MCLUST: Software for rvlodel-Based Clustering, Density Estimation and Discriminant A.nalysis *t
, 2002
"... MCLUST is a software package for model-based clustering, density estimation and discriminant analysis interfaced to the S-PLUS commercial software. 1 2 It implements parameterized Gaussian hierarchical clustering algorithms and the EM algorithm for parameterized Gaussian mixture models with the poss ..."
Abstract
- Add to MetaCart
MCLUST is a software package for model-based clustering, density estimation and discriminant analysis interfaced to the S-PLUS commercial software. 1 2 It implements parameterized Gaussian hierarchical clustering algorithms and the EM algorithm for parameterized Gaussian mixture models with the possible addition of a Poisson noise term. Also included are functions that combine hierarchical clustering, EM and the Bayesian Information (BIC) in comprehensive strategies for clustering, density estimation, and discriminant analysis. MCLUST provides functionality for displaying and visualizing clustering and classification
mclust Version 4 for R: Normal Mixture Modeling for Model-Based Clustering, Classification, and Density Estimation
, 2012
"... mclust is a contributed R package for model-based clustering, classification, and density estimation based on finite normal mixture modeling. It provides functions for parameter estimation via the EM algorithm for normal mixture models with a variety of covariance structures, and functions for simul ..."
Abstract
- Add to MetaCart
mclust is a contributed R package for model-based clustering, classification, and density estimation based on finite normal mixture modeling. It provides functions for parameter estimation via the EM algorithm for normal mixture models with a variety of covariance structures, and functions for simulation from these models. Also included are functions that combine model-based hierarchical clustering, EM for mixture estimation and the Bayesian Information Criterion (BIC) in comprehensive strategies for clustering, density estimation and discriminant analysis. There is additional functionality for displaying and visualizing the models along with clustering, classification, and density estimation results. Several features of the software have been changed in this version, in particular the functionality for discriminant analysis and density estimation has

