Results 1-10 of 21
Robust Cluster Analysis via Mixture Models
 AUSTRIAN JOURNAL OF STATISTICS
, 2006
Cited by 47 (6 self)
Abstract: Finite mixture models are being increasingly used to model the distributions of a wide variety of random phenomena and to cluster data sets. In this paper, we focus on the use of normal mixture models to cluster data sets of continuous multivariate data. As normality-based methods of estimation are not robust, we review the use of t component distributions. With the t mixture model-based approach, the normal distribution for each component in the mixture model is embedded in a wider class of elliptically symmetric distributions with an additional parameter called the degrees of freedom. The advantage of the t mixture model is that, although the number of outliers needed for breakdown is almost the same as with the normal mixture model, the outliers have to be much larger. We also consider the use of the t distribution for the robust clustering of high-dimensional data via mixtures of factor analyzers. The latter enable a mixture model to be fitted to data which have high dimension relative to the number of data points to be clustered.
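The robustness of t components described above can be illustrated with the weight each observation receives in the standard EM algorithm for a t distribution, u = (nu + p) / (nu + d^2), where d^2 is the squared Mahalanobis distance to the component center. The following is a minimal sketch, not the authors' code; the function name `t_weights` and the restriction to a diagonal scatter matrix are assumptions made to keep the example short.

```python
def t_weights(points, mu, diag_scatter, nu):
    """Return the EM weight u = (nu + p) / (nu + d^2) for each point.

    Assumes a single t component with a diagonal scatter matrix,
    given as the sequence of its diagonal entries (illustrative only).
    """
    p = len(mu)
    weights = []
    for x in points:
        # Squared Mahalanobis distance under a diagonal scatter matrix.
        d2 = sum((xj - mj) ** 2 / sj for xj, mj, sj in zip(x, mu, diag_scatter))
        weights.append((nu + p) / (nu + d2))
    return weights


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
weights = t_weights(points, mu=(0.0, 0.0), diag_scatter=(1.0, 1.0), nu=4.0)
```

The distant point (10, 10) receives a weight close to zero, so it contributes little to the updated location and scatter estimates, whereas a normal component would weight all three points equally.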
Rigid and articulated point registration with expectation conditional maximization
 IEEE Transactions on Pattern Analysis and Machine Intelligence
Cited by 24 (5 self)
On a resampling approach for tests on the number of clusters with mixture model-based clustering of tissue samples
, 2004
The noise component in model-based cluster analysis
 In: Proceedings of GfKl2007. Springer Verlag, Studies in Classification, Data Analysis, and Knowledge Organization
, 2007
Cited by 8 (0 self)
Abstract. The so-called noise component has been introduced by Banfield and Raftery (1993) to improve the robustness of cluster analysis based on the normal mixture model. The idea is to add a uniform distribution over the convex hull of the data as an additional mixture component. While this yields good results in many practical applications, there are some problems with the original proposal: 1) As shown by Hennig (2004), the method is not breakdown-robust. 2) The original approach doesn’t define a proper ML estimator, and doesn’t have satisfactory asymptotic properties. We discuss two alternatives. The first one consists of replacing the uniform distribution by a fixed constant, modelling an improper uniform distribution that doesn’t depend on the data. This can be proven to be more robust, though the choice of the involved tuning constant is tricky. The second alternative is to approximate the ML estimator of a mixture of normals with a uniform distribution more precisely than it is done by the “convex hull” approach. The approaches are compared by simulations and for a real data example.
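The first alternative, a fixed constant in place of the uniform density, can be sketched as a posterior computation: each observation is assigned to the noise component with probability proportional to the noise proportion times the constant. This is an illustrative univariate sketch only; the function names, the constant 0.01 and the mixture parameters are assumptions and are not taken from the paper.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def noise_posterior(x, noise_prop, noise_density, components):
    """Posterior probability that x belongs to the noise component.

    components is a list of (proportion, mean, sd) tuples for the
    normal clusters; noise_density is the fixed improper constant.
    """
    noise = noise_prop * noise_density
    clusters = sum(pi * normal_pdf(x, m, s) for pi, m, s in components)
    return noise / (noise + clusters)


# Hypothetical two-cluster mixture with 10% noise.
components = [(0.45, 0.0, 1.0), (0.45, 5.0, 1.0)]
p_center = noise_posterior(0.0, 0.10, 0.01, components)
p_far = noise_posterior(30.0, 0.10, 0.01, components)
```

Because the constant does not depend on the data, a point far from every cluster is assigned to noise with probability near one, while a point at a cluster center is assigned to noise with probability near zero.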
Outlier detection via parsimonious mixtures of contaminated Gaussian distributions. ArXiv:1305.4669
, 2013
tclust: An R Package for a Trimming Approach to Cluster Analysis
Cited by 4 (1 self)
This introduction to the R package tclust is a (slightly) modified version of Fritz et al. (2012), published in the Journal of Statistical Software. Outlying data can heavily influence standard clustering methods. At the same time, clustering principles can be useful when robustifying statistical procedures. These two reasons motivate the development of feasible robust model-based clustering approaches. With this in mind, an R package for performing non-hierarchical robust clustering, called tclust, is presented here. Instead of trying to “fit” noisy data, a proportion α of the most outlying observations is trimmed. The tclust package efficiently handles different cluster scatter constraints. Graphical exploratory tools are also provided to help the user make sensible choices for the trimming proportion as well as the number of clusters to search for.
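The trimming idea can be illustrated with a single iteration of a trimmed k-means style update on univariate data: assign each point to its nearest center, discard the proportion α of points farthest from their center, and recompute the centers from the rest. This is a simplified sketch and not the tclust algorithm itself, which is an R package and additionally handles scatter constraints; the function name `trimmed_step` and the use of floor(n·α) as the number of trimmed points are assumptions.

```python
import math


def trimmed_step(points, centers, alpha):
    """One trimmed update: return (new_centers, trimmed_points)."""
    assigned = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        k = dists.index(min(dists))
        assigned.append((dists[k], k, x))

    # Trim the floor(n * alpha) points farthest from their nearest center.
    n_trim = math.floor(alpha * len(points))
    assigned.sort(key=lambda item: item[0])
    n_keep = len(points) - n_trim
    kept = assigned[:n_keep]
    trimmed = [x for _, _, x in assigned[n_keep:]]

    # Recompute each center from its untrimmed members only.
    new_centers = []
    for k, c in enumerate(centers):
        members = [x for _, kk, x in kept if kk == k]
        new_centers.append(sum(members) / len(members) if members else c)
    return new_centers, trimmed


points = [0.0, 0.2, -0.2, 0.1, -0.1, 5.0, 5.2, 4.8, 5.1, 100.0]
new_centers, trimmed = trimmed_step(points, centers=[1.0, 4.0], alpha=0.1)
```

With α = 0.1 and ten points, the single gross outlier at 100.0 is trimmed, so it no longer pulls the second center away from the cluster near 5.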
Modelling Background Noise in Finite Mixtures of Generalized Linear Regression Models
Cited by 2 (0 self)
Abstract. In this paper we show how only a few outliers can completely break down EM estimation of mixtures of regression models. A simple yet very effective way of dealing with this problem is to use a component where all regression parameters are fixed to zero to model the background noise. This noise component can be easily defined for different types of generalized linear models, has a familiar interpretation as the empty regression model, and is not very sensitive with respect to its own parameters.
ROBUST IMPROPER MAXIMUM LIKELIHOOD: TUNING, COMPUTATION, AND A COMPARISON WITH OTHER METHODS FOR ROBUST GAUSSIAN CLUSTERING
Cited by 1 (1 self)
Abstract. The two main topics of this paper are the introduction of the “optimally tuned improper maximum likelihood estimator” (OTRIMLE) for robust clustering based on the multivariate Gaussian model for clusters, and a comprehensive simulation study comparing the OTRIMLE to Maximum Likelihood in Gaussian mixtures with and without noise component, mixtures of t-distributions, and the TCLUST approach for trimmed clustering. The OTRIMLE uses an improper constant density for modelling outliers and noise. This can be chosen optimally so that the non-noise part of the data looks as close to a Gaussian mixture as possible. Some deviation from Gaussianity can be traded in for lowering the estimated noise proportion. Covariance matrix constraints and computation of the OTRIMLE are also treated. The ideas used for computing the OTRIMLE were applied with benefit to the competing methods, too, in order to make the approaches comparable. In the simulation study, all methods are confronted with setups in which their model assumptions were not exactly fulfilled (even apart from outliers), and in order to evaluate the experiments in a standardized way by misclassification rates, a new model-based definition of “true clusters” is introduced that deviates from the usual identification of mixture components with clusters. In the study, every method turns out to be superior for one or more setups, but the OTRIMLE achieves the most satisfactory overall performance. MSC2010 code. 62H30, 62F35, 62P25.
Finding approximately Gaussian clusters via robust improper maximum likelihood. arXiv:1309.6895 preprint available at http://arxiv.org/pdf/1309.6895
, 2013
Cited by 1 (1 self)
Abstract. The RIMLE method is introduced for robust multivariate clustering finding approximately Gaussian clusters. It maximizes a pseudo-likelihood defined by adding a component with improper constant density for accommodating outliers to a Gaussian mixture. Existence, consistency and a breakdown point result are shown. The constant can be chosen dependently on the data so that the fitted clusters are optimally close to a Gaussian mixture. Covariance matrix constraints and computation of the RIMLE are also treated. MSC2010 code. 62H30, 62F35.