Survey of clustering algorithms
IEEE Transactions on Neural Networks, 2005
Data analysis plays an indispensable role in understanding various phenomena. Cluster analysis, a primitive form of exploration with little or no prior knowledge, comprises research developed across a wide variety of communities. On one hand, this diversity equips us with many tools; on the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts. Closely related topics, such as proximity measures and cluster validation, are also discussed.
A Robust Competitive Clustering Algorithm with Applications in Computer Vision
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998
This paper addresses three major issues associated with conventional partitional clustering, namely, sensitivity to initialization, difficulty in determining the number of clusters, and sensitivity to noise and outliers. The proposed Robust Competitive Agglomeration (RCA) algorithm starts with a large number of clusters to reduce the sensitivity to initialization, and determines the actual number of clusters by a process of competitive agglomeration. Noise immunity is achieved by incorporating concepts from robust statistics into the algorithm. RCA assigns two different sets of weights to each data point: the first set of constrained weights represents degrees of sharing and is used to create a competitive environment and to generate a fuzzy partition of the data set. The second set corresponds to robust weights and is used to obtain robust estimates of the cluster prototypes. By choosing an appropriate distance measure in the objective function, RCA can be used to find a...
Automated Segmentation of Multiple Sclerosis Lesions by . . .
2000
Quantitative analysis of MR images is becoming increasingly important in clinical trials in multiple sclerosis (MS). This paper describes a fully automated atlas-based technique for segmenting MS lesions from large data sets of multichannel MR images. The method simultaneously estimates the parameters of a stochastic model for normal brain MR images, and detects MS lesions as voxels that are not well explained by the model. It corrects for MR field inhomogeneities, estimates tissue-specific intensity models from the data itself, and incorporates contextual information in the MS lesion segmentation using a Markov random field. The results of the automated method were compared with lesions delineated by human experts, showing a high total lesion load correlation. When the degree of spatial correspondence between segmentations was taken into account, considerable disagreement was revealed, both between the expert manual segmentations, and between expert and automatic measurements.
Robust mixture modelling using the t distribution
Statistics and Computing
Normal mixture models are being increasingly used to model the distributions of a wide variety of random phenomena and to cluster sets of continuous multivariate data. However, for a set of data containing a group or groups of observations with longer-than-normal tails or atypical observations, the use of normal components may unduly affect the fit of the mixture model. In this paper, we consider a more robust approach by modelling the data by a mixture of t distributions. The use of the ECM algorithm to fit this t mixture model is described and examples of its use are given in the context of clustering multivariate data in the presence of atypical observations in the form of background noise.
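The robustness of t components comes from the weight the E-step assigns to each observation. A minimal sketch, assuming a single univariate t component with the degrees of freedom nu held fixed at 4 (the paper fits multivariate mixtures with the ECM algorithm): the weight is u = (nu + 1) / (nu + d2), where d2 is the squared standardized distance from the current location, so distant points are down-weighted. The function names and the data are hypothetical.

```python
def t_weights(data, mu, sigma2, nu):
    """E-step weights for one univariate t component:
    u = (nu + 1) / (nu + d2), with d2 = (x - mu)^2 / sigma2."""
    return [(nu + 1.0) / (nu + (x - mu) ** 2 / sigma2) for x in data]


def fit_single_t(data, nu=4.0, iters=100):
    """EM estimates of location and scale for a single univariate t
    distribution with nu held fixed (an assumption of this sketch).
    Distant points get small weights, so they pull the location estimate
    far less than they pull the ordinary sample mean."""
    n = len(data)
    mu = sum(data) / n                                   # start at the sample mean
    sigma2 = sum((x - mu) ** 2 for x in data) / n        # and the sample variance
    u = [1.0] * n
    for _ in range(iters):
        u = t_weights(data, mu, sigma2, nu)                               # E-step
        mu = sum(w * x for w, x in zip(u, data)) / sum(u)                 # M-step: weighted mean
        sigma2 = sum(w * (x - mu) ** 2 for w, x in zip(u, data)) / n      # M-step: weighted scale
    return mu, sigma2, u


data = [1.0, 1.2, 0.8, 1.1, 0.9, 50.0]   # five inliers plus one gross outlier
mu, sigma2, u = fit_single_t(data)
print(sum(data) / len(data), mu, u[-1])
```

With the single gross outlier, the sample mean is about 9.17, while the t location estimate stays close to 1 and the outlier's final weight is near zero.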
A Survey of Fuzzy Clustering Algorithms for Pattern Recognition: Part II
... the concepts of fuzzy clustering and soft competitive learning in clustering algorithms is proposed on the basis of the existing literature. Moreover, a set of functional attributes is selected for use as dictionary entries in the comparison of clustering algorithms. In this paper, five clustering algorithms taken from the literature are reviewed, assessed, and compared on the basis of the selected properties of interest. These clustering models are 1) self-organizing map (SOM); 2) fuzzy learning vector quantization (FLVQ); 3) fuzzy adaptive resonance theory (fuzzy ART); 4) growing neural gas (GNG); 5) fully self-organizing simplified adaptive resonance theory (FOSART). Although our theoretical comparison is fairly simple, it yields observations that may appear paradoxical. First, only FLVQ, fuzzy ART, and FOSART exploit concepts derived from fuzzy set theory (e.g., relative and/or absolute fuzzy membership functions). Second, only SOM, FLVQ, GNG, and FOSART employ soft competitive learning mechanisms, which are affected by asymptotic misbehaviors in the case of FLVQ; i.e., only SOM, GNG, and FOSART are considered effective fuzzy clustering algorithms. Index Terms: Ecological net, fuzzy clustering, modular architecture, relative and absolute membership function, soft and hard competitive learning, topologically correct mapping.
On clustering of fMRI time series
1997
The spatiotemporal fMRI signal is a combination of several interacting components: the locally correlated hemodynamic response, the network of neuronal activations, and global components such as the cardiac cycle, breathing, etc. A priori, this implies that the signal is correlated in time and space, and that these correlations have both short- and long-range components. Clustering is a classical nonparametric approach to exploratory data analysis. By clustering, we can group signals according to a given objective function. Clustering of waveforms has already been used in fMRI signal analysis, see e.g. (1). Clustering of stochastic data, however, is a hard optimization problem with many potential pitfalls. The "optimal" cluster configuration depends on the particular choice of clustering scheme (e.g., k-means, k-medians, hierarchical clustering; examples are legion (2)), but just as importantly on the choice of distance metr...
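The abstract is cut off while discussing the choice of distance metric. As a generic illustration (not the authors' method), the sketch below compares Euclidean distance with a correlation-based distance, 1 minus the Pearson correlation, on three hypothetical waveforms. Under Euclidean distance, a scaled and offset copy of the reference looks far away while a phase-shifted waveform looks close; the correlation distance reverses that ranking, so a clustering built on either metric can group the same signals differently.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long time series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def correlation_distance(a, b):
    """1 - Pearson correlation; ignores baseline offset and positive scaling."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    ca = [x - ma for x in a]
    cb = [y - mb for y in b]
    num = sum(x * y for x, y in zip(ca, cb))
    den = math.sqrt(sum(x * x for x in ca) * sum(y * y for y in cb))
    return 1.0 - num / den


a = [0, 1, 0, -1, 0, 1, 0, -1]   # reference waveform
b = [3 * x + 10 for x in a]      # same shape, larger amplitude and offset
c = [1, 0, -1, 0, 1, 0, -1, 0]   # same amplitude, shifted by a quarter period

print(euclidean(a, b), euclidean(a, c))                         # b looks far, c looks near
print(correlation_distance(a, b), correlation_distance(a, c))   # 0.0 and 1.0
```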
Robust Cluster Analysis via Mixtures of Multivariate t-Distributions
Lecture Notes in Computer Science, 1998
Normal mixture models are being increasingly used as a way of clustering sets of continuous multivariate data. They provide a probabilistic (soft) clustering of the data in terms of their fitted posterior probabilities of membership of the mixture components corresponding to the clusters. An outright (hard) clustering can be subsequently obtained by assigning each observation to the component to which it has the highest fitted posterior probability of belonging. However, outliers in the data can affect the estimates of the parameters in the normal component densities, and hence the implied clustering. A more robust approach is to fit mixtures of multivariate t-distributions, which have longer tails than the normal components. The expectation-maximization (EM) algorithm can be used to fit mixtures of t-distributions by maximum likelihood. The application of this model to provide a robust approach to clustering is illustrated on a real data set. It is demonstrated how the use of t-com...
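The step from soft to hard clustering can be sketched briefly. To keep it short, this assumes a univariate mixture with normal components and hypothetical, already-fitted parameters; with t components, only the component density would change. The function names are illustrative.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posteriors(x, weights, means, variances):
    """Soft clustering: posterior probability that x belongs to each component."""
    joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


def hard_clustering(data, weights, means, variances):
    """Hard clustering: assign each observation to the component with the
    highest posterior probability."""
    labels = []
    for x in data:
        tau = posteriors(x, weights, means, variances)
        labels.append(max(range(len(tau)), key=lambda i: tau[i]))
    return labels


# Hypothetical fitted parameters for a two-component mixture.
weights, means, variances = [0.5, 0.5], [0.0, 5.0], [1.0, 1.0]
data = [-0.3, 0.4, 2.4, 4.8, 5.5]
print(hard_clustering(data, weights, means, variances))   # [0, 0, 0, 1, 1]
```

For points very far from every component the densities can underflow to zero; computing the posteriors from log-densities with a log-sum-exp avoids that.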
On mining Web Access Logs
ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 2000
The proliferation of information on the World Wide Web has made the personalization of this information space a necessity. One possible approach to web personalization is to mine typical user profiles from the vast amount of historical data stored in access logs. In the absence of any a priori knowledge, unsupervised classification or clustering methods seem to be ideally suited to analyze the semi-structured log data of user accesses. In this paper, we define the notion of a “user session”, as well as a dissimilarity measure between two web sessions that captures the organization of a web site. To extract a user access profile, we cluster the user sessions based on the pairwise dissimilarities using a robust fuzzy clustering algorithm that we have developed. We report the results of experiments with our algorithm and show that this leads to extraction of interesting user profiles. We also show that it outperforms association-rule-based approaches for this task.
Low-complexity fuzzy relational clustering algorithms for web mining
IEEE Transactions on Fuzzy Systems, 2001
This paper presents new algorithms, fuzzy c-medoids (FCMdd) and robust fuzzy c-medoids (RFCMdd), for fuzzy clustering of relational data. The objective functions are based on selecting c representative objects (medoids) from the data set in such a way that the total fuzzy dissimilarity within each cluster is minimized. A comparison of FCMdd with the well-known relational fuzzy c-means algorithm (RFCM) shows that FCMdd is more efficient. We present several applications of these algorithms to Web mining, including Web document clustering, snippet clustering, and Web access log analysis.
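A minimal sketch of an FCMdd-style iteration on a dissimilarity matrix, under stated assumptions: the fuzzifier is m = 2, memberships take the usual fuzzy c-means form computed from the dissimilarities to the current medoids, and each medoid is replaced by the object with the smallest membership-weighted total dissimilarity to all objects. The function names and the toy data are hypothetical, and this naive version scores every object as a candidate medoid (O(c n^2) work per iteration), so it makes no attempt at the efficiency the title refers to.

```python
def memberships(D, medoids, m=2.0):
    """Fuzzy memberships u[i][j] of object j in the cluster whose medoid is
    object medoids[i]. D is an n x n dissimilarity matrix (list of lists)."""
    n, c = len(D), len(medoids)
    p = 1.0 / (m - 1.0)
    u = [[0.0] * n for _ in range(c)]
    for j in range(n):
        dists = [D[v][j] for v in medoids]
        zeros = [i for i, d in enumerate(dists) if d == 0.0]
        if zeros:
            # Object j coincides with one or more medoids: split membership among them.
            for i in zeros:
                u[i][j] = 1.0 / len(zeros)
            continue
        inv = [(1.0 / d) ** p for d in dists]
        total = sum(inv)
        for i in range(c):
            u[i][j] = inv[i] / total
    return u


def fcmdd(D, init_medoids, m=2.0, max_iter=100):
    """Alternate membership and medoid updates until the medoids stop changing."""
    n = len(D)
    medoids = list(init_medoids)
    for _ in range(max_iter):
        u = memberships(D, medoids, m)
        new_medoids = []
        for i in range(len(medoids)):
            # Candidate k costs sum_j u[i][j]^m * D[k][j]; keep the cheapest object.
            best = min(range(n), key=lambda k: sum(u[i][j] ** m * D[k][j] for j in range(n)))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, memberships(D, medoids, m)


# Six 1-D points in two obvious groups; D holds absolute differences.
xs = [0.0, 0.5, 1.0, 10.0, 10.5, 11.0]
D = [[abs(a - b) for b in xs] for a in xs]
medoids, u = fcmdd(D, init_medoids=[0, 3])
print(medoids)   # [1, 4]
```

Like other medoid-based methods, the result depends on the initial medoids; here one starting medoid is placed in each group, and the medoids settle on the middle object of each group.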
A novel kernelized fuzzy c-means algorithm with application in medical image segmentation
Artificial Intelligence in Medicine, 2004
