by Nizar Grira, Michel Crucianu, Nozha Boujemaa
in ‘A Review of Machine Learning Techniques for Processing Multimedia Content’, Report of the MUSCLE European Network of Excellence (FP6
http://www-rocq.inria.fr/~crucianu/src/BriefSurveyClustering.pdf
Abstract:
Clustering (or cluster analysis) aims to organize a collection of data items into clusters, such that items within a cluster are more “similar” to each other than they are to items in the other clusters. This notion of similarity can be expressed in very different ways, depending on the purpose of the study, on domain-specific assumptions and on prior knowledge of the problem. Clustering is usually performed when no information is available concerning the membership of data items in predefined classes. For this reason, clustering is traditionally seen as part of unsupervised learning. We nevertheless speak here of unsupervised clustering to distinguish it from a more recent and less common approach that makes use of a small amount of supervision to “guide” or “adjust” clustering (see section 2). To support the extensive use of clustering in computer vision, pattern recognition, information retrieval, data mining, etc., many different methods have been developed in several communities. Detailed surveys of this domain can be found in [25], [27] or [26]. In the following, we briefly review a few core concepts of cluster analysis and describe the categories of clustering methods that are best represented in the literature. We also take this opportunity to provide some pointers to more recent work on clustering.
Citations
Cited by | Title | Authors | Year
4704 | Maximum likelihood from incomplete data via the EM algorithm | Dempster, Laird, et al. | 1977
1479 | Algorithms for Clustering Data | Jain, Dubes | 1988
728 | Finding Groups in Data: An Introduction to Cluster Analysis | Kaufman, Rousseeuw | 1990
597 | Data clustering: A review | Jain, Murty, et al. | 1999
572 | A density-based algorithm for discovering clusters in large spatial databases with noise | Ester, Kriegel, et al. | 1996
431 | On spectral clustering: Analysis and an algorithm | Ng, Jordan, et al. | 2001
405 | Automatic subspace clustering of high dimensional data for data mining applications | Agrawal, Gehrke, et al. | 1998
371 | CURE: an efficient clustering algorithm for large databases | Guha, Rastogi, et al. | 1998
294 | Cluster Analysis | Everitt | 1993
204 | Model-based Gaussian and non-Gaussian clustering | Banfield, Raftery | 1993
203 | OPTICS: Ordering Points To Identify the Clustering Structure | Ankerst, Breunig, et al. | 1999
203 | Distance metric learning, with application to clustering with side-information | Xing, Ng, et al. | 2003
194 | Hierarchical grouping to optimize an objective function | Ward | 1963
152 | Graph-theoretical Methods for Detecting and Describing Gestalt Clusters | Zahn | 1971
151 | Unsupervised optimal fuzzy clustering | Gath, Geva | 1989
136 | An efficient approach to clustering in large multimedia databases with noise | Hinneburg, Keim | 1998
108 | Adaptive duplicate detection using learnable string similarity measures | Bilenko, Mooney | 2003
97 | A validity measure for fuzzy clustering | Xie, Beni | 1991
83 | From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering | Klein, Kamvar, et al. | 2002
82 | Support vector clustering | Ben-Hur, Horn, et al. | 2001
78 | Clustering with instance-level constraints | Wagstaff, Cardie | 2000
75 | Density-based clustering in spatial databases: the algorithm GDBSCAN and its applications (Data Mining and Knowledge Discovery) | Sander, Ester, et al. | 1998
62 | A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining | Huang | 1997
61 | Semi-supervised clustering by seeding | Basu, Banerjee, et al. | 2002
61 | Some New Indexes of Cluster Validity | Bezdek, Pal | 1998
58 | Semi-supervised clustering with user feedback | Cohn, Caruana, et al. | 2003
47 | Step-wise clustering procedures | King | 1967
40 | Semi-supervised clustering using genetic algorithms | Demiriz, Bennett, et al. | 1999
19 | Comparing and unifying search-based and similarity-based approaches to semi-supervised clustering | Basu, Bilenko, et al. | 2003
5 | Use of the adaptive fuzzy clustering algorithm to detect lines in digital images | Davé | 1989
3 | Structure of hierarchic clusterings: implications for information retrieval and for multivariate data analysis | Murtagh | 1984
2 | FCM: Fuzzy c-means algorithm (Computers and Geosciences) | Bezdek, Ehrlich, et al. | 1984
2 | Robust clustering methods: A unified view | Davé, Krishnapuram | 1997
2 | Some methods for classification and analysis of multivariate observations | MacQueen | 1967
1 | Gaussian parsimonious clustering models | Celeux, Govaert | 1995
1 | Two general geometric cluster validity indices | Frederix, Pauwels | 2004
1 | Unsupervised robust clustering for image database categorization | Le Saux, Boujemaa | 2002