Clustering aggregation
- In Proceedings of the 21st International Conference on Data Engineering (ICDE), 2005
- Cited by 45 (2 self)
We consider the following problem: given a set of clusterings, find a clustering that agrees as much as possible with the given clusterings. This problem, clustering aggregation, appears naturally in various contexts. For example, clustering categorical data is an instance of the problem: each categorical variable can be viewed as a clustering of the input rows. Moreover, clustering aggregation can be used as a meta-clustering method to improve the robustness of clusterings. The problem formulation does not require a priori information about the number of clusters, and it gives a natural way of handling missing values. We give a formal statement of the clustering-aggregation problem, discuss related work, and suggest a number of algorithms. For several of the methods we provide theoretical guarantees on the quality of the solutions. We also show how sampling can be used to scale the algorithms to large data sets. We give an extensive empirical evaluation demonstrating the usefulness of the problem and of the solutions.
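The abstract above does not say how agreement between clusterings is measured. As a minimal sketch, assume it is scored by pairwise disagreement: the number of item pairs that one clustering places together and the other separates. The function names, the "best of the inputs" baseline, and the toy categorical columns below are illustrative assumptions, not the paper's algorithms.

```python
from itertools import combinations


def disagreement(a, b):
    """Count item pairs that one clustering joins and the other splits."""
    if len(a) != len(b):
        raise ValueError("clusterings must label the same items")
    return sum(
        (a[i] == a[j]) != (b[i] == b[j])
        for i, j in combinations(range(len(a)), 2)
    )


def total_disagreement(candidate, clusterings):
    """Sum of pairwise disagreements between a candidate and every input."""
    return sum(disagreement(candidate, c) for c in clusterings)


def best_of_inputs(clusterings):
    """Baseline (an assumption): return the input with least total disagreement."""
    return min(clusterings, key=lambda c: total_disagreement(c, clusterings))


# Hypothetical categorical table: each column is a clustering of the same 4 rows.
colour = ["red", "red", "blue", "blue"]
size = ["S", "S", "L", "L"]
shape = ["o", "x", "x", "x"]

aggregate = best_of_inputs([colour, size, shape])
```

Each categorical column acts as one input clustering, which mirrors the abstract's remark about categorical data. Restricting the search to the inputs themselves is only a simple baseline; a real aggregation method may return a clustering that matches none of the inputs.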
Meta clustering
- In Proceedings IEEE International Conference on Data Mining, 2006
- Cited by 14 (1 self)
Clustering is ill-defined. Unlike supervised learning, where labels lead to crisp performance criteria such as accuracy and squared error, clustering quality depends on how the clusters will be used. Devising clustering criteria that capture what users need is difficult. Most clustering algorithms search for optimal clusterings based on a pre-specified clustering criterion. Our approach differs. We search for many alternate clusterings of the data, and then allow users to select the clustering(s) that best fit their needs. Meta clustering first finds a variety of clusterings and then clusters this diverse set of clusterings so that users need to examine only a small number of qualitatively different clusterings. We present methods for automatically generating a diverse set of alternate clusterings, as well as methods for grouping clusterings into meta clusters. We evaluate meta clustering on four test problems and two case studies. Surprisingly, the clusterings that would be of most interest to users are often not very compact.
Weighted clustering ensembles
- In Proceedings of the 6th SIAM International Conference on Data Mining, 2006
- Cited by 7 (2 self)
Cluster ensembles offer a solution to challenges inherent to clustering that arise from its ill-posed nature. Cluster ensembles can provide robust and stable solutions by leveraging the consensus across multiple clustering results, while averaging out spurious structures that arise from the various biases to which each participating algorithm is tuned. In this paper, we address the problem of combining multiple weighted clusters that belong to different subspaces of the input space. We leverage the diversity of the input clusterings to generate a consensus partition that is superior to the participating ones. Since we are dealing with weighted clusters, our consensus function makes use of the weight vectors associated with the clusters. The experimental results show that our ensemble technique is capable of producing a partition that is as good as or better than the best individual clustering.
Consensus Clusterings
- Cited by 5 (0 self)
In this paper we address the problem of combining multiple clusterings without access to the underlying features of the data. This process is known in the literature as clustering ensembles, clustering aggregation, or consensus clustering. Consensus clustering yields a stable and robust final clustering that is in agreement with multiple clusterings. We find that an iterative EM-like method is remarkably effective for this problem. We present three iterative algorithms for finding clustering consensus. An extensive empirical study compares our proposed algorithms with eleven other consensus clustering methods on four data sets using six different clustering performance metrics. The experimental results show that the new ensemble clustering methods produce clusterings that are as good as, and often better than, these other methods.
A method of clustering combination applied to satellite image analysis
- Cited by 1 (1 self)
This paper presents an algorithm for combining the results of different clusterings, with the objective of finding groups of patterns that are common to all clusterings. The idea of the proposed combination is to group those samples which are in the same cluster in most cases. We formulate the combination as the solution of a set of linear equations with binary constraints. The advantage of such a formulation is that it provides an objective function for the combination. To optimize the objective function we propose an original unsupervised algorithm. Furthermore, we propose an extension adapted to very large volumes of data. The combination of clusterings is performed on the results of different clustering algorithms applied to SPOT5 satellite images and shows the effectiveness of the proposed method.
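The idea of grouping samples that share a cluster "in most cases" can be illustrated with a simple majority vote over pairs. This is only a sketch of that intuition and not the paper's formulation, which uses linear equations with binary constraints. The function name, the strict-majority threshold, and the union-find merging step are all assumptions.

```python
def majority_groups(clusterings):
    """Group items that share a cluster in more than half of the clusterings."""
    n = len(clusterings[0])
    parent = list(range(n))  # union-find forest, one node per item

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            # Number of clusterings that put items i and j together.
            votes = sum(c[i] == c[j] for c in clusterings)
            if 2 * votes > len(clusterings):
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Three hypothetical clusterings of the same five samples.
example = [
    [0, 0, 1, 1, 2],
    [0, 0, 0, 1, 1],
    [1, 1, 2, 2, 2],
]
combined = majority_groups(example)
```

Because groups are merged transitively, two samples can end up together even if they are co-clustered in only a minority of the inputs, provided a chain of majority pairs connects them. In the example, samples 2 and 4 share a cluster in only one of the three inputs but are linked through sample 3. A stricter combination rule would need a different merging step.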
Mechanism Design for Clustering Aggregation by Selfish Systems
- In Seventh IEEE International Conference on Data Mining
We propose a market mechanism that can be applied to the clustering aggregation problem among selfish systems, which tend to lie about their correct clustering during the aggregation process. Our study is a preliminary step toward the development of robust distributed data mining among selfish systems.
Diversity-based Weighting Schemes for Clustering Ensembles
Clustering ensembles have recently been recognized as an emerging approach to providing more robust solutions to the data clustering problem. Current clustering ensemble methods typically fall into instance-based, cluster-based, or hybrid approaches; however, most such methods fail to discriminate among the various clusterings that participate in the ensemble. In this paper, we address the problem of weighting clustering ensembles by proposing general weighting approaches based on different implementations of the notion of diversity. We introduce three weighting schemes for clustering ensembles, called Single Weighting, Group Weighting and Dendrogram Weighting, which are independent of the particular clustering ensemble method and are designed to take into account correlations among the individual clustering solutions in different ways. We show how these schemes can be instantiated in any instance-based, cluster-based, or hybrid clustering ensemble method. Experiments show that the performance of clustering ensemble algorithms improves when the proposed weighting schemes are employed.

