Results 1–10 of 47
Semi-supervised graph clustering: a kernel approach
, 2008
Abstract

Cited by 94 (3 self)
Semi-supervised clustering algorithms aim to improve clustering results using limited supervision. The supervision is generally given as pairwise constraints; such constraints are natural for graphs, yet most semi-supervised clustering algorithms are designed for data represented as vectors. In this paper, we unify vector-based and graph-based approaches. We first show that a recently proposed objective function for semi-supervised clustering based on Hidden Markov Random Fields, with squared Euclidean distance and a certain class of constraint penalty functions, can be expressed as a special case of the weighted kernel k-means objective (Dhillon et al., in Proceedings of the 10th International Conference on Knowledge Discovery and Data Mining, 2004a). A recent theoretical connection between weighted kernel k-means and several graph clustering objectives enables us to perform semi-supervised clustering of data given either as vectors or as a graph. For graph data, this result leads to algorithms for optimizing several new semi-supervised graph clustering objectives. For vector data, the kernel approach also enables us to find clusters with non-linear boundaries in the input data space. Furthermore, we show that recent work on spectral learning (Kamvar et al., in Proceedings of the 17th International Joint Conference on Artificial Intelligence, 2003) may be viewed as a special case of our formulation. We empirically show that our algorithm is able to outperform current state-of-the-art semi-supervised algorithms on both vector-based and graph-based data sets.
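The HMRF-to-kernel-k-means connection described in this abstract amounts to folding the pairwise constraint penalties directly into the kernel matrix before running weighted kernel k-means. A minimal sketch of that idea, assuming a single uniform penalty weight `w` and a hypothetical helper name `constrained_kernel` (neither is from the paper, which derives the weights from the HMRF objective):

```python
import numpy as np

def constrained_kernel(K, must_link, cannot_link, w=1.0):
    """Fold pairwise constraints into a kernel matrix (sketch).

    Assumption: a uniform reward/penalty w, added to K[i, j] for
    must-link pairs and subtracted for cannot-link pairs, mirroring
    the HMRF-style constraint penalty terms described above.
    """
    K = np.asarray(K, dtype=float).copy()
    for i, j in must_link:
        K[i, j] += w
        K[j, i] += w
    for i, j in cannot_link:
        K[i, j] -= w
        K[j, i] -= w
    return K
```

The modified matrix can then be handed to any kernel k-means implementation; constrained pairs are pulled toward (or pushed away from) sharing a cluster by the altered inner products.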
Creating a Cluster Hierarchy under Constraints of a Partially Known Hierarchy
Abstract

Cited by 21 (7 self)
Although clustering under constraints is a current research topic, a hierarchical setting, in which a hierarchy of clusters is the goal, is usually not considered. This paper tries to fill this gap by analyzing a scenario where constraints are derived from a hierarchy that is partially known in advance. This scenario arises, e.g., when structuring a collection of documents according to a user-specific hierarchy. Major issues of current approaches to constraint-based clustering are discussed, especially with respect to the hierarchical setting. We introduce the concept of hierarchical constraints and continue by presenting and evaluating two approaches using them. The approaches cover the two major fields of constraint-based clustering, i.e., instance-based and metric-based constraint integration. Our objects of interest are text documents; the presented algorithms are therefore fitted to work for these where necessary. While showing the properties and ideas of the algorithms in general, we evaluate the case of constraints that are unevenly scattered over the instance space, which is very common for real-world problems but not satisfyingly covered in other work so far.
Identifying and generating easy sets of constraints for clustering
 In AAAI '06: Proceedings of the Twenty-First National Conference on Artificial Intelligence and the Eighteenth Innovative Applications of Artificial Intelligence Conference
, 2006
Abstract

Cited by 18 (5 self)
Clustering under constraints is a recent innovation in the artificial intelligence community that has yielded significant practical benefit. However, recent work has shown that, for some negative forms of constraints, the associated subproblem of just finding a feasible clustering is NP-complete. These worst-case results for the entire problem class say nothing of where and how prevalent easy problem instances are. In this work, we show that there are large pockets within these problem classes where clustering under constraints is easy and that using easy sets of constraints yields better empirical results. We then illustrate several sufficient conditions from graph theory to identify a priori where these easy problem instances are and present algorithms to create large and easy-to-satisfy constraint sets.
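A quick necessary feasibility test on a constraint set follows from the constraint graph: take the transitive closure of the must-links, then check that no cannot-link pair falls inside a single must-link component. A minimal union-find sketch of that check (function name and interface are illustrative, not from the paper):

```python
def feasible(n, must_link, cannot_link):
    """Quick feasibility check over n points (sketch).

    Merges points joined by must-link constraints with union-find,
    then verifies no cannot-link pair ends up in the same component.
    Note: this is necessary but not sufficient once the number of
    clusters is bounded; cannot-links then induce a graph-coloring
    problem, which is where the NP-completeness comes from.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in must_link:
        parent[find(i)] = find(j)
    return all(find(i) != find(j) for i, j in cannot_link)
```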
BoostCluster: Boosting Clustering by Pairwise Constraints
Abstract

Cited by 17 (5 self)
Data clustering is an important task in many disciplines. A large number of studies have attempted to improve clustering by using side information that is often encoded as pairwise constraints. However, these studies focus on designing special clustering algorithms that can effectively exploit the pairwise constraints. We present a boosting framework for data clustering, termed BoostCluster, that is able to iteratively improve the accuracy of any given clustering algorithm by exploiting the pairwise constraints. The key challenge in designing a boosting framework for data clustering is how to influence an arbitrary clustering algorithm with the side information, since clustering algorithms are by definition unsupervised. The proposed framework addresses this problem by dynamically generating new data representations at each iteration that are, on the one hand, adapted to the clustering results at previous iterations by the given algorithm and, on the other hand, consistent with the given side information. Our empirical study shows that the proposed boosting framework is effective in improving the performance of a number of popular clustering algorithms (K-means, partitional Single-Link, spectral clustering), and its performance is comparable to state-of-the-art algorithms for data clustering with side information.
Object identification with constraints
 In ICDM
, 2006
Abstract

Cited by 16 (7 self)
Object identification aims at identifying different representations of the same object based on noisy attributes, such as descriptions of the same product in different online shops or references to the same paper in different publications. Numerous solutions have been proposed for solving this task, almost all of them based on similarity functions over pairs of objects. Although today the similarity functions are learned from a set of labeled training data, the structural information given by the labeled data is not used. By formulating a generic model for object identification, we show how almost any proposed identification model can easily be extended to satisfy structural constraints. We therefore propose a model that uses structural information, given as pairwise constraints, to guide collective decisions about object identification in addition to a learned similarity measure. We show with empirical experiments on public and on real-life data that combining both structural information and attribute-based similarity enormously increases the overall performance for object identification tasks.
Constrained Co-clustering of Gene Expression Data
, 2008
Abstract

Cited by 13 (3 self)
In many applications, the expert interpretation of co-clustering is easier than that of mono-dimensional clustering. Co-clustering aims at computing a bi-partition, that is, a collection of co-clusters: each co-cluster is a group of objects associated to a group of attributes, and these associations can support interpretations. Many constrained clustering algorithms have been proposed to exploit domain knowledge and to improve partition relevancy in the mono-dimensional case (e.g., using the so-called must-link and cannot-link constraints). Here, we consider constrained co-clustering not only for extended must-link and cannot-link constraints (i.e., both objects and attributes can be involved), but also for interval constraints that enforce properties of co-clusters when considering ordered domains. We propose an iterative co-clustering algorithm which exploits user-defined constraints while minimizing the sum-squared residues, i.e., an objective function introduced for gene expression data clustering by Cho et al. (2004). We illustrate the added value of our approach in two applications on gene expression data.
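The sum-squared residue objective cited above (Cho et al. 2004) measures how far a co-cluster is from being an additive block. A minimal sketch for a single co-cluster submatrix; the constraint handling of the paper's algorithm is not shown:

```python
import numpy as np

def sum_squared_residue(block):
    """Sum-squared residue of one co-cluster (sketch of the
    Cho et al. 2004 objective): h_ij = a_ij - mean(row i)
    - mean(col j) + mean(block).  A perfectly additive block
    has residue 0."""
    block = np.asarray(block, dtype=float)
    row_means = block.mean(axis=1, keepdims=True)
    col_means = block.mean(axis=0, keepdims=True)
    h = block - row_means - col_means + block.mean()
    return float((h ** 2).sum())
```

A constrained co-clustering iteration would minimize the total residue over all co-clusters while honoring the must-link, cannot-link, and interval constraints.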
Clustering Trees with Instance Level Constraints
Abstract

Cited by 11 (1 self)
Abstract. Constrained clustering investigates how to incorporate domain knowledge into the clustering process. The domain knowledge takes the form of constraints that must hold on the set of clusters. We consider instance-level constraints, such as must-link and cannot-link. This type of constraint has been successfully used in popular clustering algorithms, such as k-means and hierarchical agglomerative clustering. This paper shows how clustering trees can support instance-level constraints. Clustering trees are decision trees that partition the instances into homogeneous clusters. Clustering trees provide a symbolic description for each cluster. To handle non-trivial constraint sets, we extend clustering trees to support disjunctive descriptions. The paper’s main contribution is ClusILC, an efficient algorithm for building such trees. We present experiments comparing ClusILC to COP-KMeans.
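The COP-KMeans baseline mentioned above enforces instance-level constraints with a violation check run before each assignment: a point may only join a cluster if doing so breaks no must-link or cannot-link. A minimal sketch of that check (the interface, a dict-based partial assignment, is an illustrative assumption):

```python
def violates_constraints(point, cluster, assignment, must_link, cannot_link):
    """COP-KMeans-style check (sketch): would assigning `point` to
    `cluster` violate a constraint, given the partial `assignment`
    (a dict mapping point -> cluster) built so far?  Points not
    yet assigned cannot cause a violation."""
    for i, j in must_link:
        other = j if i == point else (i if j == point else None)
        if other is not None and assignment.get(other, cluster) != cluster:
            return True  # must-link partner sits in a different cluster
    for i, j in cannot_link:
        other = j if i == point else (i if j == point else None)
        if other is not None and assignment.get(other) == cluster:
            return True  # cannot-link partner already in this cluster
    return False
```

COP-KMeans assigns each point to the nearest centroid whose cluster passes this check, and fails if no cluster passes.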
A Clustering Comparison Measure Using Density Profiles and its Application to the Discovery of Alternate Clusterings
 DATA MINING AND KNOWLEDGE DISCOVERY
Abstract

Cited by 11 (2 self)
Data clustering is a fundamental and very popular method of data analysis. Its subjective nature, however, means that different clustering algorithms or different parameter settings can produce widely varying and sometimes conflicting results. This has led to the use of clustering comparison measures to quantify the degree of similarity between alternative clusterings. Existing measures, though, can be limited in their ability to assess similarity and sometimes generate unintuitive results. They also cannot be applied to compare clusterings which contain different data points, an activity which is important for scenarios such as data stream analysis. In this paper, we introduce a new clustering similarity measure, known as ADCO, which aims to address some limitations of existing measures by allowing greater flexibility of comparison via the use of density profiles to characterize a clustering. In particular, it adopts a ‘data mining style’ philosophy to clustering comparison, whereby two clusterings are considered to be more similar if they are likely to give rise to similar types of prediction models. Furthermore, we show that this new measure can be applied as a highly effective objective function within a new algorithm, known as MAXIMUS, for generating alternate clusterings.
Learning from Noisy Side Information by Generalized Maximum Entropy Model
Abstract

Cited by 8 (4 self)
We consider the problem of learning from noisy side information in the form of pairwise constraints. Although many algorithms have been developed to learn from side information, most of them assume perfect pairwise constraints. Given that pairwise constraints are often extracted from data sources such as paper citations, they tend to be noisy and inaccurate. In this paper, we introduce a generalization of the maximum entropy model and propose a framework for learning from noisy side information based on the generalized maximum entropy model. The theoretical analysis shows that, under certain assumptions, the classification model trained from the noisy side information can be very close to the one trained from the perfect side information. Extensive empirical studies verify the effectiveness of the proposed framework.
J.F.: Towards constrained co-clustering in ordered 0/1 data sets
 In: Proceedings of International Symposium on Methodologies for Intelligent Systems (LNAI
, 2006
Abstract

Cited by 7 (3 self)
Abstract. Within 0/1 data, co-clustering provides a collection of bi-clusters, i.e., linked clusters for both objects and Boolean properties. Besides the classical need for grouping-quality optimization, one can also use user-defined constraints to capture subjective interestingness aspects and thus improve bi-cluster relevancy. We consider the case of 0/1 data where at least one dimension is ordered, e.g., objects denote time points, and we introduce co-clustering constrained by interval constraints. Exploiting such constraints during the intrinsically heuristic clustering process is challenging. We propose one major step in this direction, where bi-clusters are computed from collections of local patterns. We provide an experimental validation on two temporal gene expression data sets.