Results 1 - 7 of 7
Constrained Co-clustering of Gene Expression Data, 2008
Cited by 13 (3 self)
In many applications, the expert interpretation of co-clustering is easier than for mono-dimensional clustering. Co-clustering aims at computing a bi-partition that is a collection of co-clusters: each co-cluster is a group of objects associated with a group of attributes, and these associations can support interpretations. Many constrained clustering algorithms have been proposed to exploit domain knowledge and to improve partition relevancy in the mono-dimensional case (e.g., using the so-called must-link and cannot-link constraints). Here, we consider constrained co-clustering not only for extended must-link and cannot-link constraints (i.e., both objects and attributes can be involved), but also for interval constraints that enforce properties of co-clusters when considering ordered domains. We propose an iterative co-clustering algorithm which exploits user-defined constraints while minimizing the sum-squared residues, i.e., an objective function introduced for gene expression data clustering by Cho et al. (2004). We illustrate the added value of our approach in two applications on gene expression data.
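To make the alternating scheme described in this abstract concrete, here is a minimal Python sketch (not the authors' code) of co-clustering under the Cho et al. (2004) sum-squared residue with must-link constraints on rows. The function names, the greedy reassignment loop, and the way must-link pairs are merged into groups are illustrative assumptions; cannot-link and interval constraints would be checked analogously and are omitted here.

import numpy as np

def sum_squared_residue(A, row_labels, col_labels, k, l):
    """Sum over co-clusters of (a_ij - row mean - column mean + co-cluster mean)^2."""
    total = 0.0
    for r in range(k):
        I = np.where(row_labels == r)[0]
        if I.size == 0:
            continue
        for c in range(l):
            J = np.where(col_labels == c)[0]
            if J.size == 0:
                continue
            block = A[np.ix_(I, J)]
            residue = (block
                       - block.mean(axis=1, keepdims=True)
                       - block.mean(axis=0, keepdims=True)
                       + block.mean())
            total += float((residue ** 2).sum())
    return total

def constrained_cocluster(A, k, l, must_link_rows=(), n_iter=10, seed=0):
    """Greedy alternating minimization; must-linked rows always move together."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    # Merge must-linked rows into groups (a crude union of the given pairs).
    group = list(range(n))
    for i, j in must_link_rows:
        gi, gj = group[i], group[j]
        group = [gi if g == gj else g for g in group]
    row_labels = rng.integers(0, k, size=n)
    col_labels = rng.integers(0, l, size=m)
    for g in set(group):                       # start from a consistent labelling
        members = [i for i in range(n) if group[i] == g]
        row_labels[members] = row_labels[members[0]]
    best = sum_squared_residue(A, row_labels, col_labels, k, l)
    for _ in range(n_iter):
        for g in set(group):                   # move each must-link group of rows
            members = [i for i in range(n) if group[i] == g]
            for r in range(k):
                trial = row_labels.copy()
                trial[members] = r
                score = sum_squared_residue(A, trial, col_labels, k, l)
                if score < best:
                    best, row_labels = score, trial
        for j in range(m):                     # then move each column
            for c in range(l):
                trial = col_labels.copy()
                trial[j] = c
                score = sum_squared_residue(A, row_labels, trial, k, l)
                if score < best:
                    best, col_labels = score, trial
    return row_labels, col_labels, best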
Co-clustering Numerical Data under User-defined Constraints, 2009
Cited by 2 (0 self)
Abstract: In the generic setting of objects × attributes matrix data analysis, co-clustering appears as an interesting unsupervised data mining method. A co-clustering task provides a bi-partition made of co-clusters: each co-cluster is a group of objects associated with a group of attributes, and these associations can support expert interpretations. Many constrained clustering algorithms have been proposed to exploit domain knowledge and to improve partition relevancy in the mono-dimensional clustering case (e.g., using must-link and cannot-link constraints on one of the two dimensions). Here, we consider constrained co-clustering not only for extended must-link and cannot-link constraints (i.e., both objects and attributes can be involved), but also for interval constraints that enforce properties of co-clusters when considering ordered domains. We describe an iterative co-clustering algorithm which exploits user-defined constraints while minimizing a given objective function. Thanks to a generic setting, we emphasize that different objective functions can be used. The added value of our approach is demonstrated on both synthetic and real data. Among other things, several experiments illustrate the practical impact of this original co-clustering setting in the context of gene expression data analysis, and in an original application to a protein motif discovery problem.
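The interval constraint mentioned in both abstracts above applies to ordered domains (e.g., time-ordered samples or positions in a sequence): each cluster on the ordered dimension must cover a contiguous range. The sketch below is an assumed illustration, not taken from either paper: a checker for the constraint, plus a classic dynamic-programming segmentation that produces a contiguous partition of one ordered dimension by minimizing within-segment squared error.

import numpy as np

def check_interval(labels):
    """True iff every label occupies one contiguous run of positions."""
    closed = set()
    prev = None
    for lab in labels:
        if lab != prev:
            if lab in closed:
                return False
            if prev is not None:
                closed.add(prev)
            prev = lab
    return True

def contiguous_segmentation(values, k):
    """Split the ordered 1-D array `values` into k contiguous segments
    minimizing within-segment squared error (classic dynamic programming)."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    prefix = np.concatenate([[0.0], np.cumsum(values)])
    prefix2 = np.concatenate([[0.0], np.cumsum(values ** 2)])

    def sse(i, j):  # squared-error cost of the segment values[i:j]
        s, s2, m = prefix[j] - prefix[i], prefix2[j] - prefix2[i], j - i
        return s2 - s * s / m

    cost = np.full((k + 1, n + 1), np.inf)
    cut = np.zeros((k + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for seg in range(1, k + 1):
        for j in range(seg, n + 1):
            for i in range(seg - 1, j):
                c = cost[seg - 1, i] + sse(i, j)
                if c < cost[seg, j]:
                    cost[seg, j], cut[seg, j] = c, i
    # Recover the labelling from the stored cut points.
    labels = np.empty(n, dtype=int)
    j = n
    for seg in range(k, 0, -1):
        i = cut[seg, j]
        labels[i:j] = seg - 1
        j = i
    return labels

# print(contiguous_segmentation([1.0, 1.1, 5.0, 5.2, 9.0], k=3))  # -> [0 0 1 1 2]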
Efficient Semi-supervised Spectral Co-clustering with Constraints, 2010 IEEE International Conference on Data Mining
Abstract—Co-clustering was proposed to cluster objects and features simultaneously in order to explore inter-correlated patterns. For example, by analyzing blog click-through data, one can find the group of users who are interested in a specific group of blogs and use it for applications such as recommendation. However, it is usually very difficult to achieve good co-clustering quality from the object-feature correlation data alone because of its sparsity and noise. Meanwhile, one may have prior knowledge that indicates the internal structure of the co-clusters. For instance, user cluster information may be available from the social network, and blog-blog similarity from social tags or contents. This prior information provides some supervision toward the co-cluster structures and may help reduce the effect of sparsity and noise. However, most co-clustering algorithms do not use this information and may produce meaningless results. In this paper we study the problem of finding the optimal co-clusters when some objects and features are believed a priori to be in the same cluster. A matrix-decomposition-based approach is proposed that formulates the task as a trace minimization problem and solves it efficiently with the selected eigenvectors. The asymptotic complexity of the proposed approach is the same as that of co-clustering without constraints. Experiments include graph-pattern co-clustering and document-word co-clustering. For instance, on the graph-pattern data set, the proposed model improves normalized mutual information by as much as 5.5 times and runs 10 times faster than two naive solutions that expand the edges and vertices in the graphs.
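As background for the trace-minimization formulation, the sketch below shows the standard spectral co-clustering recipe in the style of Dhillon (2001) that such methods build on: singular vectors of the degree-normalized object-feature matrix embed rows and columns jointly, and k-means on that embedding yields the co-clusters. How the constraint matrices enter the eigenproblem is specific to the paper; here the prior similarities are only folded in as an assumed smoothing step, and the names obj_prior, feat_prior, and alpha are illustrative.

import numpy as np
from sklearn.cluster import KMeans

def spectral_cocluster(A, k, obj_prior=None, feat_prior=None, alpha=0.1, seed=0):
    """A: (n_objects, n_features) non-negative data matrix.
    obj_prior / feat_prior: optional object-object / feature-feature similarity
    matrices (e.g. user communities, blog-blog similarity); folding them in by
    smoothing is an assumption of this sketch, not the paper's formulation."""
    A = np.asarray(A, dtype=float)
    if obj_prior is not None:          # pull rows toward their linked objects
        P = obj_prior / np.maximum(obj_prior.sum(axis=1, keepdims=True), 1e-12)
        A = (1 - alpha) * A + alpha * P @ A
    if feat_prior is not None:         # pull columns toward their linked features
        Q = feat_prior / np.maximum(feat_prior.sum(axis=1, keepdims=True), 1e-12)
        A = (1 - alpha) * A + alpha * A @ Q.T
    d1 = np.maximum(A.sum(axis=1), 1e-12) ** -0.5   # row degrees, D1^{-1/2}
    d2 = np.maximum(A.sum(axis=0), 1e-12) ** -0.5   # column degrees, D2^{-1/2}
    An = d1[:, None] * A * d2[None, :]
    U, _, Vt = np.linalg.svd(An, full_matrices=False)
    n_vec = max(1, int(np.ceil(np.log2(k))))        # as in Dhillon's bipartite cut
    # Skip the trivial leading singular pair and embed rows and columns together.
    Z = np.vstack([d1[:, None] * U[:, 1:n_vec + 1],
                   d2[:, None] * Vt[1:n_vec + 1].T])
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(Z)
    n = A.shape[0]
    return labels[:n], labels[n:]      # object-cluster and feature-cluster labels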
Value, Cost, and Sharing: Open Issues in Constrained Clustering
Abstract. Clustering is an important tool for data mining, since it can identify major patterns or trends without any supervision (labeled data). Over the past five years, semi-supervised (constrained) clustering methods have become very popular. These methods began by incorporating pairwise constraints and have developed into more general methods that can learn appropriate distance metrics. However, several important open questions have arisen about which constraints are most useful, how they can be actively acquired, and when and how they should be propagated to neighboring points. This position paper describes these open questions and suggests future directions for constrained clustering research.
Numerical Data Co-clustering via Sum-Squared Residue Minimization and User-defined Constraint Satisfaction, 2008
Co-clustering aims at computing a bi-partition that is a collection of co-clusters: each co-cluster is a group of objects associated to a group of attributes and these associations can support interpretations. We consider constrained co-clustering not only for extended must-link and cannot-link constraints (i.e., both objects and attributes can be involved), but also for interval constraints that enforce properties of co-clusters when considering ordered domains. We propose an iterative co-clustering algorithm which exploits user-defined constraints while minimizing the sum-squared residues, i.e., an objective function introduced for gene expression data clustering by Cho et al. (2004).
Chapter 1 Constraint-driven co-clustering of 0/1 data
Abstract We investigate a co-clustering framework (i.e., a method that provides a partition of objects and a linked partition of features) for binary data sets. So far, constrained co-clustering has been seldom explored. First, we consider straightforward extensions of the classical instance-level constraints (must-link, cannot-link) to express relationships on both objects and features. Furthermore, we study constraints that exploit sequential orders on objects and/or features. The idea is that we can specify whether or not the extracted co-clusters should involve contiguous elements (Interval and non-Interval constraints). Instead of integrating constraint processing within a co-clustering scheme, we propose a Local-to-Global (L2G) framework. It consists of postprocessing a collection of (constrained) local patterns computed beforehand (e.g., closed feature sets and their supporting sets of objects) to build a global pattern such as a co-clustering. Roughly speaking, the algorithmic scheme is a K-Means-like approach that groups the local patterns. We show that it is possible to push local counterparts of the global constraints on the co-clusters during the local pattern mining phase itself. A large part of the chapter is dedicated to experiments that demonstrate the added value of our approach. Considering both synthetic data and real gene expression data sets, we discuss the use of constraints to obtain not only more stable but also more relevant co-clusters.
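A rough rendering of the Local-to-Global step described above, with assumed details (the encoding, the membership threshold, and the function names are illustrative, not the authors' implementation): each local pattern, i.e., a feature set with its supporting object set, becomes one binary vector over objects and features, a K-Means pass groups the patterns, and each group induces a co-cluster by majority membership.

import numpy as np
from sklearn.cluster import KMeans

def l2g_cocluster(local_patterns, n_objects, n_features, k, seed=0, threshold=0.5):
    """local_patterns: list of (object_ids, feature_ids) pairs, e.g. the
    supporting object set of a closed feature set mined beforehand."""
    X = np.zeros((len(local_patterns), n_objects + n_features))
    for p, (objs, feats) in enumerate(local_patterns):
        X[p, list(objs)] = 1.0
        X[p, [n_objects + f for f in feats]] = 1.0
    groups = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    co_clusters = []
    for g in range(k):
        members = X[groups == g]
        if members.size == 0:
            continue
        freq = members.mean(axis=0)        # how often each object/feature occurs
        objs = np.where(freq[:n_objects] >= threshold)[0]
        feats = np.where(freq[n_objects:] >= threshold)[0]
        co_clusters.append((objs, feats))
    return co_clusters

# Example: three local patterns over 6 objects and 4 features.
patterns = [({0, 1, 2}, {0, 1}), ({0, 1}, {0, 1, 2}), ({3, 4, 5}, {2, 3})]
print(l2g_cocluster(patterns, n_objects=6, n_features=4, k=2))

In the chapter's setting, the constraints act upstream of this step as well, by restricting which (constrained) local patterns are mined in the first place.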
Constrained Co-clustering of Gene Expression Data
In many applications, the expert interpretation of co-clustering is easier than for mono-dimensional clustering. Co-clustering aims at computing a bi-partition that is a collection of co-clusters: each co-cluster is a group of objects associated with a group of attributes, and these associations can support interpretations. Many constrained clustering algorithms have been proposed to exploit domain knowledge and to improve partition relevancy in the mono-dimensional case (e.g., using the so-called must-link and cannot-link constraints). Here, we consider constrained co-clustering not only for extended must-link and cannot-link constraints (i.e., both objects and attributes can be involved), but also for interval constraints that enforce properties of co-clusters when considering ordered domains. We propose an iterative co-clustering algorithm which exploits user-defined constraints while minimizing the sum-squared residues, i.e., an objective function introduced for gene expression data clustering by Cho et al. (2004). We illustrate the added value of our approach in two applications on gene expression data.