Hierarchical Co-Clustering Based on Entropy Splitting
Abstract
Two-dimensional contingency tables, or co-occurrence matrices, arise frequently in important applications such as text analysis and web-log mining. As a fundamental research topic, co-clustering aims to generate a meaningful partition of the contingency table that reveals hidden relationships between rows and columns. Traditional co-clustering algorithms usually produce flat partitions of both rows and columns into a predefined number of clusters, which do not reveal the relationships among the clusters. To address this limitation, hierarchical co-clustering algorithms have recently attracted considerable research interest. Although successful in various applications, existing hierarchical co-clustering algorithms are usually based on heuristics and lack a solid theoretical foundation. In this paper, we present a new co-clustering algorithm with a solid information-theoretic foundation. It simultaneously constructs a hierarchical structure of both row and column clusters that retains sufficient mutual information between the rows and columns of the contingency table. An efficient and effective greedy algorithm is developed that grows a co-cluster hierarchy by successively performing the row-wise or column-wise split that leads to the maximal mutual information gain. Extensive experiments on real datasets demonstrate that our algorithm can reveal essential relationships among row (and column) clusters and achieves better clustering precision than existing algorithms.
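The abstract's central quantity, the mutual information between rows and columns that a co-clustering retains, can be computed in a few lines of Python. This is a minimal sketch assuming a small dense table of nonnegative counts stored as nested lists; `mutual_information` and `aggregate` are hypothetical helper names, not code from the paper. Merging rows and columns into clusters can never increase this quantity, so a good co-clustering keeps as much of it as possible.

```python
import math


def mutual_information(table):
    """Mutual information I(R; C) in nats of a 2-D table of nonnegative counts."""
    total = sum(sum(row) for row in table)
    row_p = [sum(row) / total for row in table]          # marginal p(r)
    col_p = [sum(col) / total for col in zip(*table)]    # marginal p(c)
    mi = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count > 0:  # zero cells contribute nothing (0 * log 0 is taken as 0)
                p = count / total
                mi += p * math.log(p / (row_p[i] * col_p[j]))
    return mi


def aggregate(table, row_labels, col_labels):
    """Sum counts over each (row cluster, column cluster) block; labels are 0-based ints."""
    n_rows = max(row_labels) + 1
    n_cols = max(col_labels) + 1
    agg = [[0] * n_cols for _ in range(n_rows)]
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            agg[row_labels[i]][col_labels[j]] += count
    return agg


table = [[5, 5, 0, 0],
         [5, 5, 0, 0],
         [0, 0, 5, 5],
         [0, 0, 5, 5]]
full = mutual_information(table)                                         # ln 2 ≈ 0.693
kept = mutual_information(aggregate(table, [0, 0, 1, 1], [0, 0, 1, 1]))  # also ln 2
lost = mutual_information(aggregate(table, [0, 0, 0, 0], [0, 0, 0, 0]))  # 0.0
```

Here the 2 × 2 co-clustering keeps all of the table's mutual information, while collapsing everything into a single co-cluster keeps none.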
HICC: An Entropy Splitting Based Framework for Hierarchical Co-Clustering
Under consideration for publication in Knowledge and Information Systems, 2014
Abstract
Two-dimensional contingency tables, or co-occurrence matrices, arise frequently in important applications such as text analysis and web-log mining. As a fundamental research topic, co-clustering aims to generate a meaningful partition of the contingency table that reveals hidden relationships between rows and columns. Traditional co-clustering algorithms usually produce flat partitions of both rows and columns into a predefined number of clusters, which do not reveal the relationships among the clusters. To address this limitation, hierarchical co-clustering algorithms have recently attracted considerable research interest. Although successful in various applications, existing hierarchical co-clustering algorithms are usually based on heuristics and lack a solid theoretical foundation. In this paper, we present a new co-clustering algorithm, HICC, with a solid theoretical foundation. It simultaneously constructs a hierarchical structure of both row and column clusters that retains sufficient mutual information between the rows and columns of the contingency table. An efficient and effective greedy algorithm is developed that grows a co-cluster hierarchy by successively performing the row-wise or column-wise split that leads to the maximal mutual information gain. Extensive experiments on both synthetic and real datasets demonstrate that our algorithm can reveal essential relationships among row (and column) clusters and achieves better clustering precision than existing algorithms. Moreover, experiments on a real dataset show that HICC can effectively reveal hidden relationships between rows and columns in the contingency table.
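The greedy growth of a co-cluster hierarchy can be sketched in Python for very small tables. This is a simplified illustration under stated assumptions, not HICC itself: candidate splits are found by exhaustively enumerating bipartitions (feasible only for a handful of rows or columns), the first step splits rows and columns jointly because a table reduced to a single row or column cluster has zero mutual information, and `greedy_hierarchy`, `bipartitions`, and `clustered_mi` are hypothetical names.

```python
import math
from itertools import combinations


def mutual_information(table):
    """Mutual information (in nats) of a 2-D table of nonnegative counts."""
    total = sum(sum(row) for row in table)
    row_p = [sum(row) / total for row in table]
    col_p = [sum(col) / total for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count > 0:
                p = count / total
                mi += p * math.log(p / (row_p[i] * col_p[j]))
    return mi


def clustered_mi(table, row_clusters, col_clusters):
    """MI after summing counts within each (row cluster, column cluster) block."""
    agg = [[sum(table[i][j] for i in rc for j in cc) for cc in col_clusters]
           for rc in row_clusters]
    return mutual_information(agg)


def bipartitions(members):
    """Yield each split of `members` into two nonempty parts exactly once."""
    first, rest = members[0], members[1:]
    for k in range(len(rest)):  # part a = first + k others, so part b keeps >= 1 member
        for extra in combinations(rest, k):
            yield [first, *extra], [m for m in rest if m not in extra]


def greedy_hierarchy(table, max_splits):
    """Hypothetical greedy co-cluster growth; needs at least 2 rows and 2 columns."""
    all_rows = list(range(len(table)))
    all_cols = list(range(len(table[0])))
    # One row cluster or one column cluster always gives zero MI, so the first
    # step splits rows and columns together instead of one axis at a time.
    mi, ra, rb, ca, cb = max(
        ((clustered_mi(table, [r1, r2], [c1, c2]), r1, r2, c1, c2)
         for r1, r2 in bipartitions(all_rows)
         for c1, c2 in bipartitions(all_cols)),
        key=lambda t: t[0])
    rows, cols = [ra, rb], [ca, cb]
    history = [("both", (ra, rb), (ca, cb), mi)]
    for _ in range(max_splits - 1):
        candidates = []
        for axis, clusters in (("row", rows), ("col", cols)):
            for idx, members in enumerate(clusters):
                if len(members) < 2:
                    continue
                for a, b in bipartitions(members):
                    trial = clusters[:idx] + [a, b] + clusters[idx + 1:]
                    new_mi = (clustered_mi(table, trial, cols) if axis == "row"
                              else clustered_mi(table, rows, trial))
                    candidates.append((new_mi, axis, idx, a, b))
        if not candidates:
            break  # every cluster is already a singleton
        new_mi, axis, idx, a, b = max(candidates, key=lambda t: t[0])
        if new_mi - mi <= 1e-12:
            break  # splitting never lowers MI; stop once the best gain is negligible
        target = rows if axis == "row" else cols
        target[idx:idx + 1] = [a, b]  # replace the parent cluster by its two children
        history.append((axis, a, b, new_mi - mi))
        mi = new_mi
    return rows, cols, history


table = [[5, 5, 0, 0],
         [5, 5, 0, 0],
         [0, 0, 5, 5],
         [0, 0, 5, 5]]
rows, cols, history = greedy_hierarchy(table, max_splits=5)
print(rows, cols)  # [[0, 1], [2, 3]] [[0, 1], [2, 3]]
```

On this block-diagonal table the first joint split already recovers all of the mutual information, so no later split has a positive gain and the loop stops. Because each later step changes only one axis, this simplified sketch can stall on tables such as a 3 × 3 identity, where a row split pays off only after a matching column split; the split search used in the paper may differ.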