R. El-Yaniv and O. Souroujon. Iterative double clustering for unsupervised and semi-supervised learning. In Proceedings of NIPS-14, 2002.
....For a hard partitional clustering algorithm using a similar information-theoretic framework, see [6]. These algorithms were proposed for one-sided clustering. An agglomerative hard clustering version of the IB method was used in [19] to cluster documents after clustering words. The work in [8] extended the above work to repetitively cluster documents and then words. Both these papers use heuristic procedures with no guarantees on a global loss function; in contrast, in this paper we first quantify the loss in mutual information due to co-clustering and then propose an algorithm that ....
....word-document co-occurrence data. We show that the co-clustering approach overcomes sparsity and high dimensionality, yielding substantially better results than the approach of clustering such data along a single dimension. We also show better results as compared to previous algorithms in [19] and [8]. The latter algorithms use a greedy technique ([19] uses an agglomerative strategy) to cluster documents after words are clustered using the same greedy approach. For brevity we will use the following notation to denote various algorithms in consideration. We call the Information Bottleneck ....
R. El-Yaniv and O. Souroujon. Iterative double clustering for unsupervised and semi-supervised learning. In ECML-01, pages 121-132, 2001.
....and was found to result in higher classification accuracies than [1, 19]. All these algorithms were proposed for one-sided clustering. An agglomerative hard clustering version of the Information Bottleneck algorithm was used in [18] to cluster documents after clustering words. The work in [8] extended the above work to repetitively cluster documents and then words. However, both these papers use heuristic procedures and cluster documents and words independently using an agglomerative algorithm. In contrast, in this paper we first quantify the loss in mutual information due to ....
....of documents randomly sampled from the respective news groups in the NG20 corpus. approach overcomes sparsity yielding substantially better results than the approach of clustering sparse data along a single dimension. We also show better results as compared to previous algorithms in [18] and [8]. The latter algorithms use a greedy technique to cluster documents after words are clustered using the same greedy approach. For brevity we will use the following notation to denote various algorithms in consideration. We call the Information Bottleneck Double Clustering method in [18] as ....
R. El-Yaniv and O. Souroujon. Iterative double clustering for unsupervised and semi-supervised learning. In Luc De Raedt and Peter A. Flach, editors, Proceedings of ECML-01, 12th European Conference on Machine Learning, pages 121-132, Freiburg, DE, 2001. Springer Verlag, Heidelberg, DE.
....is a central problem in information retrieval. Possible applications include the use of clustering for improving retrieval [19] and for navigating and browsing large document collections [3, 6, 20]. Several recent works suggest using clustering techniques for unsupervised document classification [15, 5, 17]. In this task, we are given a collection of unlabeled documents and requested to find clusters that are highly correlated with the true topics of the documents. This practical situation is especially difficult since no labeled examples are provided for the topics, hence unsupervised methods must be ....
....a collection of unlabeled documents and requested to find clusters that are highly correlated with the true topics of the documents. This practical situation is especially difficult since no labeled examples are provided for the topics, hence unsupervised methods must be employed. The results of [15, 5] show that agglomerative clustering methods that are based on the Information Bottleneck (IB) method [18] perform well in this task. Agglomerative procedures, however, suffer from two main obstacles. First, in general there is no guarantee to find a solution which is a local maximum of the target ....
R. El-Yaniv and O. Souroujon. Iterative Double Clustering for Unsupervised and Semi-supervised Learning. In ECML 2002, to appear.
No context found.
R. El-Yaniv and O. Souroujon. Iterative double clustering for unsupervised and semi-supervised learning. In Advances in Neural Information Processing Systems (NIPS 2001).
No context found.
R. El-Yaniv and O. Souroujon. Iterative double clustering for unsupervised and semi-supervised learning. In Proceedings of NIPS-14, 2002.
No context found.