N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In ACM SIGIR 2000, pages 208--215, 2000.
.... to categorical attributes which mainly lead to more succinct result (recall that STIRR needs a painful post processing step to describe the results). For instance, there are techniques employed by the machine learning community which are used to cluster documents according to terms they contain [ST00]. It is our interest to examine the properties of these methods and investigate whether it can be effectively applied to categorical as well as mixed attribute types. 7 Conclusions Clustering lies at the heart of data analysis and data mining applications. The ability to discover highly ....
Noam Slonim and Naftali Tishby. Document Clustering Using Word Clusters via the Information Bottleneck Method. In Proceedings of the 23rd Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR), pages 208--215, Athens, Greece, 24--28 July 2000. ACM Press.
....clustering of the data using a deterministic annealing procedure. For a hard partitional clustering algorithm using a similar information theoretic framework, see [6]. These algorithms were proposed for one sided clustering. An agglomerative hard clustering version of the IB method was used in [19] to cluster documents after clustering words. The work in [8] extended the above work to repetitively cluster documents and then words. Both these papers use heuristic procedures with no guarantees on a global loss function; in contrast, in this paper we first quantify the loss in mutual ....
....using word document co-occurrence data. We show that the co-clustering approach overcomes sparsity and high dimensionality, yielding substantially better results than the approach of clustering such data along a single dimension. We also show better results as compared to previous algorithms in [19] and [8]. The latter algorithms use a greedy technique ([19] uses an agglomerative strategy) to cluster documents after words are clustered using the same greedy approach. For brevity we will use the following notation to denote various algorithms in consideration. We call the Information ....
N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In ACM SIGIR, pages 208-215, 2000.
....is the maximum accuracy of any choice of k nodes that partition C. Note that the range of an accuracy score is between 0 and 1; the higher the accuracy score, the better. Accuracy, which has been used as a measure of performance in supervised learning, has also been used in clustering (see [38, 8, 28]). 3.1 Document term matrices These experiments involved common datasets used in prior experimentation. In all of the cases, we either do better or compare favorably with known results. 3.1.1 Reuters Many clustering algorithms are tested on the Reuters dataset [4], a corpus of 8,654 news ....
Noam Slonim and Naftali Tishby. Document clustering using word clusters via the information bottleneck method. In Research and Development in Information Retrieval, pages 208-215, 2000.
....algorithm that directly minimizes the loss in mutual information and was found to result in higher classification accuracies than [1, 19]. All these algorithms were proposed for one sided clustering. An agglomerative hard clustering version of the Information Bottleneck algorithm was used in [18] to cluster documents after clustering words. The work in [8] extended the above work to repetitively cluster documents and then words. However, both these papers use heuristic procedures and cluster documents and words independently using an agglomerative algorithm. In contrast, in this paper we ....
....set consists of documents randomly sampled from the respective news groups in the NG20 corpus. approach overcomes sparsity yielding substantially better results than the approach of clustering sparse data along a single dimension. We also show better results as compared to previous algorithms in [18] and [8]. The latter algorithms use a greedy technique to cluster documents after words are clustered using the same greedy approach. For brevity we will use the following notation to denote various algorithms in consideration. We call the Information Bottleneck Double Clustering method in [18] as ....
N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In ACM SIGIR, pages 208-215, 2000.
....categorical clustering algorithm. LIMBO builds on the Information Bottleneck (IB) framework for quantifying the relevant information preserved when clustering [7]. We use it to cluster both tuples and attribute values. While the IB method has been applied before to cluster small data sets [6], LIMBO is the first scalable hierarchical clustering algorithm to use this method. LIMBO supports a tradeoff between computation time and clustering quality. It handles large data sets efficiently and is robust over different input orders. Finally, LIMBO produces clusterings for a large range ....
N. Slonim and N. Tishby. Document Clustering Using Word Clusters via the Information Bottleneck Method. In SIGIR, Athens, Greece, 2000.
....replacing entries in the discovered biclusters with random data. We will show later that this approach not only produces less accurate result but also bears an inefficient performance. Also in the information retrieval area, researchers are studying document clustering based on word appearances. In [10], authors proposed a two stage method that first clusters words for each document and then cluster documents based on the word clusters of each document. Despite other differences in the problem formation, this approach significantly differs from our model on the following aspect: our model ....
....document and then cluster documents based on the word clusters of each document. Despite other differences in the problem formation, this approach significantly differs from our model on the following aspect: our model clusters the attributes (words) and objects (documents) simultaneously while [10] clusters the attributes (words) and objects (documents) sequentially. As a result, the cluster in our model consists of a subset of objects and a subset of attributes while the cluster in [10] is only a subset of objects. 3 The Model of # cluster To address the potential limitation inherit to ....
N. Slonim and N. Tishby, Document clustering using word clusters via the information bottleneck, Proc. ACM SIGIR, pp. 208-215, 2000.
N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In ACM SIGIR 2000, pages 208--215, 2000.
N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In ACM SIGIR 2000, pages 208--215, 2000.
N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In Proc. of SIGIR, pages 208--215, 2000.
N. Slonim and N. Tishby. Document Clustering Using word Clusters via the Information Bottleneck Method. In ACM SIGIR 2000.
....the problem is how to map one of the variables, considered as the source signal, X into a more concise reproduction signal X while preserving the information about the relevant (predicted) signal Y . The Information Bottleneck was found useful in various learning applications. Slonim et al. [10, 11] used it for clustering data. Note that since IB uses information theoretic point of view it does not suffer from basic flaws of geometric based algorithms as presented by Kleinberg [5]. In [9] the IB was used for feature selection. Poupart et al. [6] used it while studying POMDPs, and Baram et. ....
N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In Proc. of the 23rd Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, 2000.
....is to compress one (the source) variable, while maintaining as much information about the auxiliary (relevance) variable. This framework has proven powerful for numerous applications, such as clustering the objects of sentences with respect to the verbs [9], documents with respect to their terms [1, 5, 14], genes with respect to tissues [8, 11] and stimuli with respect to spike patterns [10]. An important condition for this approach to work is that the auxiliary variable indeed corresponds to the task. In many situations, however, such pure variable is not available. The variable may in fact ....
N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In P. Ingwersen, N.J. Belkin, and M-K. Leong, editors, Research and Development in Information Retrieval (SIGIR-00), pages 208--215. ACM Press, New York, 2000.
....is to compress one (the source) variable, while maintaining as much information about the auxiliary (relevance) variable. This framework has proven powerful for numerous applications, such as clustering the objects of sentences with respect to the verbs [8], documents with respect to their terms [1, 5, 13], genes with respect to tissues [7, 10] and stimuli with respect to spike patterns [9]. An important condition for this approach to work is that the auxiliary variable indeed corresponds to the task. In many situations, however, such pure variable is not available. The variable may in fact ....
N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In P. Ingwersen, N.J. Belkin, and M-K. Leong, editors, Research and Development in Information Retrieval (SIGIR-00), pages 208--215. ACM Press, New York, 2000.
Slonim, N., & Tishby, N. (2000). Document clustering using word clusters via the information bottleneck method. Research and Development in Information Retrieval.
N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, pages 208--215. ACM Press, 2000.
N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, pages 208--215. ACM Press, 2000.
N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In Proceedings of SIGIR-23, pages 208--215, 2000.
Slonim, N., Tishby, N.: Document clustering using word clusters via the information bottleneck method. In: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, ACM Press (2000) 208--215
N. Slonim and N. Tishby, "Document clustering using word clusters via the information bottleneck method," in Proc. of the 23rd Annual Int. ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 208--215. ACM Press, New York, NY, USA, 2000.
N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In Proceedings of SIGIR-23, pages 208--215, 2000.
N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, pages 208--215. ACM Press, 2000.
N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In ACM SIGIR, pages 208--215, 2000.
N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In SIGIR 2000.
N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In Proceedings of the 23rd SIGIR, pp. 208--215, 2000.
N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. Proceedings of SIGIR-2000.