40 citations found.
N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In ACM SIGIR 2000, pages 208--215, 2000.


This paper is cited in the following contexts:


Data Clustering Techniques - Qualifying Oral Examination   (Correct)

.... to categorical attributes which mainly lead to more succinct result (recall that STIRR needs a painful post processing step to describe the results). For instance, there are techniques employed by the machine learning community which are used to cluster documents according to terms they contain [ST00]. It is our interest to examine the properties of these methods and investigate whether it can be effectively applied to categorical as well as mixed attribute types. 7 Conclusions Clustering lies at the heart of data analysis and data mining applications. The ability to discover highly ....

Noam Slonim and Naftali Tishby. Document Clustering Using Word Clusters via the Information Bottleneck Method. In Proceedings of the 23rd Annual International ACM Conference on Research and Development in Information Retrieval, (SIGIR), pages 208--215, Athens, Greece, 24-28 July 2000. ACM Press.


Information-Theoretic Co-Clustering - Dhillon, Mallela, Modha (2003)   (9 citations)  (Correct)

....clustering of the data using a deterministic annealing procedure. For a hard partitional clustering algorithm using a similar information theoretic framework, see [6]. These algorithms were proposed for one sided clustering. An agglomerative hard clustering version of the IB method was used in [19] to cluster documents after clustering words. The work in [8] extended the above work to repetitively cluster documents and then words. Both these papers use heuristic procedures with no guarantees on a global loss function; in contrast, in this paper we first quantify the loss in mutual ....

....using word-document co-occurrence data. We show that the co-clustering approach overcomes sparsity and high dimensionality yielding substantially better results than the approach of clustering such data along a single dimension. We also show better results as compared to previous algorithms in [19] and [8]. The latter algorithms use a greedy technique ([19] uses an agglomerative strategy) to cluster documents after words are clustered using the same greedy approach. For brevity we will use the following notation to denote various algorithms in consideration. We call the Information ....

[Article contains additional citation context not shown here]

N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In ACM SIGIR, pages 208-215, 2000.


On a Recursive Spectral Algorithm for Clustering from.. - Cheng, Kannan..   (Correct)

....is the maximum accuracy of any choice of k nodes that partition C. Note that the range of an accuracy score is between 0 and 1; the higher the accuracy score, the better. Accuracy, which has been used as a measure of performance in supervised learning, has also been used in clustering (see [38, 8, 28]). 3.1 Document term matrices These experiments involved common datasets used in prior experimentation. In all of the cases, we either do better or compare favorably with known results. 3.1.1 Reuters Many clustering algorithms are tested on the Reuters dataset [4], a corpus of 8,654 news ....

Noam Slonim and Naftali Tishby. Document clustering using word clusters via the information bottleneck method. In Research and Development in Information Retrieval, pages 208-215, 2000.


Information-Theoretic Co-Clustering - Dhillon, Mallela, Modha (2003)   (9 citations)  (Correct)

....algorithm that directly minimizes the loss in mutual information and was found to result in higher classification accuracies than [1, 19]. All these algorithms were proposed for one sided clustering. An agglomerative hard clustering version of the Information Bottleneck algorithm was used in [18] to cluster documents after clustering words. The work in [8] extended the above work to repetitively cluster documents and then words. However, both these papers use heuristic procedures and cluster documents and words independently using an agglomerative algorithm. In contrast, in this paper we ....

....set consists of documents randomly sampled from the respective news groups in the NG20 corpus. approach overcomes sparsity yielding substantially better results than the approach of clustering sparse data along a single dimension. We also show better results as compared to previous algorithms in [18] and [8]. The latter algorithms use a greedy technique to cluster documents after words are clustered using the same greedy approach. For brevity we will use the following notation to denote various algorithms in consideration. We call the Information Bottleneck Double Clustering method in [18] as ....

[Article contains additional citation context not shown here]

N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In ACM SIGIR, pages 208-215, 2000.


Using Categorical Clustering in Schema Discovery - Andritsos, Miller   (Correct)

....categorical clustering algorithm. LIMBO builds on the Information Bottleneck (IB) framework for quantifying the relevant information preserved when clustering [7]. We use it to cluster both tuples and attribute values. While the IB method has been applied before to cluster small data sets [6], LIMBO is the first scalable hierarchical clustering algorithm to use this method. LIMBO supports a tradeoff between computation time and clustering quality. It handles large data sets efficiently and is robust over different input orders. Finally, LIMBO produces clusterings for a large range ....

N. Slonim and N. Tishby. Document Clustering Using Word Clusters via the Information Bottleneck Method. In SIGIR, Athens, Greece, 2000.


delta-Clusters: Capturing Subspace Correlation in a Large.. - Yang, Wang, Wang, Yu (2002)   (3 citations)  (Correct)

....replacing entries in the discovered biclusters with random data. We will show later that this approach not only produces less accurate result but also bears an inefficient performance. Also in the information retrieval area, researchers are studying document clustering based on word appearances. In [10], authors proposed a two stage method that first clusters words for each document and then clusters documents based on the word clusters of each document. Despite other differences in the problem formation, this approach significantly differs from our model on the following aspect: our model ....

....document and then cluster documents based on the word clusters of each document. Despite other differences in the problem formation, this approach significantly differs from our model on the following aspect: our model clusters the attributes (words) and objects (documents) simultaneously while [10] clusters the attributes (words) and objects (documents) sequentially. As a result, the cluster in our model consists of a subset of objects and a subset of attributes while the cluster in [10] is only a subset of objects. 3 The Model of δ-cluster To address the potential limitation inherit to ....

[Article contains additional citation context not shown here]

N. Slonim and N. Tishby, Document clustering using word clusters via the information bottleneck, Proc. ACM SIGIR, pp. 208-215, 2000.


Journal of Machine Learning Research 3 (2003) 1307-1331.. - Amir Globerson Gamir   Self-citation (Tishby)   (Correct)

No context found.

N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In ACM SIGIR 2000, pages 208--215, 2000.


Sufficient Dimensionality Reduction - Globerson, Tishby (2003)   Self-citation (Tishby)   (Correct)

No context found.

N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In ACM SIGIR 2000, pages 208--215, 2000.


Extracting Relevant Structures with - Chechik, Tishby   Self-citation (Tishby)   (Correct)

No context found.

N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In Proc. of SIGIR, pages 208--215, 2000.


Discriminative Feature Selection via Multiclass Variable.. - Slonim, Bejerano (2002)   (1 citation)  Self-citation (Slonim Tishby)   (Correct)

No context found.

N. Slonim and N. Tishby. Document Clustering Using Word Clusters via the Information Bottleneck Method. In ACM SIGIR 2000.


An Information Theoretic Tradeoff between Complexity and .. - Gilad-Bachrach, Navot..   Self-citation (Tishby)   (Correct)

....the problem is how to map one of the variables, considered as the source signal, X into a more concise reproduction signal X while preserving the information about the relevant (predicted) signal Y . The Information Bottleneck was found useful in various learning applications. Slonim et al. [10, 11] used it for clustering data. Note that since IB uses information theoretic point of view it does not suffer from basic flaws of geometric based algorithms as presented by Kleinberg [5]. In [9] the IB was used for feature selection. Poupart et al. [6] used it while studying POMDPs, and Baram et. ....

N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In Proc. of the 23rd Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, 2000.


Extracting Relevant Structures with Side Information - Chechik, Tishby (2002)   (6 citations)  Self-citation (Tishby)   (Correct)

....is to compress one (the source) variable, while maintaining as much information about the auxiliary (relevance) variable. This framework has proven powerful for numerous applications, such as clustering the objects of sentences with respect to the verbs [9], documents with respect to their terms [1, 5, 14], genes with respect to tissues [8, 11] and stimuli with respect to spike patterns [10]. An important condition for this approach to work is that the auxiliary variable indeed corresponds to the task. In many situations, however, such pure variable is not available. The variable may in fact ....

N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In P. Ingwersen, N.J. Belkin, and M-K. Leong, editors, Research and Development in Information Retrieval (SIGIR-00), pages 208--215. ACM Press, New York, 2000.


Extracting Relevant Structures with Side Information - Chechik, Tishby (2003)   (6 citations)  Self-citation (Tishby)   (Correct)

....is to compress one (the source) variable, while maintaining as much information about the auxiliary (relevance) variable. This framework has proven powerful for numerous applications, such as clustering the objects of sentences with respect to the verbs [8], documents with respect to their terms [1, 5, 13], genes with respect to tissues [7, 10] and stimuli with respect to spike patterns [9]. An important condition for this approach to work is that the auxiliary variable indeed corresponds to the task. In many situations, however, such pure variable is not available. The variable may in fact ....

N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In P. Ingwersen, N.J. Belkin, and M-K. Leong, editors, Research and Development in Information Retrieval (SIGIR-00), pages 208--215. ACM Press, New York, 2000.


Unknown -   (Correct)

No context found.

Slonim, N., & Tishby, N. (2000). Document clustering using word clusters via the information bottleneck method. Research and Development in Information Retrieval.


Generation of Attribute Value Taxonomies from Data for.. - Construction Of Accurate   (Correct)

No context found.

N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, pages 208--215. ACM Press, 2000.


Disambiguating Web Appearances - Of People In (2005)   (Correct)

No context found.

N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In Proceedings of SIGIR-23, pages 208--215, 2000.


Generation of Attribute Value Taxonomies from Data - And Their Use   (Correct)

No context found.

Slonim, N., Tishby, N.: Document clustering using word clusters via the information bottleneck method. In: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, ACM Press (2000) 208--215


Non-Parametric Dependent Components - Klami, Kaski (2005)   (Correct)

No context found.

N. Slonim and N. Tishby, "Document clustering using word clusters via the information bottleneck method," in Proc. of the 23rd Annual Int. ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 208--215. ACM Press, New York, NY, USA, 2000.


Disambiguating Web Appearances of People in a Social Network - Bekkerman, McCallum (2005)   (1 citation)  (Correct)

No context found.

N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In Proceedings of SIGIR-23, pages 208--215, 2000.


Generation of Attribute Value Taxonomies from Data.. - Kang, Silvescu.. (2004)   (1 citation)  (Correct)

No context found.

N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, pages 208--215. ACM Press, 2000.


Information Theoretic Clustering of Sparse Co-Occurrence Data - Inderjit Dhillon And (2003)   (1 citation)  (Correct)

No context found.

N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In ACM SIGIR, pages 208--215, 2000.


Combining Statistics and Semantics for Word and.. - Termier, Rousset..   (Correct)

No context found.

N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In SIGIR 2000.


Iterative Residual Rescaling: An Analysis and Generalization of.. - Ando, Lee (2001)   (1 citation)  (Correct)

No context found.

N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In Proceedings of the 23rd SIGIR, pp. 208--215, 2000.


Spectral Relaxation Models And Structure Analysis For K-Way.. - Gu Zha Ding (2001)   (7 citations)  (Correct)

No context found.

N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. Proceedings of SIGIR-2000.


CiteSeer.IST - Copyright Penn State and NEC