N. Jardine & C.J. van Rijsbergen. The use of hierarchical clustering in information retrieval. Information Storage and Retrieval, vol. 7, pages 217--240, 1971.
....evaluation methodology, machine learning, document representation. 1. INTRODUCTION Document clustering was initially proposed for improving the precision and recall of information retrieval systems [18]. Because clustering is often too slow for large corpora and has indifferent performance [8], document clustering has been used more recently in document browsing [3], to improve the organization ....
Jardine, N. and van Rijsbergen, C. J. 1971. The use of hierarchical clustering in information retrieval. Information Storage and Retrieval, 7:217--240.
....more intuitive and easier to interpret than previous evaluation measures. 1 Introduction Document clustering was initially proposed for improving the precision and recall of information retrieval systems [14]. Because clustering is often too slow for large corpora and has indifferent performance [7], document clustering has been used more recently in document browsing [3], to improve the organization and viewing of retrieval results [5], to accelerate nearest neighbor search [T] and to generate Yahoo-like hierarchies [T0]. In this paper, we propose a clustering algorithm, CBC (Clustering By ....
Jardine, N. and van Rijsbergen, C. J. 1971. The use of hierarchical clustering in information retrieval. Information Storage and Retrieval, 7:217--240.
....categories or a given list of topics. The Cluster Abstraction Model (CAM) is a statistical latent class or mixture model [McLachlan and Basford, 1988] which organizes groups of documents in a hierarchy. Compared to most state of the art techniques based on agglomerative clustering (e.g. [Jardine and van Rijsbergen, 1971; Croft, 1977; Willett, 1988]) it has several advantages and additional features. As a probabilistic model the most important advantages are: • a sound foundation in statistics and probabilistic inference, • a principled evaluation of generalization performance for model selection, • ....
N. Jardine and C.J. van Rijsbergen. The use of hierarchical clustering in information retrieval. Information Storage and Retrieval, 7:217--240, 1971.
.... in one or a small number of collections, according to van Rijsbergen's cluster hypothesis that closely associated documents should be relevant to the same queries [14]. In the tradition of early work on clustering documents and evaluating queries against clusters rather than single documents [7][9], Xu and Croft [19] have shown that for TREC queries, topical organization by global clustering does in fact concentrate most relevant documents into a small number of collections. Xu and Croft divided TREC collections into 100 subcollections either by source or by topic using clustering. They ....
Jardine, N. and van Rijsbergen, C.J. The use of hierarchical clustering in information retrieval. Information Storage and Retrieval, 7, pages 217-240, 1971.
.... allowing a match between a document and a query (via the cluster representative) even when individual query terms might be missing from the document (but are present in other documents in the cluster). Extensive studies on document clustering have given results that are negative to mixed at best [9, 22, 6, 27, 30]. Work on Latent Semantic Indexing (LSI) also has a similar feel. LSI allows a match between queries and documents that might not share any terms in word space but do share some concepts in the LSI space [4]. Even though enhanced document representations (what we are calling document expansion) ....
N. Jardine and C.J. van Rijsbergen. The use of hierarchical clustering in information retrieval. Information Storage and Retrieval, 7(5):217--240, 1971.
....tend to be more closely related to each other than to non-relevant documents. Efforts have been made to determine whether the cluster hypothesis is valid. Voorhees [Voo85] discusses a way of evaluating whether the cluster hypothesis holds and shows negative results. Jardine and van Rijsbergen [JR71] show some evidence that search results could be improved by clustering. Hearst and Pedersen [HP96] re-examine the cluster hypothesis by focusing on the Scatter/Gather system [CKP93] and conclude that it holds for browsing tasks. Systems like Scatter/Gather [CKP93] provide a mechanism for ....
N. Jardine and C.J. van Rijsbergen. The use of hierarchical clustering in information retrieval, Information Storage and Retrieval, 7:217-240, 1971.
....Later, dendrograms were used as .... is based on the assumption that associations between documents convey information about the relevance of documents to requests [Jardine and van Rijsbergen 1971]. It is this latter use of dendrograms that is of interest in this work. (Footnote 1: See [El Hamdouchi and Willett 86] for the application of Ward's method to document clustering. Figure 1: An example of a dendrogram.) 3 Improving Effectiveness of Ephemeral Clustering Recall that a key requirement for ephemeral clustering is precision. Unlike persistent clustering, whereby quality control can be applied to the result (e.g. by manually ....
Jardine, N. and van Rijsbergen, C., The use of hierarchical clustering in information retrieval. Information Storage and Retrieval 7(5), 1971, 217--240.
....had already been determined to be optimal for the clustering, and only when emphasizing precision over recall in the evaluation measure. Some of van Rijsbergen's experiments suggested that if the optimal cluster was chosen then results could be improved, although actual performance was inferior [13]. However, these studies were done on a subset of the Cranfield collection. In Willett's extensive survey article on document clustering [26] he finds problems with these and related studies, because they concentrated on the use of the (small) Cranfield collection and typically employed an ....
N. Jardine and C.J. van Rijsbergen. The use of hierarchical clustering in information retrieval. Information Storage and Retrieval, 7:217--240, 1971.
....good overview of using clustering in Information Retrieval (IR) see [Wil88]. The use of clustering in IR was mostly driven by the cluster hypothesis [Rij79], which states that relevant documents tend to be more closely related to each other than to non-relevant documents. Jardine and van Rijsbergen [JR71] show some evidence that search results could be improved by clustering. Hearst and Pedersen [HP96] re-examine the cluster hypothesis by focusing on the Scatter/Gather system [CKP93] and conclude that it holds for browsing tasks. # Research partially supported by ONR contract N00014 95 1 1204, ....
N. Jardine and C.J. van Rijsbergen. The use of hierarchical clustering in information retrieval, Information Storage and Retrieval, 7:217-240, 1971.
....holds and shows negative results. Croft [Cro80] describes a method for bottom-up cluster search that could be shown to outperform a full ranking system for the Cranfield collection. The single-link method [Cro77] does not provide any guarantees for the topic similarity within a cluster. In [JR71] Jardine and van Rijsbergen show some evidence that search results could be improved by clustering. Hearst and Pedersen [HP96] re-examine the cluster hypothesis by focusing on the Scatter/Gather system [CKP93] and conclude that it holds for browsing tasks. Systems like Scatter/Gather [CKP93] ....
N. Jardine and C.J. van Rijsbergen. The use of hierarchical clustering in information retrieval, Information Storage and Retrieval, 7:217-240, 1971.
.... allowing a match between a document and a query (via the cluster representative) even when individual query terms might be missing from the document (but are present in other documents in the cluster). Extensive studies on document clustering have given results that are negative to mixed at best [12, 27, 8, 32, 36]. Work on Latent Semantic Indexing (LSI) also has a similar feel. LSI allows a match between queries and documents that might not share any terms in word space but do share some concepts in the LSI space [5]. Even though enhanced document representations (what we are calling document expansion) ....
N. Jardine and C.J. van Rijsbergen. The use of hierarchical clustering in information retrieval. Information Storage and Retrieval, 7(5):217--240, 1971.
....does it rely on predefined document categories or a given list of topics. The presented cluster abstraction model (CAM) is a statistical mixture model [6, 5] which organizes groups of documents in a hierarchy. Compared to most state of the art techniques based on agglomerative clustering (e.g. [4, 1, 9]) it has several advantages and additional features. As a generative model the most important advantages are: i) a sound foundation on the likelihood principle (likelihood as a global clustering criterion), ii) the probabilistic inference mechanism, iii) evaluation of generalization performance ....
N. Jardine and C.J. van Rijsbergen. The use of hierarchical clustering in information retrieval. Information Storage and Retrieval, 7:217--240, 1971.
....negative results. Croft [Cro80] describes a method for bottom-up cluster search that could be shown to outperform a full ranking system for the Cranfield collection. The single-link method [Cro77] does not provide any guarantees for the topic similarity within a cluster. Jardine and van Rijsbergen [JR71] show some evidence that search results could be improved by clustering. Hearst and Pedersen [HP96] re-examine the cluster hypothesis by focusing on the Scatter/Gather system [CKP93] and conclude that it holds for browsing tasks. Systems like Scatter/Gather [CKP93] provide a mechanism for ....
N. Jardine and C.J. van Rijsbergen. The use of hierarchical clustering in information retrieval, Information Storage and Retrieval, 7:217-240, 1971.
....search techniques are typically compared to direct near neighbor search [9] and are evaluated in terms of precision and recall. Various studies indicate that cluster search strategies are not markedly superior to near neighbor search, and, in some situations, can be inferior (see, for example, [6, 12, 4]). Furthermore, document clustering algorithms are often slow, with quadratic running times. It is therefore unsurprising that cluster search, with its indifferent performance, has not gained wide popularity. Document clustering has also been studied as a method for accelerating near neighbor ....
N. Jardine and C.J. van Rijsbergen. The use of hierarchical clustering in information retrieval. Information Storage and Retrieval, 7:217--240, 1971.
No context found.
N. Jardine & C.J. van Rijsbergen. The use of hierarchical clustering in information retrieval. Information Storage and Retrieval, vol. 7, pages 217--240, 1971.
No context found.
Jardine, N., & Van Rijsbergen, C. J. (1971). The use of hierarchical clustering in information retrieval, Information Storage and Retrieval, 7, 217--240.
No context found.
N. Jardine and C.J. van Rijsbergen. The Use of Hierarchical Clustering in Information Retrieval. Information Storage and Retrieval, 7 (1971), pp. 217--240.
CiteSeer.IST - Copyright Penn State and NEC