E. Rasmussen. Clustering algorithms. In W.B. Frakes and R. Baeza-Yates, eds. Information Retrieval. Prentice Hall, 1992.
....over 10,000 different words. As an example, the four sets of top 500 documents returned by the queries mafia, Clinton, cancer and computer use a vocabulary of between 14,000 and 17,000 different candidate keywords. Document clustering has attracted much interest in the recent decades, e.g. [20, 8, 29, 17], and much is known about the importance of feature reduction in general, e.g. [14], and, in particular, clustering [27]. However, little has been done so far to facilitate feature reduction for document clustering of query results, with the notable exception of [22]. In contrast to the latter paper, ....
E Rasmussen. Clustering algorithms. In W Frakes and R Baeza-Yates, editors, Information Retrieval: Data Structures and Algorithms. Prentice Hall, 1992.
....discussed in the literature. Important related work has been done in information retrieval, document clustering, conceptual clustering, and recently in text mining. In information retrieval and document clustering, the weighted keyword representation of documents is one of the most widely used [8, 9]. For this type of document representation, many different similarity measures are proposed, for instance, the Dice coefficient, the Jaccard coefficient, the Cosine coefficient [8] etc. For the representation with binary term weights, the Dice coefficient is calculated as follows: Dice ....
.... retrieval and document clustering, the weighted keyword representation of documents is one of the most widely used [8, 9]. For this type of document representation, many different similarity measures are proposed, for instance, the Dice coefficient, the Jaccard coefficient, the Cosine coefficient [8], etc. For the representation with binary term weights, the Dice coefficient is calculated as follows: Dice coefficient: $S_{D_1,D_2} = \frac{2\,n(D_1 \cap D_2)}{n(D_1) + n(D_2)}$ where $n(D_i)$ is the number of terms in $D_i$, and $n(D_i \cap D_j)$ is the number of terms that the two documents $D_i$ and D ....
Rasmussen, Edie (1992). "Clustering Algorithms". Information Retrieval: Data Structures & Algorithms. William B. Frakes and Ricardo Baeza-Yates (Eds.), Prentice Hall, 1992.
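The Dice coefficient for binary term weights quoted in the context above can be illustrated with a minimal sketch. The function name `dice_coefficient`, the sample term lists, and the choice to return 0.0 for two empty documents are assumptions made for this illustration; they are not taken from the citing paper.

```python
def dice_coefficient(terms_a, terms_b):
    """Dice coefficient for binary term weights: 2*n(A and B) / (n(A) + n(B))."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        # Assumption: two empty documents are treated as having no similarity.
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical documents, represented by their keyword sets.
d1 = ["document", "clustering", "keyword"]
d2 = ["document", "clustering", "retrieval", "similarity"]

# Two shared terms, three and four terms in total: 2*2 / (3 + 4) = 4/7.
score = dice_coefficient(d1, d2)
print(round(score, 4))
```

With binary weights only term presence matters, so repeated terms in a document do not change the score.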
....their relational similarity $s_r$. The conceptual similarity $s_c$ depends on the common concepts of $G_1$ and $G_2$. It indicates how similar the entities, actions, and attributes mentioned in both conceptual graphs are. We calculate it using an expression analogous to the well known Dice coefficient [8]: $s_c = \frac{2 \sum_{c \in \bigcup O} \left( weight(c) \times \beta(\pi_{G_1} c, \pi_{G_2} c) \right)}{\sum_{c \in G_1} weight(c) + \sum_{c \in G_2} weight(c)}$ Here $\bigcup O$ is the union of all graphs in O, i.e. the set of all their nodes and arcs; the function $weight(c)$ gives the relative importance of the concept c, and the function $\beta(\pi_{G_1} c, \pi_{G_2} c)$ expresses the ....
Rasmussen, Edie (1992). "Clustering Algorithms". Information Retrieval: Data Structures & Algorithms. William B. Frakes and Ricardo Baeza-Yates (Eds.), Prentice Hall, 1992.
....variable. Clustering a large portion of the entire World Wide Web (such as in www.yahoo.com) is a much different proposition than clustering the smaller document collection returned by a web search engine (www.northernlight.com). Existing document clustering methods include agglomerative clustering [40, 39, 29], the partitional k-means algorithm [8], projection-based methods including LSA [2, 33], self-organizing maps [25, 21], and multidimensional scaling [27, 22]. For computational efficiency required in on-line clustering, hybrid approaches have been considered in [7, 19]. Recently there has been a flurry of ....
E. Rasmussen. Clustering algorithms. In William B. Frakes and Ricardo Baeza-Yates, editors, Information Retrieval: Data Structures and Algorithms, pages 419-442. Prentice Hall, Englewood Cliffs, New Jersey, 1992.
.... of documents [21]. An effective measure of similarity between documents, and one that is often used in information retrieval, is cosine similarity, which uses the cosine of the angle between document vectors [17]. The k-means algorithm can be adapted to use the cosine similarity metric, see [16], to yield the spherical k-means algorithm, so named because the algorithm operates on vectors that lie on the unit sphere [4]. Since it uses cosine similarity, spherical k-means exploits the sparsity of document vectors and is highly efficient [3]. The size of desired clusters is an important ....
E. Rasmussen. Clustering algorithms. In William B. Frakes and Ricardo Baeza-Yates, editors, Information Retrieval: Data Structures and Algorithms, pages 419-442. Prentice Hall, Englewood Cliffs, New Jersey, 1992.
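The cosine measure and the spherical k-means idea described in the context above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the cited authors' implementation: it uses dense Python lists (so it does not exploit sparsity), seeds the centroids with the first k documents, and runs a fixed number of iterations. The names `normalize`, `cosine`, and `spherical_kmeans` and the sample vectors are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def spherical_kmeans(vectors, k, iterations=10):
    """k-means on the unit sphere: assign by cosine, re-normalize centroids."""
    docs = [normalize(v) for v in vectors]
    # Assumption: the first k documents serve as initial centroids.
    centroids = [docs[i][:] for i in range(k)]
    labels = [0] * len(docs)
    for _ in range(iterations):
        # For unit vectors the dot product equals the cosine similarity.
        labels = [
            max(range(k), key=lambda j: sum(a * b for a, b in zip(d, centroids[j])))
            for d in docs
        ]
        for j in range(k):
            members = [d for d, label in zip(docs, labels) if label == j]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[j] = normalize(mean)  # project back onto the sphere
    return labels, centroids


# Hypothetical term-weight vectors for four documents.
vectors = [[1, 0, 0], [0, 1, 0], [2, 0.1, 0], [0.1, 3, 0]]
labels, centroids = spherical_kmeans(vectors, k=2)
print(labels)  # [0, 1, 0, 1]
```

Because every document and centroid is normalized, vector length (for example, document size) does not influence the assignment; only direction does.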
....is applied to organize similar documents into groups. Then, each cluster is digested and named by a set of words and a sample document. 3.1 Document clustering Cluster analysis has long been applied in information systems to improve the efficiency and effectiveness of retrieval for quite a time [5]. The basic principle behind clustering is to select objects that have high similarities into the same cluster. Various clustering algorithms differ in their use of the similarity measures between groups. For document clustering, the measure of association between documents is determined by the ....
....= the largest tf_{i,t} in document d_i, df_k = the document frequency of term t in our collection G, where documents are sampled from the Web, and N = the number of documents in G. The clustering algorithm we adopt here is a variation of the hierarchical agglomerative clustering methods (HACM) [5]. The general algorithm for the HACM is derived by identifying the two closest clusters and combining them in one. The commonly used HACM include single link, complete link, and group average link, which differ in their measures of the intercluster similarity. To get an intermediate structure of ....
E. Rasmussen. Clustering algorithms. In W. Frakes and R. Baeza-Yates, editors, Information Retrieval: Data Structures and Algorithms, chapter 16. Prentice Hall, 1992.
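The general HACM loop described in the context above (repeatedly find the two closest clusters and merge them) and the three linkage rules it names might be sketched as follows. This is a naive illustration, not the variation adopted by the citing paper: the function `hac`, its parameters, and the sample similarity matrix are hypothetical, and group average is taken here to mean the mean similarity over all cross-cluster pairs.

```python
def hac(sim, num_clusters, linkage="single"):
    """Naive agglomerative clustering over a symmetric similarity matrix."""
    clusters = [[i] for i in range(len(sim))]

    def cluster_sim(a, b):
        pair_sims = [sim[i][j] for i in a for j in b]
        if linkage == "single":    # similarity of the most similar pair
            return max(pair_sims)
        if linkage == "complete":  # similarity of the least similar pair
            return min(pair_sims)
        if linkage == "average":   # mean over all cross-cluster pairs
            return sum(pair_sims) / len(pair_sims)
        raise ValueError("unknown linkage: " + linkage)

    # Merge the two most similar clusters until the target count is reached.
    while len(clusters) > max(num_clusters, 1):
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = cluster_sim(clusters[x], clusters[y])
                if best is None or s > best[0]:
                    best = (s, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical pairwise document similarities (for example, cosine values).
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
result = hac(sim, num_clusters=2, linkage="complete")
print(result)  # [[0, 1], [2, 3]]
```

The three linkage rules differ only inside `cluster_sim`; the merge loop is the same for all of them, which matches the description that the methods differ in how intercluster similarity is measured.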
....documents based on a novel probabilistic similarity measure that captures the expected overlap in words between documents. This score prompts the investigation of different methods for estimating the probability of a word appearing in a document. Furthermore, we show that the cosine coefficient [131] widely used in information retrieval as a measure of document similarity can be associated with a particular form of probability estimation in our model. We also introduce a specific scoring function that outperforms the cosine coefficient and its extensions in our experiments with document ....
....retrieval, but it could also help retrieve relevant documents that did not contain the precise terms used in the user's query, but were part of a cluster of other documents that did. Subsequently, a good deal of research in this area has focused on different methods for forming document clusters [131]. Willett [172] gives an excellent survey of much of this work up to the present decade. The past few years have seen new directions in document clustering focusing on more diverse issues such as models that allow documents to be full members of multiple clusters [141] as well as experimental ....
Rasmussen, E. Clustering algorithms. In Information Retrieval: Data Structures and Algorithms, W. B. Frakes and R. Baeza-Yates, Eds. Prentice Hall, 1992, pp. 419--442.
....discussed in detail below, previous results provided constant approximation guarantees, or were limited to fixed dimension. Clustering of data has significant importance in many fields, including operations research, data mining, statistics, computer vision and pattern recognition (see, for example, [37, 10, 38] and references therein). The recent interest in clustering problems can be attributed to applications such as the classification of web pages retrieved by a search engine [40, 33, 30] or the study of gene expression in computational molecular biology [31]. In many applications, the goal is to ....
E. Rasmussen. Clustering algorithms. In W.B. Frakes and R. Baeza-Yates, eds. Information Retrieval. Prentice Hall, 1992.
....of the matrix entries being zero. Using the vector space model, various classical clustering algorithms such as the k-means algorithm and its variants, hierarchical agglomerative clustering, and graph-theoretic methods have been explored in the text mining literature; for detailed reviews, see [Ras92, Wil88]. Recently, there has been a flurry of activity in this area, see [BGG 98, CKPT92, SS97, ZE98]. A substantial amount of this work has concentrated on clustering web search results where the document collections to be clustered are not very large. In this paper, our main concern is in ....
E. Rasmussen. Clustering algorithms. In William B. Frakes and Ricardo Baeza-Yates, editors, Information Retrieval: Data Structures and Algorithms, pages 419-442. Prentice Hall, Englewood Cliffs, New Jersey, 1992.
....variable. Clustering a large portion of the entire World Wide Web (such as in www.yahoo.com) is a much different proposition than clustering the smaller document collection returned by a web search engine (www.northernlight.com). Existing document clustering methods include agglomerative clustering [40, 39, 29], the partitional k-means algorithm [8], projection-based methods including LSA [2, 33], self-organizing maps [25, 21], and multidimensional scaling [27, 22]. For computational efficiency required in on-line clustering, hybrid approaches have been considered in [7, 19]. Recently there has been a flurry of ....
E. Rasmussen. Clustering algorithms. In William B. Frakes and Ricardo Baeza-Yates, editors, Information Retrieval: Data Structures and Algorithms, pages 419-442. Prentice Hall, Englewood Cliffs, New Jersey, 1992.
....to compare histograms. For extracting M keyframes, the material is divided into M clusters. This approach sidesteps the problem of selecting an appropriate threshold for the cluster size. Frames are clustered using the complete link method of the hierarchical agglomerative clustering technique [6]. This method uses the maximum of the pairwise distances between the frames in two clusters to determine the intercluster similarity. Other clustering methods can be used as well with slightly different results. Figure 3 shows that the hierarchical clustering is performed by combining the two ....
E. Rasmussen, "Clustering algorithms", in Information Retrieval: Data Structures and Algorithms, W.B. Frakes and R. Baeza-Yates (Eds.), Prentice Hall, 1992, pp. 419-442.
....dimensional space which is composed of the features drawn from the document itself. The basic assumption of this approach is that words related to the same topic are semantically close to each other. This assumption is strong, but is at the basis of much of the work on document and term clustering [27]. More formally, given a set of n categories and a document, the task is to split the document into smaller segments and map them into the corresponding categories. The proposed approach consists of a two-step process for passage classification. The first step takes the Naive Bayes learning ....
E. Rasmussen. Clustering algorithms. In W.B. Frakes and R. Baeza-Yates, editors, Information Retrieval: data structures and algorithms., chapter 16. Prentice Hall, Englewood Cliffs, New Jersey, USA, 1992.
.... and navigating collections of documents in [Dhillon et al. 1998]. Various classical clustering algorithms such as the k-means algorithm and its variants, hierarchical agglomerative clustering, and graph-theoretic methods have been explored in the text mining literature; for detailed reviews, see [Rasmussen, 1992, Willet, 1988]. Recently, there has been a flurry of activity in this area, see [Boley et al. 1998, Cutting et al. 1992, Hearst and Pedersen, 1996, Sahami et al. 1999, Schutze and Silverstein, 1997, Silverstein and Pedersen, 1997, Vaithyanathan and Dom, 1999, Zamir and Etzioni, 1998]. A ....
.... 1996]. It is natural to measure similarity between such vectors by their inner product, known as cosine similarity [Salton and McGill, 1983]. In this paper, we will use a variant of the well known Euclidean k-means algorithm [Duda and Hart, 1973, Hartigan, 1975] that uses cosine similarity [Rasmussen, 1992]. We shall show that this algorithm partitions the high dimensional unit sphere using a collection of great hypercircles, and hence we shall refer to this algorithm as the spherical k-means algorithm. The algorithm computes a disjoint partitioning of the document vectors, and, for each partition, ....
Rasmussen, E. (1992). Clustering algorithms. In Frakes, W. B. and Baeza-Yates, R., editors, Information Retrieval: Data Structures and Algorithms, pages 419--442. Prentice Hall, Englewood Cliffs, New Jersey.
No context found.
E. Rasmussen. Clustering algorithms. In W.B. Frakes and R. Baeza-Yates, eds. Information Retrieval. Prentice Hall, 1992.
No context found.
Rasmussen E. Clustering algorithms. In: Frakes WB, Baeza-Yates R, editors. Information retrieval : data structures & algorithms. Englewood Cliffs, N.J.: Prentice Hall; 1992. p. 419-442
No context found.
Rasmussen E. Clustering algorithms. In: Frakes WB, Baeza-Yates R, editors. Information retrieval : data structures & algorithms. Englewood Cliffs, N.J.: Prentice Hall; 1992. p. 419-442
No context found.
Rasmussen, E. (1992). Clustering Algorithms, in [24], Chapter 16, pp. 419--442.
No context found.
Rasmussen E. Clustering algorithms. In: Frakes WB, Baeza-Yates R, editors. Information retrieval : data structures & algorithms. Englewood Cliffs, N.J.: Prentice Hall; 1992. p. 419-442
No context found.
E. Rasmussen. Clustering algorithms. In W. B. Frakes and R. Baeza-Yates, editors, Information Retrieval Data Structures and Algorithms. Prentice Hall, N.J., 1992.
No context found.
Rasmussen, E.: Clustering algorithms. In Frakes, W., Baeza-Yates, R., eds.: Information Retrieval: Data structures & Algorithms, Prentice Hall (1992)
No context found.
E. Rasmussen. Clustering algorithms. In W.B. Frakes, R. Baeza-Yates, eds., Information Retrieval. Prentice Hall, 1992.
No context found.
E. Rasmussen. Clustering algorithms. In W.B. Frakes, R. Baeza-Yates, eds., Information Retrieval. Prentice Hall, 1992.
No context found.
Rasmussen, E. Clustering Algorithms. In W. B. Frakes and R. Baeza-Yates (eds.) Information Retrieval Data Structures and Algorithms, Prentice Hall, N. J., 1992.
No context found.
E. Rasmussen, "Clustering Algorithms," Information Retrieval: Data Structures and Algorithms, W.B. Frakes and R. Baeza-Yates, eds., Prentice Hall, Upper Saddle River, N.J., 1992, pp. 419-442.
No context found.
Rasmussen, E. 1992. Clustering Algorithms. In W.B. Frakes and R. Baeza-Yates (eds.), Information Retrieval: Data Structures and Algorithms, 419--441. Upper Saddle River, NJ: Prentice-Hall.