| JONES, W. P. & FURNAS, G. W., Pictures of relevance: A geometric analysis of similarity measures, Journal of the American Society for Information Science, 38 (1987) 420-442. |
....of breast feeding. Thus, contents in the same aggregate may be characterized, related, or compared in terms of shared features in the associated external system. An interesting consequence is that a semantic content network can allow semantic approximate or proximity searching [11] [17] [25]. That is, contents that are semantically related to a given piece of content can be found by searching through network nodes that contain an aggregate to which that content belongs. For example, a search through the mammal aggregate will find contents related not only to dogs and cats but also ....
....information retrieval systems [44]. For some typical methods of assigning content vectors to contents, refer to [45]. In the rest of Section IV, we use the terms content and the associated content vector interchangeably. There are two major categories of similarity metrics for content vectors [25]. These are angle-based metrics which use, e.g., the cosine function, and distance-based metrics which use, e.g., the inner product function. For the discussion below, we assume distance-based metrics. Figure 1 illustrates a set of two-dimensional content vectors associated with eight pieces of ....
Jones, W., and Furnas, G., "Pictures of Relevance: A Geometric Analysis of Similarity Measures," Journal of the American Society for Information Science, 38(6):420-442, November 1987.
....from the index language. The assumption underlying this form of filtering is that the meanings of objects and queries can be and are captured in specific words or phrases [Mar95]. The three most prominent retrieval models [BC92] are the Boolean (e.g. [Bla90]), the vector space [Sal68, Sal71, JF87], and the probabilistic retrieval models [TC90]. The first of these is based on what is called the exact match principle; the other two on the concept of best match. For surveys on these and other retrieval models see [Sal83, BC87]. Boolean retrieval is based on the concept of an exact match ....
....of all currently available ACF systems are relatively similar. The differences between individual ACF systems, e.g. how the subset of users that contribute to a user's predictions (the neighborhood) is selected and how these users' ratings are weighted (e.g. using the Pearson coefficient [JF87]), have little influence on their dynamic filtering properties. Since no aggregation of the ratings in the user profile is provided, entire user profiles cannot be manipulated effectively, so that ACF systems provide users with no means to instantly communicate their abrupt interest changes to ....
W.P. Jones and G.W. Furnas. Pictures of relevance: a geometric analysis of similarity measures. Journal of the American Society for Information Science, 38(6):420-442, 1987.
....heuristic similarity function: [equation garbled in extraction; recoverable terms: sim, matched words, len i, len j]. Mapping. Currently, the system for fish eye visualization [Sar92] of quant based presentations is under development. We intend to compute the degree of interest DOI() function [Joh87] that guides the sizing and positioning of presented objects on the screen using parameters indicated in the user profiles and quant definition file. Agents. Currently, we support only an autonomous agent which stores information broadcast over a set of multicast channels that is of user's ....
W.P. Jones and G.W. Furnas. Pictures of relevance: a geometric analysis of similarity measures. Journal of the American Society for Information Science, Vol.38, (no.6), pp.4
....or chunks of words given by all substrings of the words in the text). Once the documents are represented by vectors in the space, a vector similarity measure is chosen to estimate the associational similarity between vectors. Quite a number of different similarity measures have been examined [74]; typical candidates are cosine or inner product. Retrieval is performed by representing the query in the vector space and then ranking the documents based on their similarity to the query, as is illustrated in Figure 2. [Figure 2.2: Retrieval in the ....]
....in the term space (resulting in a new document matrix X). Since cosines and inner products are identical with unit length vectors, cosines are preserved as inner products by LSI. Finally, our analysis does not preclude the use of other typical similarity measures, such as Pseudo Cosine or Dice [74]. Rather, the analysis indicates that these other measures will be applied to a semantic representation which has much in common with the original space, in terms of their inner product similarity structures. 3.5 Experiments. Experiments using three standard text bases from the information ....
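The remark in the snippet above, that cosines and inner products coincide for unit-length vectors, can be checked numerically. The following is a minimal sketch; the helper names and the sample vectors are illustrative assumptions and do not come from the cited paper.

```python
import math

def inner_product(u, v):
    # Sum of component-wise products.
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    # Euclidean length of a vector.
    return math.sqrt(inner_product(u, u))

def cosine(u, v):
    # Inner product divided by the product of the two lengths.
    return inner_product(u, v) / (norm(u) * norm(v))

def normalize(u):
    # Rescale a non-zero vector to unit length.
    n = norm(u)
    return [a / n for a in u]

# Hypothetical document and query vectors.
doc = [3.0, 4.0, 0.0]
query = [1.0, 2.0, 2.0]
unit_doc, unit_query = normalize(doc), normalize(query)

print(cosine(doc, query))
print(inner_product(unit_doc, unit_query))
```

Both printed values agree up to floating-point rounding, which is why a system that stores unit-length vectors can rank by plain inner product and still obtain cosine rankings.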
[Article contains additional citation context not shown here]
William P. Jones and George W. Furnas. Pictures of relevance: A geometric analysis of similarity measures. Journal of the American Society for Information Science, 38(6):420--442, 1987.
....system assists the users to store, manipulate and retrieve useful data in the form of a document [8]. Much research has been carried out on similarity measures and weighting schemes, and on variations of their implementations to enhance retrieval performance. Most of the similarity measures [6, 15, 12] and weighting schemes [13, 17, 11, 1] are based on the inner product and the co .... [Footnotes: (e-mail: yunjae@cs.umn.edu) The work of this author was supported in part by the National Science Foundation grant CCR 9901992. (e-mail: hpark@cs.umn.edu) The work of this author was supported in part by the ....]
....of documents [5]. The text retrieval conducts term matching and ranks all documents by the degree of relevance in decreasing order. The cosine measure has been one of the most popular document similarity measures due to its sensitivity to document vector pattern and insensitivity to weight variation [6]. This measure is based on the inner product operation and the normalization by the document length. Its insensitivity to radial and large-component influence gives a higher similarity rating when two document vectors have similar patterns [6]. In general, existing term weighting schemes assign zero ....
[Article contains additional citation context not shown here]
W.P. Jones and G.W. Furnas. Pictures of relevance: A geometric analysis of similarity measures. Journal of the American Society for Information Science, 38(6):420--442, November 1987.
.... scheme and similarity measure, much effort has been made to enhance existing term weighting schemes [Salton and Buckley, 1988b, Jones, 1975, Barkla, 1969, Miller, 1971, Yu and Salton, 1976, Robertson, 1986, Allan, 1998] and similarity measures [Salton, 1980, Salton, 1971, Salton and McGill, 1983, Jones and Furnas, 1987]. In this paper, we proposed a balanced term-weighting scheme (BTWS) supporting the cosine similarity measure to retrieve relevant documents effectively on the basis of the vector space model (VSM). The gist of the BTWS is to consider not only occurrence terms but also absence terms in finding ....
....section, similarity measures are presented, which are tightly connected to term weighting schemes. 3 Similarity Measures. A number of similarity measures have been proposed, tested and evaluated because they play a crucial role in IR systems [Salton, 1980, Salton, 1971, Salton and McGill, 1983, Jones and Furnas, 1987]. The ranking of retrieved documents is determined by the output of the similarity measure. It is not quite an exaggeration even to say that the entire performance of an IR system depends on the similarity measure. In this section, we review the similarity measures currently published in the ....
[Article contains additional citation context not shown here]
Jones, W. P. and Furnas, G. (1987). Pictures of relevance: A geometric analysis of similarity measures. Journal of the American Society for Information Science, 38(6):420--442.
....the local weighting. There are three improvements that could be done in future work. First, a different similarity measure could be explored. As stated in the Assumptions Section 3.1, we employed the inner product similarity measure. There are other types of similarity measures as represented in [3], but each of these measures uses the inner product as a basis and creates a more complicated measure from there. Second, the assumption from 3.2 could be applied: that more important content material appears at the beginning of the document. In this case, the whole document would not have to be ....
W. Jones and G. Furnas, Pictures of relevance: a geometric analysis of similarity measures., Journal of the American Society for Information Science, 38 (1987), pp. 420--442.
....as coordinates in a space of index terms. Queries and user profiles [36, 38] are also represented as points in the document space, and relevance of a document to a query is computed in terms of a proximity or association measure such as a distance metric or angular similarity measure [35]. In a 1984 review of the vector processing model, Wong and Raghavan claim not to have found explicit references to vector space representations for documents earlier than 1975 [81]. But a technical report written by John Sammon in 1968 presents the details of a vector space representation for ....
....$\frac{\sum_{i=1}^{n} (T_{ij} T_{ik})}{\sqrt{\sum_{i=1}^{n} T_{ij}^{2} \sum_{i=1}^{n} T_{ik}^{2}}}$ The cosine measure calculates the cosine of the angle between two document representation vectors. Its popularity in IR must in part be attributed to the measure's sensitivity to topical or within-object term relationships [35]. Put another way, the cosine measure is most sensitive to the relative importance of the terms or topics in the document: which is the strongest, which the weakest, etc. Euclidean distance and the cosine measure often disagree as to which document pairs are most similar and least similar, since ....
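The disagreement between Euclidean distance and the cosine measure mentioned in the snippet above can be illustrated with a small example. This is a sketch under assumed data: the three two-term vectors are hypothetical and chosen only to make the effect visible.

```python
import math

def cosine(u, v):
    # Cosine of the angle between u and v.
    dot = sum(a * b for a, b in zip(u, v))
    len_u = math.sqrt(sum(a * a for a in u))
    len_v = math.sqrt(sum(b * b for b in v))
    return dot / (len_u * len_v)

def euclidean(u, v):
    # Straight-line distance between u and v (smaller means more similar).
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical term-weight vectors over two index terms.
query = [1.0, 1.0]
same_mix = [10.0, 10.0]   # same relative term importance, much larger weights
nearby = [1.0, 0.0]       # close in space, but only one of the two terms

print(cosine(query, same_mix), cosine(query, nearby))
print(euclidean(query, same_mix), euclidean(query, nearby))
```

Here the cosine measure prefers `same_mix`, whose terms have the same relative importance as the query, while Euclidean distance prefers `nearby`, which simply lies closer to the query point.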
[Article contains additional citation context not shown here]
Jones, W.P. and Furnas, G.W. "Pictures of relevance: a geometric analysis of similarity measures". Journal of the American Society for Information Science 38, 6 (1987), 420-442.
....is expected that the correct translation is ranked first in the sorted list. For vector comparison, different similarity measures can be thought of. Salton & McGill (1983) proposed a number of measures, such as the Cosine coefficient, the Jaccard coefficient, and the Dice coefficient (see also Jones & Furnas, 1987). For the computation of related terms and synonyms, Ruge (1995), Landauer & Dumais (1997), and Fung & McKeown (1997) used the cosine measure, whereas Grefenstette (1994, p. 48) used a weighted Jaccard measure. We propose here the city block metric, which computes the similarity between two vectors ....
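For orientation, the measures named in the snippet above can be sketched as follows. This is a simplified illustration, not the formulation of any cited work: the Dice and Jaccard coefficients are shown in their set-based (binary) form, whereas the cited authors may use weighted vector variants, and all function names and sample data are assumptions.

```python
def dice(a, b):
    # Set-based Dice coefficient: 2|A ∩ B| / (|A| + |B|).
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))

def jaccard(a, b):
    # Set-based Jaccard coefficient: |A ∩ B| / |A ∪ B|.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def city_block(u, v):
    # City block (L1) distance: sum of absolute component differences.
    return sum(abs(x - y) for x, y in zip(u, v))

# Hypothetical term sets and count vectors.
terms_a = {"a", "b", "c"}
terms_b = {"b", "c", "d"}

print(dice(terms_a, terms_b))
print(jaccard(terms_a, terms_b))
print(city_block([1, 0, 2], [0, 1, 2]))
```

Note that the city block metric is a distance, so smaller values indicate more similar vectors, whereas Dice and Jaccard grow with similarity.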
Jones, W.P.; Furnas, G.W. (1987). Pictures of relevance: a geometric analysis of similarity measures. Journal of the American Society for Information Science, 38(6): 420-442.
....as a similarity only in a pairwise way. In contrast, a spreading activation model represents dynamic relationship patterns among documents, among terms, and between terms and documents. From a geometric viewpoint, these two models were compared by the analogy of the iso similarity contour [7]. And, it was shown that a major difference between these two models is in the normalization method applied. In order to calculate the similarity between a query and a document, each term weight in a query or a document has to be normalized. And, this term weight normalization has the effect of ....
JONES, W. P., AND FURNAS, G. W. Pictures of relevance: A geometric analysis of similarity measures. Journal of the American Society for Information Science 38, 6 (1987), 420--442.
....IR for a document collection. cosine measure: A measure of the similarity of two vectors $d_i$ and $d_j$: $\frac{d_i \cdot d_j}{|d_i| \times |d_j|}$. For more details about similarity measures and the various normalizing denominators that can be used see the survey by Jones and Furnas [JF87]. definition link: See under link types (on page xxxi). document: 1. In general: written or printed matter. 2. In IR: a data object, usually textual, though it may contain other types of data such as photographs, graphs, and so on. [Fra92a, p. 1] 3. In my experiment: the text that my ....
.... of a term in a document is the product of its local weight (which represents its significance to the document) and its global weight (representing its significance in the entire collection of documents). To prevent document length from affecting similarity computations the product is normalized [JF87]. The two treatments (using semantic links computed using smart and lsi) are similar in most respects so that any detected differences between them should be solely the result of the different similarity measures. Cosine (Equation 4.1) is a popular normalization. Since lsi works best with ....
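The weighting step described in the snippet above (local weight times global weight, followed by normalization) can be sketched as below. This is an illustration only: the function name, the choice of cosine (unit-length) normalization, and the sample weights are assumptions, and the cited work may define local and global weights differently.

```python
import math

def weight_vector(local_weights, global_weights):
    # Term weight = local weight * global weight, per term.
    raw = [l * g for l, g in zip(local_weights, global_weights)]
    # Cosine normalization: divide by the Euclidean length so that
    # long and short documents end up on the same scale.
    length = math.sqrt(sum(w * w for w in raw))
    if length == 0.0:
        return raw
    return [w / length for w in raw]

# Hypothetical data: the long document repeats the short one ten times,
# so its local (within-document) weights are ten times larger.
global_weights = [0.5, 1.0, 2.0]
short_doc = weight_vector([1, 2, 0], global_weights)
long_doc = weight_vector([10, 20, 0], global_weights)

print(short_doc)
print(long_doc)
```

After normalization the two vectors coincide, so document length alone no longer changes an inner-product similarity score.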
[Article contains additional citation context not shown here]
William P. Jones and George W. Furnas. Pictures of relevance: A geometric analysis of similarity measures. Journal of the American Society for Information Science, 38(6):420 -- 442, November 1987.
....energy of all of the active nodes. We interpret this as the degree of association between the article and the user's interests. [equation garbled in extraction] This is a nonlinear measure, and can be contrasted with the traditional linear measures [27]. The effect of a nonlinear measure is to bias the process of article ranking towards a deep exploration of articles that are relevant to the current user interests. If there are a number of nodes with strong connections to a single feature in an article then that article will receive a high ....
....system, but our system is more ambitious in attempting to construct a long-term user model. The theoretical basis of our system owes something to earlier efforts on the use of dialogue ([23], [25]), the relevant factors in ranking articles [26], and the mechanisms of differentiation of queries ([27], [28]). The fundamental network spreading activation mechanisms are largely based on Jones' [1] and Howells' [20] network mechanisms. Our primary contributions are the concept of a user model network, and the means of constructing and using this network in practice. 8. Conclusions. Our approach of a ....
Jones, W. and Furnas, G.W. "Pictures of Relevance: A Geometric Analysis of Similarity Measures" Journal of the American Society for Information Science, 38(6), pp.420-442, 1987
No context found.
Jones, W.P., and Furnas, G.W. Pictures of relevance: A geometric analysis of similarity measures. Journal of the American Society for Information Science, 38, 6 (1987), 420--442.
No context found.
W.P. Jones, G.W. Furnas, Pictures of relevance: a geometric analysis of similarity measures, Journal of the American Society for Information Science 38 (1987) 420 -- 442.
No context found.
Jones, W., and Furnas, G., Pictures of Relevance: A Geometric Analysis of Similarity Measures. Journal of the American Society for Information Science, 38(6): 420-442, November 1987
No context found.
Jones, W. P., Furnas, G. W.: Pictures of relevance: A geometric analysis of similarity measures. Journal of the American Society for Information Science, 38(6) (1987) 420-442
No context found.
W. P. Jones and G. W. Furnas, "Pictures of relevance: a geometric analysis of similarity measures," Journal of the American Society for Information Science, vol. 38, no. 6, pp. 420--442, 1987.
No context found.
Jones, W. P. and Furnas, G. W.: Pictures of relevance: A geometric analysis of similarity measures. Journal of the American Society for Information Science, 38(6) (1987) 420-442