  Exploring the similarity space (1998) [55 citations — 6 self]

by Justin Zobel, Alistair Moffat
SIGIR FORUM
http://www.cs.rmit.edu.au/~jz/fulltext/sigirforum98.pdf

Abstract:

Ranked queries are used to locate relevant documents in text databases. In a ranked query a list of terms is specified, then the documents that most closely match the query are returned, in decreasing order of similarity, as answers. Crucial to the efficacy of ranked querying is the use of a similarity heuristic, a mechanism that assigns a numeric score indicating how closely a document and the query match. In this note we explore and categorise a range of similarity heuristics described in the literature. We have implemented all of these measures in a structured way, and have carried out retrieval experiments with a substantial subset of these measures. Our purpose with this work is threefold: first, by enumerating the various measures in an orthogonal framework, we make it straightforward for other researchers to describe and discuss similarity measures; second, by experimenting with a wide range of the measures, we hope to observe which features yield good retrieval behaviour in a variety of retrieval environments; and third, by describing our results so far, we hope to gather feedback on the issues we have uncovered. We demonstrate that it is surprisingly difficult to identify which techniques work best, and comment on the experimental methodology required to support any claims as to the superiority of one method over another.
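
As an illustration of the ranked-query process the abstract describes, the following is a minimal Python sketch, not one of the measures evaluated in the paper. It assumes whitespace tokenisation, a logarithmic TF-IDF term weight, and cosine-style normalisation by document length; the function name `rank` and its parameters are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def rank(query_terms, documents, top_k=3):
    """Return (doc_id, score) pairs in decreasing order of similarity.

    `documents` maps a document id to its text. The scoring rule is an
    assumed cosine-style heuristic, one of many possible choices.
    """
    n_docs = len(documents)
    doc_term_freqs = {doc_id: Counter(text.lower().split())
                      for doc_id, text in documents.items()}

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for freqs in doc_term_freqs.values():
        doc_freq.update(freqs.keys())

    def weight(term, tf):
        # Assumed weighting: (1 + ln tf) * ln(1 + N / df).
        return (1.0 + math.log(tf)) * math.log(1.0 + n_docs / doc_freq[term])

    query_freqs = Counter(t.lower() for t in query_terms)
    scores = []
    for doc_id, freqs in doc_term_freqs.items():
        # Document vector length, used to normalise the inner product.
        doc_norm = math.sqrt(sum(weight(t, tf) ** 2 for t, tf in freqs.items()))
        # Inner product over the terms shared by the query and the document.
        dot = sum(weight(t, qtf) * weight(t, freqs[t])
                  for t, qtf in query_freqs.items() if t in freqs)
        if dot > 0:
            # The query length is the same for every document, so it is
            # omitted: it does not change the ordering.
            scores.append((doc_id, dot / doc_norm))

    # Highest score first; ties are broken by document id.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_k]
```

Calling `rank(["ranked", "retrieval"], {"d1": "...", "d2": "..."})` returns only the documents that share at least one term with the query, best match first. Swapping the body of `weight` or the normalisation step yields a different similarity heuristic, which is the kind of variation the note sets out to categorise.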

Citations

2217 Introduction to Modern Information Retrieval – Salton, McGill - 1983
957 Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer – Salton
881 Term-Weighting Approaches in Automatic Text Retrieval – Salton, Buckley - 1988
563 Managing Gigabytes: compressing and indexing documents and images – Witten, Moffat, et al. - 1994
364 Information Retrieval: Data Structures and Algorithms – Frakes, Baeza-Yates - 1992
245 Pivoted document length normalization – Singhal, Buckley, et al. - 1996
148 Information storage and retrieval – Korfhage - 1997
105 The second text retrieval conference (TREC-2) – Harman - 1995
20 Large test collection experiments on an operational, interactive system: Okapi at TREC – Robertson, Walker, et al. - 1995
18 The MG retrieval system: compressing for space and speed – Bell, Moffat, et al. - 1995
16 Compression and fast indexing for multi-gigabyte text databases – Moffat, Zobel - 1994
2 MG public domain software for indexing and retrieving text, including tools for compressing text, bilevel images, grayscale images, and textual images. Available from ftp://munnari.oz.au/pub/mg and from http://www.mds.rmit.edu.au/mg – MG-software - 1995
1 Research Methods: A Process of Enquiry – Graziano, Raulin - 1993