How reliable are the results of large-scale information retrieval experiments? (1998) [64 citations, 2 self]

by Justin Zobel
Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
http://www.cs.rmit.edu.au/~jz/fulltext/sigir98.pdf

Abstract:

Two stages in measurement of techniques for information retrieval are gathering of documents for relevance assessment and use of the assessments to numerically evaluate effectiveness. We consider both of these stages in the context of the TREC experiments, to determine whether they lead to measurements that are trustworthy and fair. Our detailed empirical investigation of the TREC results shows that the measured relative performance of systems appears to be reliable, but that recall is overestimated: it is likely that many relevant documents have not been found. We propose a new pooling strategy that can significantly increase the number of relevant documents found for given effort, without compromising fairness.

Citations

107 Overview of the fourth Text REtrieval Conference (TREC-4) – Harman - 1996
105 The second text retrieval conference (TREC-2) – Harman - 1995
89 Scientific reasoning: The Bayesian approach (2nd ed.) – Howson, Urbach - 1993
85 Overview of the fifth text REtrieval conference (TREC-5), in Information Technology: The Fifth Text REtrieval Conference – Harman, Voorhees - 1996
59 A critical investigation of recall and precision as measures of retrieval system performance – Raghavan, Jung, et al. - 1989
59 The pragmatics of information retrieval experimentation, revisited – Tague-Sutcliffe - 1992
42 Variations in relevance assessments and the measurement of retrieval effectiveness – Harter - 1996
35 The state of retrieval system evaluation – Salton - 1992
34 Relevance assessments and retrieval system evaluation – Lesk, Salton - 1969
31 A Statistical Analysis of the TREC-3 Data – Tague-Sutcliffe, Blustein
31 Efficient retrieval of partial documents – Zobel, Moffat, et al. - 1995
23 Statistical inference in retrieval effectiveness evaluation – Savoy - 1997
6 Some unexplained aspects of the Cranfield tests of indexing performance factors – Swanson - 1971
5 The Cranfield II relevance assessments: A critical evaluation – Harter - 1971
4 STAIRS redux: Thoughts on the STAIRS evaluation, ten years after – Blair - 1996
4 Relevance judgements for assessing recall – Wallis, Thom - 1996