by Justin Zobel
Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
http://www.cs.rmit.edu.au/~jz/fulltext/sigir98.pdf
Abstract:
Two stages in measurement of techniques for information retrieval are gathering of documents for relevance assessment and use of the assessments to numerically evaluate effectiveness. We consider both of these stages in the context of the TREC experiments, to determine whether they lead to measurements that are trustworthy and fair. Our detailed empirical investigation of the TREC results shows that the measured relative performance of systems appears to be reliable, but that recall is overestimated: it is likely that many relevant documents have not been found. We propose a new pooling strategy that can significantly increase the number of relevant documents found for given effort, without compromising fairness.