Results 1 - 10 of 15
Cumulated Gain-based Evaluation of IR Techniques
ACM Transactions on Information Systems, 2002
Cited by 233 (3 self)
Modern large retrieval environments tend to overwhelm their users by their large output. Since not all documents are of equal relevance to their users, highly relevant documents should be identified and ranked first for presentation to the users. In order to develop IR techniques in this direction, it is necessary to develop evaluation approaches and methods that credit IR methods for their ability to retrieve highly relevant documents. This can be done by extending traditional evaluation methods, i.e., recall and precision based on binary relevance assessments, to graded relevance assessments. Alternatively, novel measures based on graded relevance assessments may be developed. This paper proposes three novel measures that compute the cumulative gain the user obtains by examining the retrieval result up to a given ranked position. The first one accumulates the relevance scores of retrieved documents along the ranked result list. The second one is similar but applies a discount factor on the relevance scores in order to devaluate late-retrieved documents. The third one computes the relative-to-the-ideal performance of IR techniques, based on the cumulative gain they are able to yield. The novel measures are defined and discussed and then their use is demonstrated in a case study using TREC data - sample system run results for 20 queries in TREC-7. As relevance base we used novel graded relevance assessments on a four-point scale. The test results indicate that the proposed measures credit IR methods for their ability to retrieve highly relevant documents and allow testing of statistical significance of effectiveness differences. The graphs based on the measures also provide insight into the performance of IR techniques and allow interpretation, e.g., from the user point of ...
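The three measures described in this abstract can be illustrated with a minimal Python sketch. The abstract does not state the discount function, so the logarithmic discount used here (dividing the gain at rank i by log_b(i) for ranks at or beyond the base b) is an assumption, as are the function names, the sample judgments, and the simplified ideal ordering.

```python
import math


def cumulated_gain(gains):
    """Running sum of relevance scores down the ranked list."""
    cg, total = [], 0.0
    for gain in gains:
        total += gain
        cg.append(total)
    return cg


def discounted_cumulated_gain(gains, base=2):
    """Running sum in which late-ranked gains are devalued.

    Assumption: gains at ranks >= base are divided by log_base(rank);
    earlier ranks are left undiscounted.
    """
    dcg, total = [], 0.0
    for rank, gain in enumerate(gains, start=1):
        total += gain if rank < base else gain / math.log(rank, base)
        dcg.append(total)
    return dcg


def relative_to_ideal(actual, ideal):
    """Position-wise ratio of an actual gain vector to the ideal one."""
    return [a / i if i > 0 else 0.0 for a, i in zip(actual, ideal)]


# Hypothetical graded judgments on a four-point scale (0-3) for one ranked list.
run = [3, 2, 3, 0, 1, 2]
# Simplification: the ideal list reorders the same judgments; a real recall
# base would contain every judged relevant document for the query.
ideal = sorted(run, reverse=True)

cg = cumulated_gain(run)
dcg = discounted_cumulated_gain(run)
ndcg = relative_to_ideal(dcg, discounted_cumulated_gain(ideal))
```

A value of 1.0 at a given rank in `ndcg` means the run has matched the ideal ordering up to that position; lower values show how far the ranking falls short.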
IR evaluation methods for retrieving highly relevant documents
2000
Cited by 218 (4 self)
This paper proposes evaluation methods based on the use of non-dichotomous relevance judgements in IR experiments. It is argued that evaluation methods should credit IR methods for their ability to retrieve highly relevant documents. This is desirable from the user point of view in modern large IR environments. The proposed methods are (1) a novel application of P-R curves and average precision computations based on separate recall bases for documents of different degrees of relevance, and (2) two novel measures computing the cumulative gain the user obtains by examining the retrieval result up to a given ranked position. We then demonstrate the use of these evaluation methods in a case study on the effectiveness of query types, based on combinations of query structures and expansion, in retrieving documents of various degrees of relevance. The test was run with a best match retrieval system (InQuery) in a text database consisting of newspaper articles. The results indicate that the tested strong query structures are most effective in retrieving highly relevant documents. The differences between the query types are practically essential and statistically significant. More generally, the novel evaluation methods and the case demonstrate that non-dichotomous relevance assessments are applicable in IR experiments, may reveal interesting phenomena, and allow harder testing of IR methods.
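Method (1) above, average precision over separate recall bases, might look roughly like the following Python sketch. It makes two assumptions that the abstract does not confirm: each recall base holds the documents judged at exactly one relevance level, and non-interpolated average precision stands in for whatever precision computation the paper uses. All names and data are hypothetical.

```python
def average_precision(ranked_ids, recall_base):
    """Non-interpolated average precision against one recall base."""
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in recall_base:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(recall_base) if recall_base else 0.0


def average_precision_by_level(ranked_ids, judgments):
    """Average precision computed separately for each relevance degree.

    Assumption: the recall base for a level is the set of documents
    judged at exactly that level (level 0 means non-relevant).
    """
    levels = sorted({level for level in judgments.values() if level > 0})
    return {
        level: average_precision(
            ranked_ids,
            {doc for doc, judged in judgments.items() if judged == level},
        )
        for level in levels
    }


# Hypothetical graded judgments and one ranked result list.
judgments = {"d1": 3, "d2": 1, "d3": 0, "d4": 3, "d5": 2}
ranking = ["d2", "d1", "d3", "d5", "d4"]
scores = average_precision_by_level(ranking, judgments)
```

Comparing the per-level scores shows whether a run that looks strong on marginally relevant documents also does well on the highly relevant ones.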
Experimental components for the evaluation of interactive information retrieval systems
Journal of Documentation, 2000
Cited by 67 (0 self)
Evaluation by Highly Relevant Documents
In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2001
Cited by 62 (3 self)
Given the size of the web, the search engine industry has argued that engines should be evaluated by their ability to retrieve highly relevant pages rather than all possible relevant pages. To explore the role highly relevant documents play in retrieval system evaluation, assessors for the TREC-9 web track used a three-point relevance scale and also selected best pages for each topic. The relative effectiveness of runs evaluated by different relevant document sets differed, confirming the hypothesis that different retrieval techniques work better for retrieving highly relevant documents. Yet evaluating by highly relevant documents can be unstable since there are relatively few highly relevant documents. TREC assessors frequently disagreed in their selection of the best page, and subsequent evaluation by best page across different assessors varied widely. The discounted cumulative gain measure introduced by Järvelin and Kekäläinen increases evaluation stability by incorporating all relevance judgments while still giving precedence to highly relevant documents.
Interactive evaluation of the Ostensive Model using a new test collection of images with multiple relevance assessments
Journal of Information Retrieval, 2000
Cited by 19 (0 self)
The Ostensive Model proposes a manner of structuring the uncertainty associated with individual relevance judgements as sources of evidence in relevance feedback. It proposes temporal profiles of uncertainty, motivating the application of a particular class of discount function with respect to the age of the evidence. This paper presents an initial evaluation of the relative effectiveness of different uncertainty discount functions. A novel direct manipulation interface to a multimedia retrieval system embodying the Ostensive Model is outlined briefly. The paper describes the construction and characteristics of a new image test collection utilising multiple binary relevance assessments. The use of such multiple assessments and multiple interpretations of them are discussed. The evaluation environment is detailed in terms of the interface, test collection, and tasks set to users. Multiple interpretations of the results, and the statistical significance of comparisons are presented. Th...
A Probability Ranking Principle for Interactive Information Retrieval
2008
Cited by 6 (0 self)
The classical Probability Ranking Principle (PRP) forms the theoretical basis for probabilistic Information Retrieval (IR) models, which have dominated IR theory for about 20 years. However, the assumptions underlying the PRP often do not hold, and its view is too narrow for interactive information retrieval (IIR). In this paper, a new theoretical framework for interactive retrieval is proposed: The basic idea is that during IIR, a user moves between situations. In each situation, the system presents to the user a list of choices, about which s/he has to decide, and the first positive decision moves the user to a new situation. Each choice is associated with a number of cost and probability parameters. Based on these parameters, an optimum ordering of the choices can be derived: the PRP for IIR. The relationship of this rule to the classical PRP is described, and issues of further research are pointed out.
Evaluating Information Retrieval Systems Under The Challenges Of Interaction And Multidimensional Dynamic Relevance
In Proceedings of the CoLIS 4 Conference, 2002
Cited by 5 (3 self)
The Laboratory Model of information retrieval (IR) evaluation has been challenged by progress in research related to relevance and information seeking as well as by the growing need for accounting for interaction in evaluation. Real human users introduce non-binary, subjective and dynamic relevance judgments into IR processes and affect these processes. Therefore the traditional evaluation based on the Laboratory Model is challenged for its (lack of) realism. This paper examines the rationale of evaluating the IR algorithms, the status of the traditional evaluation, and the applicability of the proposed novel evaluation methods and measures. It further points out research problems requiring attention for further advances in the area. The Laboratory Model is found limited but still useful for the specific tasks it fulfills in the development of IR algorithms.
in press). Evaluating exploratory search systems
- SIGIR Workshop paper (Seattle
Cited by 2 (1 self)
Online search has become an increasingly important part of the everyday lives of most computer users. Generally, popular search tools support users well; however, in situations where the search problem is poorly defined, or the information seeker is unfamiliar with the problem domain, or the search task requires some exploration or the consideration of multiple perspectives, such tools may not operate as effectively. To address situations where technology may not meet their needs, users have developed coping strategies involving the submission of multiple queries and the interactive exploration of the retrieved document space, selectively following links and passively obtaining cues about where their next steps lie. This is an example of exploratory search behavior, and comprises a mixture of serendipity, learning, and investigation [7]. Exploratory search can be seen as a specialization of information
User-Oriented Evaluation Methods for Information Retrieval: A Case Study Based on Conceptual Models for Query Expansion
Cited by 2 (0 self)
This paper discusses evaluation methods based on the use of non-dichotomous relevance judgements in information retrieval (IR) experiments. It is argued that evaluation methods should credit IR methods for their ability to retrieve highly relevant documents. This is desirable from the user's point of view in modern large IR environments. The proposed methods are (1) a novel application of P-R curves and average precision computations based on separate recall bases for documents of different degrees of relevance, and (2) two novel measures computing the cumulated gain the user obtains by examining the retrieval result up to a given ranked position. We then demonstrate the use of these evaluation methods in a case study on the effectiveness of query types, based on combinations of query structures and expansion, in retrieving documents of various degrees of relevance. Query expansion is based on concepts, which are selected from a conceptual model, and then expanded by semantic relationships given in the model. The test is run with a best match retrieval system (InQuery) in a text database consisting of newspaper articles.
Diagnostic evaluation of a personalized filtering information retrieval system. Methodology and experimental results
- In Proceedings of RIAO 2000 "Content
, 2003
Cited by 1 (1 self)
The study presented in this paper deals with the diagnostic evaluation of a system being implemented. The tested system's particularity is to provide a filtering process that takes the user's personal characteristics into account. The aim of the diagnostic evaluation is to choose one filtering process among eight proposed ones. A representative sample of 16,300 queries is used. The sample combines characteristics relating to the user's profile, the user's information need, and the filtering process. Answers are compared with respect to the number of common documents, the rank of common documents, and the degree of specificity of the query. These criteria give an indication of the impact of filtering.

