Optimizing Search Engines using Clickthrough Data, 2002
Cited by 568 (20 self)
This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system should present relevant documents high in the ranking, with less relevant documents following below. While previous approaches to learning retrieval functions from examples exist, they typically require training data generated from relevance judgments by experts. This makes them difficult and expensive to apply. The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking. Such clickthrough data is available in abundance and can be recorded at very low cost. Taking a Support Vector Machine (SVM) approach, this paper presents a method for learning retrieval functions. From a theoretical perspective, this method is shown to be well-founded in a risk minimization framework. Furthermore, it is shown to be feasible even for large sets of queries and features. The theoretical results are verified in a controlled experiment. It shows that the method can effectively adapt the retrieval function of a meta-search engine to a particular group of users, outperforming Google in terms of retrieval quality after only a couple of hundred training examples.
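
The abstract does not say how clicks become training data. Below is a minimal Python sketch. It assumes a "click > skip above" rule: a clicked result is preferred over every unclicked result ranked above it. It also swaps in plain subgradient descent on a pairwise hinge loss in place of a real SVM solver. The function names, feature vectors, and learning rate are all hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

def click_skip_above_pairs(ranking: Sequence[str], clicked: Set[str]) -> List[Tuple[str, str]]:
    """Return (preferred, less_preferred) pairs: each clicked result is
    preferred over every non-clicked result ranked above it."""
    pairs: List[Tuple[str, str]] = []
    for i, doc in enumerate(ranking):
        if doc in clicked:
            for above in ranking[:i]:
                if above not in clicked:
                    pairs.append((doc, above))
    return pairs

def score(w: List[float], x: List[float]) -> float:
    """Linear retrieval function: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train_pairwise_hinge(pairs: List[Tuple[str, str]], features: Dict[str, List[float]],
                         dim: int, c: float = 1.0, lr: float = 0.1,
                         epochs: int = 100) -> List[float]:
    """Subgradient descent on 0.5*||w||^2 + c * sum(max(0, 1 - w.(x_pref - x_other)))."""
    w = [0.0] * dim
    for _ in range(epochs):
        grad = list(w)  # gradient of the 0.5*||w||^2 regulariser
        for pref, other in pairs:
            diff = [a - b for a, b in zip(features[pref], features[other])]
            if score(w, diff) < 1.0:  # margin violated, so the hinge term is active
                grad = [g - c * d for g, d in zip(grad, diff)]
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

# Hypothetical query: three results shown, the user clicked only the third.
ranking = ["d1", "d2", "d3"]
pairs = click_skip_above_pairs(ranking, {"d3"})  # [("d3", "d1"), ("d3", "d2")]
features = {"d1": [1.0, 0.0], "d2": [0.8, 0.1], "d3": [0.2, 1.0]}
w = train_pairwise_hinge(pairs, features, dim=2)
reranked = sorted(ranking, key=lambda d: score(w, features[d]), reverse=True)
```

After training, `reranked` puts d3 first. The learned function has moved the clicked result above the results the user skipped.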

Evaluating retrieval performance using clickthrough data, 2003
Cited by 44 (6 self)
This paper proposes a new method for evaluating the quality of retrieval functions. Unlike traditional methods that require relevance judgments by experts or explicit user feedback, it is based entirely on clickthrough data. This is a key advantage, since clickthrough data can be collected at very low cost and without overhead for the user. Taking an approach from experiment design, the paper proposes an experiment setup that generates unbiased feedback about the relative quality of two search results without explicit user feedback. A theoretical analysis shows that the method gives the same results as evaluation with traditional relevance judgments under mild assumptions. An empirical analysis verifies that the assumptions are indeed justified and that the new method leads to conclusive results in a WWW retrieval study.
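
One way to realise such an experiment is to merge the two rankings into a single result list. Each system is then credited with the clicks that land on its own top results. The Python sketch below follows the balanced-interleaving scheme usually associated with this line of work. The names and toy rankings are hypothetical, and details such as tie handling may differ from the paper.

```python
import random
from typing import List, Sequence, Set

def balanced_interleave(a: Sequence[str], b: Sequence[str], a_first: bool) -> List[str]:
    """Merge two rankings so that each prefix of the result mixes the top of a
    and the top of b in near-equal measure; a_first decides who leads on ties."""
    combined: List[str] = []
    ka = kb = 0  # next unread position in a and in b
    while ka < len(a) and kb < len(b):
        if ka < kb or (ka == kb and a_first):
            if a[ka] not in combined:
                combined.append(a[ka])
            ka += 1
        else:
            if b[kb] not in combined:
                combined.append(b[kb])
            kb += 1
    return combined

def interleaving_preference(a: Sequence[str], b: Sequence[str],
                            combined: Sequence[str], clicked: Set[str]) -> int:
    """+1 if the clicks favour a, -1 if they favour b, 0 for a tie or no clicks."""
    positions = [i for i, d in enumerate(combined) if d in clicked]
    if not positions:
        return 0
    lowest = combined[max(positions)]  # lowest-ranked clicked result
    # k is the better (smaller) of the ranks that a and b give that result
    k = min(lst.index(lowest) + 1 for lst in (a, b) if lowest in lst)
    hits_a = len(clicked & set(a[:k]))
    hits_b = len(clicked & set(b[:k]))
    return (hits_a > hits_b) - (hits_a < hits_b)

# Hypothetical rankings from two retrieval functions; the coin flip is per query.
a = ["x", "y", "z"]
b = ["y", "w", "x"]
shown = balanced_interleave(a, b, a_first=random.random() < 0.5)
```

The random choice of which ranking leads keeps either system from systematically getting the top slot. Over many queries, the count of +1 and -1 outcomes then indicates which retrieval function users prefer.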

The Philosophy of Information Retrieval Evaluation
In Proceedings of the Second Workshop of the Cross-Language Evaluation Forum on Evaluation of Cross-Language Information Retrieval Systems
Cited by 43 (1 self)
Evaluation conferences such as TREC, CLEF, and NTCIR are modern examples of the Cranfield evaluation paradigm. In the Cranfield paradigm, researchers perform experiments on test collections to compare the relative effectiveness of different retrieval approaches. The test collections allow the researchers to control the effects of different system parameters, increasing the power and decreasing the cost of retrieval experiments as compared to user-based evaluations. This paper reviews the fundamental assumptions and appropriate uses of the Cranfield paradigm, especially as they apply in the context of the evaluation conferences.
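
Under the Cranfield paradigm, a test collection fixes the documents, the topics, and the relevance judgments (qrels). Two retrieval runs can then be compared with a single effectiveness measure. The minimal Python sketch below uses mean average precision. That measure is one common choice, not something the paper prescribes, and the toy collection is hypothetical.

```python
from typing import Dict, List, Set

def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Sum of precision@i at each rank i holding a relevant document, divided
    by |relevant|; relevant documents never retrieved contribute 0."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)

def mean_average_precision(run: Dict[str, List[str]], qrels: Dict[str, Set[str]]) -> float:
    """Average AP over every judged topic; a topic missing from the run scores 0."""
    if not qrels:
        return 0.0
    return sum(average_precision(run.get(q, []), rel) for q, rel in qrels.items()) / len(qrels)

# Toy test collection: two topics and their judged-relevant documents.
qrels = {"q1": {"d1", "d3"}, "q2": {"d2"}}
run_a = {"q1": ["d1", "d2", "d3"], "q2": ["d2", "d1"]}
run_b = {"q1": ["d2", "d3", "d1"], "q2": ["d1", "d2"]}
print(mean_average_precision(run_a, qrels), mean_average_precision(run_b, qrels))
```

The same qrels score both runs. The comparison therefore isolates the difference between the systems, which is the control the abstract credits to test collections.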

Interactive evaluation of the Ostensive Model using a new test collection of images with multiple relevance assessments
Journal of Information Retrieval, 2000
Cited by 19 (0 self)
The Ostensive Model proposes a manner of structuring the uncertainty associated with individual relevance judgements as sources of evidence in relevance feedback. It proposes temporal profiles of uncertainty, motivating the application of a particular class of discount function with respect to the age of the evidence. This paper presents an initial evaluation of the relative effectiveness of different uncertainty discount functions. A novel direct manipulation interface to a multimedia retrieval system embodying the Ostensive Model is outlined briefly. The paper describes the construction and characteristics of a new image test collection utilising multiple binary relevance assessments. The use of such multiple assessments and multiple interpretations of them are discussed. The evaluation environment is detailed in terms of the interface, test collection, and tasks set to users. Multiple interpretations of the results, and the statistical significance of comparisons are presented. Th...
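
The following is a hypothetical Python sketch of discounting relevance evidence by age. Each item the user selected along a browsing path contributes its feature vector, weighted by a discount function of how many steps ago it was chosen. The three profiles (flat, exponential decay, current-only) are illustrative assumptions, not the specific functions evaluated in the paper.

```python
from typing import Callable, Dict, List

Discount = Callable[[int], float]

# Hypothetical discount profiles; age 0 is the most recent selection.
DISCOUNTS: Dict[str, Discount] = {
    "flat": lambda age: 1.0,                               # all evidence counts equally
    "exponential": lambda age: 0.5 ** age,                 # weight halves per step back
    "current_only": lambda age: 1.0 if age == 0 else 0.0,  # ignore older evidence
}

def ostensive_query(path: List[List[float]], discount: Discount) -> List[float]:
    """Combine feature vectors of items selected along a path (oldest first)
    into one normalised query vector, weighting each item by its age."""
    if not path:
        return []
    query = [0.0] * len(path[0])
    norm = 0.0
    for age, vec in enumerate(reversed(path)):  # newest item gets age 0
        weight = discount(age)
        norm += weight
        query = [q + weight * v for q, v in zip(query, vec)]
    return [q / norm for q in query] if norm else query

# Hypothetical two-step path: first an item with feature A, then one with feature B.
path = [[1.0, 0.0], [0.0, 1.0]]
queries = {name: ostensive_query(path, f) for name, f in DISCOUNTS.items()}
```

Swapping the discount changes how quickly older selections stop influencing the query. Comparing retrieval effectiveness across such profiles is the kind of evaluation the abstract describes.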

Implementing a Question Answering Evaluation
In Proceedings of LREC’2000 Workshop on Using Evaluation within HLT Programs: Results and Trends, 2000
Cited by 5 (0 self)
The most recent TREC workshop contained a track for evaluating domain-independent question answering (QA) systems. In addition to fostering research on the QA task, the track was used to investigate whether the evaluation methodology used for document retrieval is appropriate for a different natural language processing task. This paper describes the implementation of the track, focusing on how the test questions were selected and the responses judged. By using multiple assessors to independently judge the responses, we verified that assessors do have legitimate differences of opinion as to the correctness of a response, even for the fact-based, short-answer questions used in the track. The Text REtrieval Conference (TREC) is a workshop series organized by the U.S. National Institute of Standards and Technology (NIST) designed to advance the state-of-the-art in text retrieval (Voorhees, 2000). The most recent TREC, held in November 1999, included a "track" on question answering where t...
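
The Python sketch below shows how assessor disagreement can change a score. It scores one run under each assessor's judgments using mean reciprocal rank over the top five responses. The TREC-8 QA track is understood to have scored ranked answer lists this way, but treat the measure and depth here as assumptions. The sketch also reports raw agreement between the two assessors. All data and names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Judgments = Set[Tuple[str, str]]  # (question_id, response) pairs judged correct

def mean_reciprocal_rank(run: Dict[str, List[str]], correct: Judgments, depth: int = 5) -> float:
    """Average over questions of 1/rank of the first correct response (0 if none in top depth)."""
    if not run:
        return 0.0
    total = 0.0
    for qid, responses in run.items():
        for rank, resp in enumerate(responses[:depth], start=1):
            if (qid, resp) in correct:
                total += 1.0 / rank
                break
    return total / len(run)

def raw_agreement(pool: Set[Tuple[str, str]], a: Judgments, b: Judgments) -> float:
    """Fraction of judged (question, response) pairs on which assessors a and b agree."""
    if not pool:
        return 1.0
    return sum((p in a) == (p in b) for p in pool) / len(pool)

# Hypothetical run and two assessors who disagree about one question.
run = {"q1": ["Paris", "Lyon"], "q2": ["1969", "1970"]}
assessor_a = {("q1", "Paris"), ("q2", "1969")}
assessor_b = {("q1", "Paris"), ("q2", "1970")}
pool = {(q, r) for q, rs in run.items() for r in rs}
```

Here the same run scores 1.0 under one assessor and 0.75 under the other, even though the assessors agree on half of the judged pairs.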

Relevance judgments for assessing recall
Information Processing and Management, 1996
Cited by 5 (0 self)
Recall and Precision have become the principal measures of the effectiveness of information retrieval systems. Inherent in these measures of performance is the idea of a relevant document. Although recall and precision are easily and unambiguously defined, selecting the documents relevant to a query has long been recognized as problematic. To compare performance of different systems, standard collections of documents, queries, and relevance judgments have been used. Unfortunately the standard collections, such as SMART and TREC, have locked in a particular approach to relevance that is suitable for assessing precision but not recall. The problem is demonstrated by comparing two information retrieval methods over several queries, and showing how a new method of forming relevance judgments that is suitable for assessing recall gives different results. Recall is an interesting and practical issue, but current test procedures are inadequate for measuring it.
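
Precision is P = |retrieved ∩ relevant| / |retrieved|, and recall is R = |retrieved ∩ relevant| / |relevant|. Only recall depends on the full set of relevant documents. Judgments that cover just the retrieved documents can therefore support precision but not recall. A toy Python illustration with hypothetical document IDs:

```python
from typing import Set, Tuple

def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Set-based precision and recall; an empty denominator yields 0.0."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

retrieved = {"d1", "d2", "d3", "d4"}
# Judgments made only on what the system retrieved (hypothetical).
shallow_relevant = {"d1", "d2"}
# Fuller judgments that also find relevant documents the system missed (hypothetical).
full_relevant = {"d1", "d2", "d7", "d9"}

shallow = precision_recall(retrieved, shallow_relevant)
full = precision_recall(retrieved, full_relevant)
```

Precision is 0.5 under both judgment sets. Recall drops from 1.0 to 0.5 once the unretrieved relevant documents are judged.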

News and Trading Rules, 2003
Cited by 5 (0 self)
AI has long been applied to the problem of predicting financial markets.

Relevance Judgements for Assessing Recall, 1995
Cited by 4 (0 self)
Recall and Precision have become the principal measures of the effectiveness of information retrieval systems. Inherent in these measures of performance is the idea of a relevant document. Although recall and precision are easily and unambiguously defined, selecting the documents relevant to a query has long been recognised as problematic. To compare performance of different systems, standard collections of documents, queries, and relevance judgements have been used. Unfortunately the standard collections, such as SMART and TREC, have locked in a particular approach to relevance and this has affected subsequent research. Two styles of information need are distinguished, high precision and high recall, and a method of forming relevance judgements suitable for each is described. The issues are illustrated by comparing two retrieval systems, keyword retrieval and semantic signatures, on different sets of relevance judgements. Introduction. Four decades of testing informatio...
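
Cutoff measures can contrast the two styles of information need. Precision in the top few results fits a high-precision need, and recall at a deeper cutoff fits a high-recall need. The toy Python sketch below uses hypothetical rankings, and neither ranking wins on both measures.

```python
from typing import List, Set

def precision_at(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top k results that are relevant."""
    return len(set(ranking[:k]) & relevant) / k if k > 0 else 0.0

def recall_at(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant documents found in the top k results."""
    return len(set(ranking[:k]) & relevant) / len(relevant) if relevant else 0.0

relevant = {"r1", "r2", "r3", "r4"}
# Hypothetical rankings: A front-loads relevant documents, B finds all of them.
ranking_a = ["r1", "r2", "r3", "n1", "n2", "n3"]
ranking_b = ["r1", "n1", "r2", "n2", "r3", "r4"]
```

Ranking A gives perfect precision at 3 but recall of only 0.75 at 6. Ranking B reaches full recall at 6 with lower early precision. A single set of judgments tuned to one need can therefore favour a different system than the other need would.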

