Results 1 - 10 of 84
Okapi at TREC-3
, 1996
Cited by 370 (5 self)
this document length correction factor is "global": it is added at the end, after the weights for the individual terms have been summed, and is independent of which terms match.
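The "global" character of the correction can be illustrated with a minimal Python sketch. The excerpt gives no formula, so the form used here, k2 * nq * (avdl - dl) / (avdl + dl), is an assumption based on how this correction is commonly stated for the Okapi BM family; the function name `score_document` and all parameter names are hypothetical.

```python
def score_document(term_weights, matched_terms, query_length,
                   doc_length, avg_doc_length, k2=1.0):
    """Sum per-term weights, then add one global length correction."""
    # Per-term part: depends on which query terms match the document.
    term_sum = sum(term_weights[t] for t in matched_terms)
    # Global part (assumed form): depends only on the query length and the
    # document length relative to the average, not on which terms matched.
    correction = (k2 * query_length * (avg_doc_length - doc_length)
                  / (avg_doc_length + doc_length))
    return term_sum + correction
```

Because the correction is added once after the summation, two documents of the same length receive the same adjustment whichever terms they match; a document of exactly average length receives none.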
A Probabilistic Model of Information Retrieval: Development and Status
, 1998
Cited by 206 (16 self)
The paper combines a comprehensive account of the probabilistic model of retrieval with new systematic experiments on TREC Programme material. It presents the model from its foundations through its logical development to cover more aspects of retrieval data and a wider range of system functions. Each step in the argument is matched by comparative retrieval tests, to provide a single coherent account of a major line of research. The experiments demonstrate, for a large test collection, that the probabilistic model is effective and robust, and that it responds appropriately, with major improvements in performance, to key features of retrieval situations.
Simple, Proven Approaches to Text Retrieval
, 1997
Cited by 86 (3 self)
This technical note describes straightforward techniques for document indexing and retrieval that have been solidly established through extensive testing and are easy to apply. They are useful for many different types of text material, are viable for very large files, and have the advantage that they do not require special skills or training for searching, but are easy for end users. The document and text retrieval methods described here have a sound theoretical basis, are well established by extensive testing, and the ideas involved are now implemented in some commercial retrieval systems. Testing in the last few years has, in particular, shown that the methods presented here work very well with full texts, not only title and abstracts, and with large files of texts containing three quarters of a million documents. These tests, the TREC Tests (see Harman 1993 - 1997; IP&M 1995), have been rigorous comparative evaluations involving many different approaches to information retrieval. ...
Okapi at TREC-7: Automatic ad hoc, filtering, VLC and interactive track
- In
, 1999
Cited by 85 (0 self)
e passes. Two pairs of runs were submitted: in one pair queries remained constant, but in the other query terms were reweighted when fresh relevance information became available. VLC track: Four runs on the full database were submitted, together with one each on the 10% and 1% collections. Unexpectedly, unexpanded queries did better than expanded ones; the best run used all topic fields and some adjacent term pairs from the topics. The best expanded run used one of the TREC-7 ad hoc query sets (expanded on disks 1-5). Interactive track: Two pairwise comparisons were made: Okapi with relevance feedback against Okapi without, and Okapi without against ZPrise without. Okapi without performed somewhat worse than ZPrise, and Okapi with only partially recovered the deficit. * Microsoft Research Ltd, 1 Guildhall Street, Cambridge CB2 3NH, UK, and City University, London, UK. email ser@microsoft.com † Microsoft Research Ltd, 1 Guildha
Learning Search Engine Specific Query Transformations for Question Answering
- In Proceedings of WWW10
, 2001
Cited by 64 (5 self)
We introduce a method for learning query transformations that improves the ability to retrieve answers to questions from an information retrieval system. During the training stage the method involves automatically learning phrase features for classifying questions into different types, automatically generating candidate query transformations from a training set of question/answer pairs, and automatically evaluating the candidate transforms on target information retrieval systems such as real-world general purpose search engines. At run time, questions are transformed into a set of queries, and re-ranking is performed on the documents retrieved. We present a prototype search engine, Tritus, that applies the method to web search engines. Blind evaluation on a set of real queries from a web search engine log shows that the method significantly outperforms the underlying web search engines as well as a commercial search engine specializing in question answering.
Understanding inverse document frequency: On theoretical arguments for IDF
- Journal of Documentation
, 2004
Cited by 55 (1 self)
The term weighting function known as IDF was proposed in 1972, and has since been extremely widely used, usually as part of a TF*IDF function. It is often described as a heuristic, and many papers have been written (some based on Shannon’s Information Theory) seeking to establish some theoretical basis for it. Some of these attempts are reviewed, and it is shown that the Information Theory approaches are problematic, but that there are good theoretical justifications of both IDF and TF*IDF in the traditional probabilistic model of information retrieval.
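For readers unfamiliar with the weighting functions named in this abstract, here is a minimal Python sketch. The abstract itself gives no formulas: the code assumes the textbook forms, log(N/n) for IDF and log((N - n + 0.5)/(n + 0.5)) for the probabilistic-model variant used when no relevance information is available. The function names are hypothetical.

```python
import math

def idf(num_docs, doc_freq):
    """Basic IDF: log of collection size over the term's document frequency."""
    return math.log(num_docs / doc_freq)

def rsj_idf(num_docs, doc_freq):
    """Probabilistic-model variant (assumed form, no relevance information)."""
    return math.log((num_docs - doc_freq + 0.5) / (doc_freq + 0.5))

def tf_idf(term_freq, num_docs, doc_freq):
    """Simplest TF*IDF: raw within-document frequency times IDF."""
    return term_freq * idf(num_docs, doc_freq)
```

For rare terms the two IDF forms are close; they diverge for very common terms, where the second form becomes negative once a term occurs in more than half of the documents.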
Re-examining the potential effectiveness of interactive query expansion
- In SIGIR ’03: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, ACM
, 2003
Cited by 45 (3 self)
Much attention has been paid to the relative effectiveness of interactive query expansion versus automatic query expansion. Although interactive query expansion has the potential to be an effective means of improving a search, in this paper we show that, on average, human searchers are less likely than systems to make good expansion decisions. To enable good expansion decisions, searchers must have adequate instructions on how to use interactive query expansion functionalities. We show that simple instructions on using interactive query expansion do not necessarily help searchers make good expansion decisions and discuss difficulties found in making query expansion decisions. Categories and Subject Descriptors H.3.3 [Information Search and Retrieval]:- search process, relevance feedback.
Finding Relevant Documents using Top Ranking Sentences: An Evaluation of Two Alternative Schemes
- In Proceedings of SIGIR 2002
, 2002
Cited by 44 (8 self)
In this paper we present an evaluation of techniques that are designed to encourage web searchers to interact more with the results of a web search. Two specific techniques are examined: the presentation of sentences that highly match the searcher's query and the use of implicit evidence. Implicit evidence is evidence captured from the searcher's interaction with the retrieval results and is used to automatically update the display. Our evaluation concentrates on the effectiveness and subject perception of these techniques. The results show, with statistical significance, that the techniques are effective and efficient for information seeking.
Experimentation as a way of life: Okapi at TREC
- Information Processing & Management
, 2000
Cited by 42 (0 self)
Information Processing and Management 36 (2000) 95-108 www.elsevier.com/locate/infoproman The Okapi system has been used in a series of experiments on the TREC collections, investigating probabilistic models, relevance feedback and query expansion, and interaction issues. The TREC-6 ad hoc task was used to test an application of a new relevance weighting formula, which takes account of documents judged nonrelevant. The application was to a form of blind feedback (using the top-ranked documents from an initial search to improve the query formulation for a subsequent search, without actual relevance feedback, on the assumption that these top-ranked documents are likely to be relevant). In the routing task, the problem is one of query optimization based on a training set with known relevant documents; investigations for TREC-6 included using a form of simulated annealing for this purpose. A significant feature of this work is the need to avoid overfitting of the training sample. In the interactive track, methodology remains the major problem: we do not yet know how to conduct controlled laboratory experiments which provide good information about information retrieval interaction. The Okapi team has been particularly interested in the relation between the functionalities associated with relevance feedback and the ability of searchers to make use of these functionalities. TREC provides an excellent environment and set of tools for investigating automatic systems; its value

