  Learning and Discovery

by Chengxiang Zhai, William W. Cohen, John Lafferty
http://www-2.cs.cmu.edu/~czhai/paper/sigir2003-bir.ps

Abstract:

We present a non-traditional retrieval problem that we call subtopic retrieval. The subtopic retrieval problem is concerned with finding documents that cover many different subtopics of a query topic. In this setting, the utility of a document in a ranking depends on the other documents in the ranking, which violates the independent-relevance assumption made by most traditional retrieval methods. Subtopic retrieval poses challenges both for evaluating performance and for developing effective algorithms. We propose a framework for evaluating subtopic retrieval that generalizes the traditional precision and recall metrics by accounting for intrinsic topic difficulty as well as redundancy in documents. We propose and systematically evaluate several methods for performing subtopic retrieval using statistical language models and a maximal marginal relevance (MMR) ranking strategy. A mixture model combined with query-likelihood relevance ranking is shown to modestly outperform a baseline relevance ranking on a data set used in the TREC interactive track.
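
To make the dependence between ranked documents concrete, the following is a minimal sketch of greedy MMR reranking together with a simple subtopic-coverage measure. It is not the authors' implementation. The scoring rule is the standard MMR form (relevance weighted by `lam`, minus `1 - lam` times the maximum similarity to any already selected document). The Jaccard similarity over toy term sets, the `subtopic_recall` definition (fraction of a topic's subtopics covered by the top `k` documents), and all document names, scores, and subtopic labels are assumptions made up for illustration; the abstract does not specify them.

```python
from typing import Callable, Dict, List, Set


def mmr_rerank(
    relevance: Dict[str, float],
    similarity: Callable[[str, str], float],
    lam: float = 0.5,
) -> List[str]:
    """Greedy MMR: repeatedly pick the document that best trades off
    relevance against redundancy with the documents already selected."""
    remaining = set(relevance)
    selected: List[str] = []
    while remaining:
        def score(doc: str) -> float:
            # Redundancy is the highest similarity to any selected document
            # (0.0 when nothing has been selected yet).
            redundancy = max((similarity(doc, s) for s in selected), default=0.0)
            return lam * relevance[doc] - (1.0 - lam) * redundancy

        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def subtopic_recall(
    ranking: List[str],
    doc_subtopics: Dict[str, Set[int]],
    num_subtopics: int,
    k: int,
) -> float:
    """Fraction of the topic's subtopics covered by the top-k documents
    (an assumed, simplified coverage measure)."""
    covered: Set[int] = set()
    for doc in ranking[:k]:
        covered |= doc_subtopics.get(doc, set())
    return len(covered) / num_subtopics


# Hypothetical toy data: d1 and d2 are near-duplicates, d3 is novel.
relevance = {"d1": 0.9, "d2": 0.8, "d3": 0.7}
terms = {"d1": {"a", "b"}, "d2": {"a", "b"}, "d3": {"c"}}
doc_subtopics = {"d1": {1, 2}, "d2": {1, 2}, "d3": {3}}


def jaccard(x: str, y: str) -> float:
    """Toy document similarity: Jaccard overlap of term sets."""
    union = terms[x] | terms[y]
    return len(terms[x] & terms[y]) / len(union) if union else 0.0


baseline = sorted(relevance, key=relevance.get, reverse=True)
mmr = mmr_rerank(relevance, jaccard, lam=0.5)

print("baseline:", baseline, subtopic_recall(baseline, doc_subtopics, 3, k=2))
print("MMR:     ", mmr, subtopic_recall(mmr, doc_subtopics, 3, k=2))
```

On this toy input, the pure relevance ranking places the two near-duplicate documents first, while MMR promotes the novel document to the second position, so the top two results cover all three subtopics instead of two. Setting `lam=1.0` reduces MMR to the relevance ranking.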
