  Two-stage language models for information retrieval (2003) [109 citations — 13 self]

by Chengxiang Zhai, John Lafferty
http://www-2.cs.cmu.edu/People/lafferty/ps/two-stage.ps

Abstract:

The optimal settings of retrieval parameters often depend on both the document collection and the query, and are usually found through empirical tuning. In this paper, we propose a family of two-stage language models for information retrieval that explicitly captures the different influences of the query and document collection on the optimal settings of retrieval parameters. As a special case, we present a two-stage smoothing method that allows us to estimate the smoothing parameters completely automatically. In the first stage, the document language model is smoothed using a Dirichlet prior with the collection language model as the reference model. In the second stage, the smoothed document language model is further interpolated with a query background language model. We propose a leave-one-out method for estimating the Dirichlet parameter of the first stage, and the use of document mixture models for estimating the interpolation parameter of the second stage. Evaluation on five different databases and four types of queries indicates that the two-stage smoothing method with the proposed parameter estimation methods consistently gives retrieval performance that is close to, or better than, the best results achieved using a single smoothing method and exhaustive parameter search on the test data.
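
The two smoothing stages described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the dictionary-based language models, and the parameter names `mu` (Dirichlet prior) and `lam` (interpolation weight) are assumptions made for the example. The leave-one-out function shows one common form of the objective that would be maximized over `mu`; the mixture-model estimation of `lam` is not shown.

```python
import math
from collections import Counter


def two_stage_prob(word, doc_counts, doc_len, coll_prob, query_bg_prob, mu, lam):
    """Two-stage smoothed probability of `word` under one document model.

    All names here are hypothetical; `coll_prob` and `query_bg_prob` map
    words to probabilities and must cover every word that is looked up.
    """
    # Stage 1: Dirichlet prior smoothing, collection model as reference.
    dirichlet = (doc_counts.get(word, 0) + mu * coll_prob[word]) / (doc_len + mu)
    # Stage 2: linear interpolation with the query background model.
    return (1.0 - lam) * dirichlet + lam * query_bg_prob[word]


def query_log_likelihood(query, doc_tokens, coll_prob, query_bg_prob, mu, lam):
    """Log-likelihood of the query terms under the smoothed document model."""
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    return sum(
        math.log(two_stage_prob(w, doc_counts, doc_len,
                                coll_prob, query_bg_prob, mu, lam))
        for w in query
    )


def leave_one_out_log_likelihood(docs, coll_prob, mu):
    """Leave-one-out log-likelihood of the collection for a given `mu`.

    Each occurrence of a word is predicted from its document with that
    one occurrence removed, using Dirichlet smoothing. Requires mu > 0.
    """
    total = 0.0
    for tokens in docs:
        counts = Counter(tokens)
        n = len(tokens)
        for w, c in counts.items():
            total += c * math.log((c - 1 + mu * coll_prob[w]) / (n - 1 + mu))
    return total


if __name__ == "__main__":
    # Toy collection; the query background model is approximated by the
    # collection model here, which is an assumption of this sketch.
    docs = [["a", "b", "a"], ["b", "c"]]
    coll_counts = Counter(w for d in docs for w in d)
    total = sum(coll_counts.values())
    coll_prob = {w: c / total for w, c in coll_counts.items()}
    for d in docs:
        print(d, query_log_likelihood(["a", "c"], d, coll_prob, coll_prob,
                                      mu=2.0, lam=0.5))
```

In practice one would search over `mu` (for example with a simple grid or Newton's method) for the value that maximizes `leave_one_out_log_likelihood`, then rank documents by `query_log_likelihood`.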
