  Risk Minimization and Language Modeling in Text Retrieval (2002) [18 citations — 4 self]

by Chengxiang Zhai
http://sifaka.cs.uiuc.edu/czhai/pub/thesis-summary.pdf

Abstract:

This thesis presents a new general probabilistic framework for text retrieval based on Bayesian decision theory. In this framework, queries and documents are modeled using statistical language models, user preferences are modeled through loss functions, and retrieval is cast as a risk minimization problem. This risk minimization framework not only unifies several existing retrieval models within one general probabilistic framework, but also facilitates the development of new principled approaches to text retrieval through the use of statistical language models. We explore three interesting special cases of the framework. In the case of a two-stage language modeling approach, we show that it is possible to achieve excellent retrieval performance without any ad hoc parameter tuning by exploiting statistical estimation methods to set the retrieval parameters completely automatically. In another case of a KL-divergence retrieval model, we demonstrate that it is possible to improve retrieval performance by using improved language models estimated based on feedback documents. Finally, in the case of non-traditional aspect retrieval models, we show that it is possible to use language models to capture redundancy and sub-topics in documents, and to perform “context-sensitive” ranking of documents based on both relevance and novelty of documents.
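
As a rough illustration of the KL-divergence retrieval model mentioned in the abstract, the sketch below ranks documents by the negative cross-entropy between a query language model and a smoothed document language model, which orders documents the same way as negative KL divergence. The abstract gives no formulas, so the maximum-likelihood query model, the Dirichlet-prior smoothing, the value of `mu`, and the function name `rank_documents` are all assumptions made for illustration, not the thesis's implementation.

```python
import math
from collections import Counter


def rank_documents(query_tokens, docs, mu=2000.0):
    """Rank document ids by -KL(query model || document model), up to a
    query-only constant.

    docs maps a document id to its list of tokens. The name, the smoothing
    method, and the default mu are illustrative assumptions.
    """
    # Collection language model: maximum-likelihood estimate over all documents.
    collection_counts = Counter(t for tokens in docs.values() for t in tokens)
    collection_size = sum(collection_counts.values())

    # Query language model: maximum-likelihood estimate, ignoring words that
    # never occur in the collection (their smoothed probability would be zero).
    query_counts = Counter(t for t in query_tokens if t in collection_counts)
    query_size = sum(query_counts.values())
    if query_size == 0:
        return sorted(docs)  # nothing to score on; fall back to id order

    scores = {}
    for doc_id, tokens in docs.items():
        doc_counts = Counter(tokens)
        doc_len = len(tokens)
        score = 0.0
        for word, q_count in query_counts.items():
            p_query = q_count / query_size
            p_collection = collection_counts[word] / collection_size
            # Dirichlet-prior smoothing of the document model (assumed choice).
            p_doc = (doc_counts[word] + mu * p_collection) / (doc_len + mu)
            score += p_query * math.log(p_doc)
        scores[doc_id] = score

    # Higher score means smaller divergence from the query model.
    return sorted(scores, key=scores.get, reverse=True)


# Toy usage with made-up documents.
docs = {
    "d1": "language model retrieval language".split(),
    "d2": "speech recognition acoustic model".split(),
    "d3": "cooking recipes pasta".split(),
}
print(rank_documents(["language", "retrieval"], docs))
```

Feedback, as described in the abstract, would correspond to replacing the maximum-likelihood query model with one re-estimated from feedback documents; the scoring loop itself would stay the same.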

Citations

4364 Elements of Information Theory – Cover, Thomas - 1991
4344 Maximum likelihood from incomplete data via the EM algorithm – Dempster, Laird, et al. - 1977
2217 Introduction to Modern Information Retrieval – Salton, McGill - 1983
1463 Indexing by Latent Semantic Analysis – Deerwester, Dumais, et al. - 1990
957 Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer – Salton
881 Term-Weighting Approaches in Automatic Text Retrieval – Salton, Buckley - 1988
628 Statistical decision theory and Bayesian analysis – Berger - 1985
559 Relevance feedback in information retrieval – Rocchio - 1971
518 Estimation of probabilities from sparse data for the language model component of speech recogniser – Katz - 1987
462 Improving retrieval performance by relevance feedback – Salton, Buckley - 1990
452 Statistical Methods for Speech Recognition – Jelinek - 1998
418 A language modeling approach to information retrieval – Ponte, Croft - 1998
406 Relevance weighting of search terms – Robertson, Jones - 1988
373 Latent Dirichlet allocation – Blei, Ng, et al. - 2003
335 An empirical study of smoothing techniques for language modeling – Chen, Goodman - 1996
329 A vector space model for automatic indexing – Salton, Wong, et al. - 1975
296 The INQUERY retrieval system – Callan, Croft, et al. - 1992
283 Query expansion using local and global document analysis – Xu, Croft - 1996
259 Probabilistic latent semantic indexing – Hofmann - 1999
251 Okapi at TREC-3 – Robertson, Walker, et al. - 1994
245 Pivoted document length normalization – Singhal, Buckley, et al. - 1996
238 The population frequencies of species and the estimation of population parameters – Good - 1953
231 A study of smoothing methods for language models applied to ad hoc information retrieval – Zhai, Lafferty - 2001
225 Naive (Bayes) at forty: The independence assumption in information retrieval – Lewis - 1998
196 Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval – Robertson, Walker - 1994
178 Divergence Measures Based on the Shannon Entropy – Lin - 1991
174 The use of MMR, Diversity-based Reranking for Reordering Documents and Producing Summaries – Carbonell, Goldstein - 1998
173 Evaluation of an inference network-based retrieval model – Turtle, Croft - 1991
151 Information Retrieval as statistical translation – Berger, Lafferty - 1999
150 The probability ranking principle in IR – Robertson, Sparck-Jones - 1977
148 A non-classical logic for information retrieval – Rijsbergen - 1986
144 Document language models, query models, and risk minimization for information retrieval – Lafferty, Zhai - 2001
141 Representation and learning in information retrieval – Lewis - 1992
140 Extended boolean information retrieval – Salton, Fox, et al. - 1983
133 On structuring probabilistic dependences in stochastic language modeling – Ney, Essen, et al. - 1994
132 A probabilistic model of information retrieval: development and comparative experiments – Jones, Walker, et al. - 2000
124 Relevance-based language models – Lavrenko, Croft - 2001
124 On relevance, probabilistic indexing and information retrieval – Maron, Kuhns - 1960
123 A hidden Markov model information retrieval system – Miller, Leek, et al. - 1999
122 Using Language Models for Information Retrieval – Hiemstra - 2001
120 Using probabilistic models of document retrieval without relevance information – Croft, Harper - 1979
109 Language Modeling for Information Retrieval – Croft, Lafferty - 2003
103 Okapi/Keenbow at TREC-8 – Robertson, Walker - 1999
100 Automatic query expansion using SMART: TREC 3 – Buckley, Salton, et al. - 1995
93 A general language model for information retrieval – Song, Croft - 1999
86 Twenty-one at TREC-7: Ad-hoc and cross-language track – Hiemstra, Kraaij - 1998
83 On modeling information retrieval with probabilistic inference – Wong, Yao - 1995
82 Two decades of statistical language modeling: Where do we go from here – Rosenfeld
80 A probabilistic learning approach for document indexing – Fuhr, Buckley - 1991
75 Improved backing-off for m-gram language modeling – Kneser, Ney - 1995