  Information Retrieval as Statistical Translation (1999) [165 citations — 7 self]

by Adam Berger, John Lafferty
In Proceedings of the 1999 ACM SIGIR Conference on Research and Development in Information Retrieval
http://www.cs.cmu.edu/~lafferty/ps/irast-final.ps.Z

Abstract:

We propose a new probabilistic approach to information retrieval based upon the ideas and methods of statistical machine translation. The central ingredient in this approach is a statistical model of how a user might distill or "translate" a given document into a query. To assess the relevance of a document to a user's query, we estimate the probability that the query would have been generated as a translation of the document, and factor in the user's general preferences in the form of a prior distribution over documents. We propose a simple, well-motivated model of the document-to-query translation process, and describe an algorithm for learning the parameters of this model in an unsupervised manner from a collection of documents. As we show, one can view this approach as a generalization and justification of the "language modeling" strategy recently proposed by Ponte and Croft. In a series of experiments on TREC data, a simple translation-based retrieval system performs well in comparison to conventional retrieval techniques. This prototype system only begins to tap the full potential of translation-based retrieval.
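
The scoring idea in the abstract (rank each document by the probability that the query is a translation of it, weighted by a document prior) can be illustrated with a small sketch. The abstract does not spell out the translation model, so the code below assumes an IBM Model 1-style mixture in which each query word is generated from a document word drawn from the document's empirical word distribution. The names `query_log_likelihood`, `rank`, the table `t`, and the `smoothing` constant are all hypothetical and chosen for illustration only; the translation table is supplied by hand rather than learned.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, t, smoothing=1e-9):
    """Return log p(query | doc) under an assumed Model 1-style mixture.

    query, doc: lists of word strings.
    t: dict mapping (query_word, doc_word) -> translation probability
       (a hand-written stand-in for a learned table).
    smoothing: small constant (an assumption) that avoids log(0).
    """
    counts = Counter(doc)
    total = sum(counts.values())
    log_p = 0.0
    for q in query:
        # p(q | doc) = sum over document words w of t(q | w) * l(w | doc),
        # where l(w | doc) is the relative frequency of w in the document.
        p = sum(t.get((q, w), 0.0) * c / total for w, c in counts.items())
        log_p += math.log(p + smoothing)
    return log_p


def rank(query, docs, t, prior=None):
    """Sort document ids by log p(doc) + log p(query | doc), best first.

    docs: dict mapping doc id -> list of words.
    prior: optional dict mapping doc id -> prior probability;
           a uniform prior is assumed when it is omitted.
    """
    scores = {}
    for doc_id, words in docs.items():
        log_prior = math.log(prior[doc_id]) if prior else 0.0
        scores[doc_id] = log_prior + query_log_likelihood(query, words, t)
    return sorted(scores, key=scores.get, reverse=True)
```

In this sketch, a table `t` that gives probability 1 to translating each word into itself reduces the score to a plain unigram query likelihood, which is one way to read the abstract's remark that the approach generalizes the "language modeling" strategy.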

Citations

4704 Maximum likelihood from incomplete data via the EM algorithm – Dempster, Laird, et al. - 1977
927 Term-weighting approaches in automatic text retrieval – Salton, Buckley - 1988
516 The mathematics of statistical machine translation: parameter estimation – Brown, Pietra, et al. - 1993
451 A language modeling approach to information retrieval – Ponte, Croft - 1998
430 A statistical approach to machine translation – Brown, Cocke, et al. - 1990
430 Relevance weighting of search terms – Robertson, Jones - 1976
127 Identifying word correspondences in parallel texts – Gale, Church - 1991
125 Using probabilistic models of document retrieval without relevance information – Croft, Harper - 1979
58 Probabilistic models for automatic indexing – Bookstein, Swanson - 1974
46 Okapi at TREC – Robertson, Walker, et al. - 1992
46 Efficient probabilistic inference for text retrieval – Turtle, Croft - 1991
16 But dictionaries are data too – Brown, Pietra, et al. - 1993
6 Information retrieval on the web: Tools and algorithmic issues – Broder, Henzinger - 1998
3 Translation (1949) – Weaver - 1955