## Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval (1994)

Venue: In Proceedings of SIGIR’94

Citations: 461 (14 self)

### Citations

755 |
Relevance weighting of search terms.
- Robertson, Sparck-Jones
- 1976
Citation Context ...ce” [7]), lead to the decomposition of w into additive components such as individual term weights. In the presence/absence case, the resulting weighting function is the Robertson/Sparck Jones formula [8] for a term-presence-only weight, as follows: $w = \log \frac{p(1 - q)}{q(1 - p)}$ (2), where $p = P(\text{term present} \mid R)$ and $q = P(\text{term present} \mid \bar{R})$. With a suitable estimation method, this becomes: (r + 0.5)/(R − r... |
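
The term-presence weight in formula (2) can be evaluated directly once p and q are known. The following is a minimal sketch, not code from the paper: the function name `presence_weight` and the input check are assumptions made for illustration, with p read as P(term present | relevant) and q as P(term present | non-relevant).

```python
import math


def presence_weight(p: float, q: float) -> float:
    """Term-presence weight w = log(p(1 - q) / (q(1 - p))), as in formula (2).

    p: probability that the term is present in a relevant document.
    q: probability that the term is present in a non-relevant document.
    The name and the range check are illustrative assumptions.
    """
    # The log-odds ratio is undefined at 0 or 1, so reject those inputs.
    if not (0.0 < p < 1.0 and 0.0 < q < 1.0):
        raise ValueError("p and q must lie strictly between 0 and 1")
    return math.log(p * (1.0 - q) / (q * (1.0 - p)))


if __name__ == "__main__":
    # A term equally likely in relevant and non-relevant documents carries no weight.
    print(presence_weight(0.5, 0.5))
    # A term more likely in relevant documents gets a positive weight.
    print(presence_weight(0.8, 0.2))
```

The weight is positive when p > q, zero when p = q, and negative when p < q, so swapping p and q flips its sign.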

291 |
An algorithm for suffix stripping. Program.
- Porter
- 1980
Citation Context ...tions, described as disks 1 & 2 (TREC raw data has been distributed on three CD-ROMs). It contains about 743,000 documents. It was indexed by keyword stems, using a modified Porter stemming procedure [13], spelling normalisation designed to conflate British and American spellings, a moderate stoplist of about 250 words and a small cross-reference table and “go” list. Topics 101–150 of the 150 TREC–1 a... |

143 |
Overview of the Second Text REtrieval Conference (TREC-2)
- Harman
- 1994
Citation Context ...n of NIST (National Institute for Standards and Technology). There were about 31 participants, academic and commercial, in the TREC-2 conference which took place at Gaithersburg, MD in September 1993 [2]. Information needs are presented in the form of highly structured “topics” from which queries are to be derived automatically and/or manually by participants. Documents include newspaper articles, en... |

118 |
A probabilistic approach to automatic keyword indexing: Part I. On the distribution of specialty words in a technical literature.
- Harter
- 1975
Citation Context ...roximation to inverse collection frequency weighting demonstrated by Croft and Harper [4]). The formal model which is used to investigate the effects of these variables is the 2–Poisson model (Harter [5], Robertson, van Rijsbergen and Porter [6]). 2 Basic Probabilistic Weighting Model. The basic weighting function used is that developed in [6], and may be expressed as follows: $w(\underline{x}) = \log P(\underline{x} \mid R)$ ... |

89 |
Automatic retrieval with locality information using SMART. In: TREC.
- Buckley, Salton, et al.
- 1992
Citation Context ...to combine with the ideas discussed above, in a way which would accommodate an explanation of document length in terms of a mixture of the two hypotheses. One possible solution is that used by Salton [11], of allowing passages to compete with full documents for retrieval. But there seems to be room for more theoretical analysis. 5.8 Document Length and Term Frequency—Summary We have, then, a term weig... |

68 |
Probabilistic models of indexing and searching.
- Robertson, Rijsbergen, et al.
- 1981
Citation Context ... weighting demonstrated by Croft and Harper [4]). The formal model which is used to investigate the effects of these variables is the 2–Poisson model (Harter [5], Robertson, van Rijsbergen and Porter [6]). 2 Basic Probabilistic Weighting Model. The basic weighting function used is that developed in [6], and may be expressed as follows: $w(\underline{x}) = \log \frac{P(\underline{x} \mid R)\, P(\underline{0} \mid \bar{R})}{P(\underline{x} \mid \bar{R})\, P(\underline{0} \mid R)}$ (1), where x i... |

49 |
Efficient retrieval of partial documents
- Zobel, Moffat, et al.
- 1995
Citation Context ...opriate boundaries in the documents, and to treat short passages rather than full documents as the retrievable units. There have been a number of experiments on these lines reported in the literature [10]. This approach appears difficult to combine with the ideas discussed above, in a way which would accommodate an explanation of document length in terms of a mixture of the two hypotheses. One possibl... |

40 |
Some inconsistencies and misnomers in probabilistic information retrieval.
- Cooper
- 1991
Citation Context ...ent frequency; $\underline{0}$ would then be the “natural” zero vector representing all query terms absent. In this formulation, independence assumptions (or, indeed, Cooper’s assumption of “linked dependence” [7]), lead to the decomposition of w into additive components such as individual term weights. In the presence/absence case, the resulting weighting function is the Robertson/Sparck Jones formula [8] for... |

23 |
Using probabilistic models of information retrieval without relevance information
- Croft, Harper
- 1979
Citation Context ...tion frequency of terms appears naturally in traditional probabilistic models, particularly in the form of the approximation to inverse collection frequency weighting demonstrated by Croft and Harper [4]). The formal model which is used to investigate the effects of these variables is the 2–Poisson model (Harter [5], Robertson, van Rijsbergen and Porter [6]). 2 Basic Probabilistic Weighting Model. The... |

19 |
Modelling documents with multiple poisson distributions
- Margulis
- 1993
Citation Context ...s complex in the sense of requiring a large number of different parameters to be estimated. Subsequent work on mixed-Poisson models has suggested that alternative estimation methods may be preferable [9]. Combining the 2–Poisson model with formula 4, under the various assumptions given about dependencies, we obtain [6] the following weight for a term t: $w = \log\, (p' \lambda^{tf} e^{-\lambda} + (1 - p') \mu^{tf} e^{-\mu})$ (q′... |

13 |
Probabilistic retrieval in the TIPSTER collections: an application of staged logistic regression
- Cooper, Gey, et al.
- 1992
Citation Context ...re formulae are tried because they seem to be plausible. Both categories have had some notable successes. A more recent variant is the regression approach of Fuhr and Cooper (see, for example, Cooper [3]), which incorporates ad-hoc choice of independent variables and functions of them with a formal model for assessing their value in retrieval, selecting from among them and assigning weights to them. ... |

5 |
Okapi at TREC-2
- Robertson, et al.
Citation Context ...s into traditional probabilistic models for information retrieval, and some experimental results relating thereto. Some of the discussion has appeared in the proceedings of the second TREC conference [1], albeit in less detail. Statistical approaches to information retrieval have traditionally (to over-simplify grossly) taken two forms: firstly approaches based on formal models, where the model speci... |

5 |
Query-document symmetry and dual models
- Robertson
- 1994
Citation Context ...ly constructed combined model would have fairly complex relations between query and document terms, query and document eliteness, and relevance). Both these matters are discussed further by Robertson [12]. In the meantime, the combination of either qtf multiplier with the earlier functions must be regarded as not having a strong theoretical motivation. 7 Experiments 7.1 TREC. The TREC (Text REtrieval C... |

1 |
The Second Text REtrieval Conference (TREC-2). NIST Gaithersburg MD, to appear
- Harman
- 1993