Empirical Estimates of Adaptation: The chance of Two Noriegas is closer to p/2 than p²

by Kenneth W. Church
http://acl.ldc.upenn.edu/C/C00/C00-1027.pdf

Abstract:

Repetition is very common. Adaptive language models, which allow probabilities to change or adapt after seeing just a few words of a text, were introduced in speech recognition to account for text cohesion. Suppose a document mentions Noriega once. What is the chance that he will be mentioned again? If the first instance has probability p, then under standard (bag-of-words) independence assumptions, two instances ought to have probability p², but we find the probability is actually closer to p/2. The first mention of a word obviously depends on frequency, but surprisingly, the second does not. Adaptation depends more on lexical content than frequency; there is more adaptation for content words (proper nouns, technical terminology and good keywords for information retrieval), and less adaptation for function words, clichés and ordinary first names.
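To make the p² versus p/2 contrast concrete, here is a minimal Python sketch. It assumes document-level counts: p is estimated as the fraction of D documents that mention a word at least once (df1/D), the observed two-mention rate as df2/D, and adaptation as df2/df1, the chance of a second mention given a first. These estimators, the hypothetical helpers `document_frequencies` and `adaptation_estimates`, and the toy corpus are illustrative assumptions, not taken from the abstract. The numbers are invented and are not results from the paper.

```python
from collections.abc import Iterable


def document_frequencies(docs: Iterable[list[str]], word: str) -> tuple[int, int, int]:
    """Hypothetical helper: count documents overall (D), with >= 1 mention (df1), and with >= 2 (df2)."""
    n_docs = df1 = df2 = 0
    for tokens in docs:
        n_docs += 1
        mentions = tokens.count(word)
        if mentions >= 1:
            df1 += 1
        if mentions >= 2:
            df2 += 1
    return n_docs, df1, df2


def adaptation_estimates(docs: Iterable[list[str]], word: str) -> tuple[float, float, float, float]:
    """Hypothetical helper: compare observed repetition with the independence prediction."""
    n_docs, df1, df2 = document_frequencies(docs, word)
    if n_docs == 0:
        raise ValueError("need at least one document")
    p = df1 / n_docs                    # estimated chance of at least one mention
    observed_two = df2 / n_docs         # observed chance of two or more mentions
    independent_two = p ** 2            # what bag-of-words independence would predict
    adapt = df2 / df1 if df1 else 0.0   # Pr(second mention | first mention)
    return p, observed_two, independent_two, adapt


# Toy corpus invented for illustration (not data from the paper).
docs = [
    ["noriega", "said", "noriega", "denied"],
    ["panama", "noriega", "troops"],
] + [["unrelated", "text"]] * 8

p, observed_two, independent_two, adapt = adaptation_estimates(docs, "noriega")
print(f"p={p:.2f} observed={observed_two:.2f} p^2={independent_two:.2f} "
      f"p/2={p / 2:.2f} adapt={adapt:.2f}")
# prints: p=0.20 observed=0.10 p^2=0.04 p/2=0.10 adapt=0.50
```

Because df2/D = (df1/D) × (df2/df1), the observed two-mention rate factors as p × adapt. An adaptation rate near one half is what puts it near p/2 rather than p².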
