Accurate Methods for the Statistics of Surprise and Coincidence (1993)

by Ted Dunning
Venue: Computational Linguistics
Citations: 1054 (1 self-citation)

BibTeX

@ARTICLE{Dunning93accuratemethods,
    author = {Ted Dunning},
    title = {Accurate Methods for the Statistics of Surprise and Coincidence},
    journal = {COMPUTATIONAL LINGUISTICS},
    year = {1993},
    volume = {19},
    number = {1},
    pages = {61--74}
}

Abstract

Much work has been done on the statistical analysis of text. In some cases reported in the literature, inappropriate statistical methods have been used, and the statistical significance of results has not been addressed. In particular, asymptotic normality assumptions have often been used unjustifiably, leading to flawed results. This assumption of a normal distribution limits the ability to analyze rare events. Unfortunately, rare events do make up a large fraction of real text. However, more applicable methods based on likelihood ratio tests are available that yield good results with relatively small samples. These tests can be implemented efficiently and have been used to detect composite terms and to determine domain-specific terms. In some cases, these measures perform much better than the methods previously used. In cases where traditional contingency table methods work well, the likelihood ratio tests described here give nearly identical results. This paper describes the basis of a likelihood-ratio measure that can be applied to the analysis of text.
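The abstract does not state the statistic itself. As a minimal sketch, assuming the data are summarized in a 2x2 contingency table of counts (for example, a hypothetical bigram table of how often word A is followed by word B), the log-likelihood ratio statistic -2 ln(lambda) for that table is the usual G^2 statistic, which can be written as 2 [sum of k ln k over the cells - sum of R ln R over the row totals - sum of C ln C over the column totals + N ln N]. The function names `llr_2x2` and `pearson_chi2_2x2` and the table layout are illustrative assumptions, not code from the paper, whose own derivation may be expressed differently (for example, in terms of binomial likelihoods). Pearson's chi-squared statistic is included only for comparison.

```python
import math


def _sum_k_ln_k(*counts):
    """Return the sum of k * ln(k) over the counts, treating 0 * ln(0) as 0."""
    return sum(k * math.log(k) for k in counts if k > 0)


def llr_2x2(k11, k12, k21, k22):
    """Log-likelihood ratio statistic G^2 = -2 ln(lambda) for a 2x2 table of counts.

    Hypothetical bigram layout (an assumption for illustration):
        k11 = A followed by B        k12 = A followed by not-B
        k21 = not-A followed by B    k22 = not-A followed by not-B
    """
    n = k11 + k12 + k21 + k22
    cells = _sum_k_ln_k(k11, k12, k21, k22)
    rows = _sum_k_ln_k(k11 + k12, k21 + k22)
    cols = _sum_k_ln_k(k11 + k21, k12 + k22)
    # G^2 = 2 * sum_ij k_ij * ln(k_ij * N / (R_i * C_j)), expanded into k ln k terms.
    return 2.0 * (cells - rows - cols + _sum_k_ln_k(n))


def pearson_chi2_2x2(k11, k12, k21, k22):
    """Pearson's chi-squared statistic for the same table, for comparison."""
    n = k11 + k12 + k21 + k22
    observed = ((k11, k12), (k21, k22))
    row_totals = (k11 + k12, k21 + k22)
    col_totals = (k11 + k21, k12 + k22)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip cells in an empty row or column
                chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


if __name__ == "__main__":
    # A well-populated table versus one dominated by a single rare co-occurrence.
    for table in [(110, 90, 90, 110), (1, 0, 0, 10000)]:
        print(table, round(llr_2x2(*table), 3), round(pearson_chi2_2x2(*table), 3))
```

On the well-populated table (110, 90, 90, 110) the two statistics are close (Pearson 4.0, G^2 about 4.007), in line with the abstract's point that the tests agree where contingency table methods work well. On the table (1, 0, 0, 10000), where a single rare co-occurrence dominates, Pearson's statistic is 10001 while G^2 is about 20.4, showing how a test that relies on the normal approximation can greatly overstate the significance of a rare event.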

Keyphrases

accurate method, rare event, likelihood ratio test, small sample, traditional contingency table method, inappropriate statistical method, real text, domain-specific term, large fraction, asymptotic normality assumption, composite term, applicable method, statistical analysis, normal distribution, much work, statistical significance, yield good result, likelihood ratio
