Abstract: This paper describes a method for correcting English text using a PPM model.
A method that segments words in English text is introduced and is shown to be a
significant improvement over previously used methods. A similar technique is also
applied as a post-processing stage after pages have been recognized by a state-of-the-art
commercial OCR system. We show that the accuracy of the OCR system can be
increased from 96.3% to 96.9%, a decrease of about 14 errors per page.
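The accuracy figures in the abstract can be restated as error rates. The sketch below is only illustrative arithmetic on the three numbers quoted above; the function name `error_reduction` is hypothetical, and it assumes that accuracy is measured per recognized unit (for example per character) and that "about 14" is taken as exactly 14. Under those assumptions the error rate falls from 3.7% to 3.1%, a relative reduction of roughly 16%, which implies on the order of 2,300 recognized units per page.

```python
def error_reduction(acc_before, acc_after, errors_saved_per_page):
    """Restate an accuracy gain as an error-rate reduction.

    Assumption (not stated in the abstract): accuracy is a percentage of
    correctly recognized units, and every page has the same number of units.
    """
    err_before = 100.0 - acc_before      # error rate before, in percent
    err_after = 100.0 - acc_after        # error rate after, in percent
    drop = err_before - err_after        # absolute drop, in percentage points
    relative = drop / err_before         # fraction of the original errors removed
    # Units per page implied by "drop" percentage points == errors_saved_per_page
    units_per_page = errors_saved_per_page / (drop / 100.0)
    return relative, units_per_page


# Figures quoted in the abstract: 96.3% -> 96.9%, about 14 fewer errors per page.
relative, units = error_reduction(96.3, 96.9, 14)
print(f"relative error reduction: {relative:.1%}")
print(f"implied units per page:   {units:.0f}")
```

The implied page size is an inference from the rounded figures, not a number reported by the authors.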
Context of citations to this paper:
...the escape probability) sum to 1. One slight further improvement to PPM is incorporated in the experiments: deterministic scaling (Teahan, 1997). Although it probably has negligible effect on our overall results, we record it here for completeness. Experiments show that in...
...they do not feature as prominently in the literature. They have been most vigorously applied to the domain of text compression (Teahan, 1998; Ristad & Thomas, 1995; Bell, Cleary & Witten, 1990) but also to other application domains, such as cryptography (Irvine, 1997)...
Cited by:
Implementing the Context Tree Weighting Method for.. - Dawy, Hagenauer.. (2004)
A Compression-based Algorithm for Chinese Word Segmentation - Teahan, Wen, McNab, Witten (2000)
Combining PPM models using a text mining approach - Teahan, Harper (2001)
Similar documents (at the sentence level):
22.7%: Text Classification and Segmentation Using Minimum Cross-Entropy - Teahan (2000)
7.3%: Applying Compression to Natural Language Processing - Teahan, Cleary (1997)
Active bibliography (related documents):
0.4: Models of English text - Teahan, Cleary (1997)
0.3: An Open Interface for Probabilistic Models of Text - Cleary, Teahan (1999)
0.3: Language Modelling - Candidate Supervisor Dr
Similar documents based on text:
0.5: The Entropy Of English Using Ppm-Based Models - Teahan, Cleary (1996)
0.5: Unbounded Length Contexts for PPM - Cleary, Teahan (1997)
0.4: Unbounded length contexts for PPM - Cleary, Teahan (1993)
Related documents from co-citation:
6: Data compression using adaptive coding and partial string matching - Cleary, Witten - 1984
5: Modelling English text - 1997
4: Unbounded length contexts for PPM - Cleary, Teahan et al. - 1995
BibTeX entry:
Teahan, W.J., Inglis, S., Cleary, J.G. & Holmes, G. 1998. "Correcting English text using PPM models" in Proceedings DCC'98, edited by Storer, J.A. & Cohn, M., IEEE Computer Society Press. http://citeseer.ist.psu.edu/teahan98correcting.html
@inproceedings{ teahan98correcting,
author = "W. J. Teahan and Stuart Inglis and John G. Cleary and Geoffrey Holmes",
title = "Correcting English Text Using {PPM} Models",
booktitle = "Data Compression Conference",
pages = "289-298",
year = "1998",
url = "citeseer.ist.psu.edu/teahan98correcting.html" }
Citations (may not include all citations):
3972: Introduction to algorithms - Cormen, Leiserson et al. - 1990
1447: A mathematical theory of communication - Shannon - 1948
368: Text compression - Bell, Cleary et al. - 1990
337: Error bounds for convolutional codes and an asymptotically o.. - Viterbi - 1967
328: A maximum likelihood approach to continuous speech recogniti.. - Bahl, Jelinek - 1983
262: Statistical language learning - Charniak - 1993
219: A statistical approach to machine translation - Brown, Cocke et al. - 1990
148: Data compression using adaptive coding and partial string ma.. - Cleary, Witten - 1984
128: Self-organized language modeling for speech recognition - Jelinek - 1990
108: Prediction and entropy of printed English - Shannon - 1951
78: Frequency analysis of English usage: Lexicon and grammar - Francis, Kucera - 1982
61: Unbounded length contexts for PPM - Cleary, Teahan et al. - 1995
22: Building probabilistic models for natural language - Chen - 1996
15: USeg: A retargetable word segmentation procedure for informa.. - Ponte, Croft - 1996
15: Modelling English text - Teahan - 1998
14: The entropy of English using PPM-based models - Teahan, Cleary - 1996
13: Context-based spelling correction - Mays, Damerau et al. - 1990
6: New techniques for context modeling - Ristad, Thomas - 1995
4: Compression and Cryptology - Irvine - 1997
1: Jefferson and his time - Malone - 1977
Documents on the same site (http://www.cs.waikato.ac.nz/~wjt/):
The entropy of English using PPM-based models - Teahan, Cleary, Shannon (1996)
Unbounded length contexts for PPM - Cleary, Teahan (1993)
Modelling English Text - Teahan (1998)