Document Centered Approach to Text Normalization  (33 citations)
Andrei Mikheev
Research and Development in Information Retrieval



View or download:
ltg.ed.ac.uk/paper...00mikheevsigir.ps

From:  ltg.ed.ac.uk/papers/

Abstract: In this paper we present an approach to tackle three important problems of text normalization: sentence boundary disambiguation, disambiguation of capitalized words when they are used in positions where capitalization is expected, and identification of abbreviations. The main feature of our approach is that it uses a minimum of pre-built resources, instead dynamically inferring disambiguation clues from the entire document itself. This makes it domain independent, closely targeted to each...
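The abstract states only the general principle: disambiguation clues are inferred from the whole document rather than from pre-built resources. As a rough illustration of that principle for capitalized words (not a reproduction of the paper's algorithm, which the truncated abstract does not describe), the Python sketch below assumes the text is already split into sentences and tokens, treats the first token of each sentence as the ambiguous position, and checks how the same word is written elsewhere in the same document. The function name `capitalization_clues` and its three labels are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def capitalization_clues(sentences: list[list[str]]) -> dict[str, str]:
    """Hypothetical helper: label sentence-initial capitalized words using
    only evidence found elsewhere in the same document.

    Assumption: sentence boundaries and tokens are already given, so this
    sketch does not address sentence boundary disambiguation itself.
    """
    # Record how each word is written in unambiguous (non-initial) positions.
    lower_seen: Counter[str] = Counter()
    upper_seen: Counter[str] = Counter()
    for sent in sentences:
        for tok in sent[1:]:
            key = tok.lower()
            if tok[:1].isupper():
                upper_seen[key] += 1
            elif tok[:1].islower():
                lower_seen[key] += 1
            # Punctuation tokens are neither upper nor lower and are ignored.

    decisions: dict[str, str] = {}
    for sent in sentences:
        # Only capitalized sentence-initial tokens are ambiguous here.
        if not sent or not sent[0][:1].isupper():
            continue
        first = sent[0]
        key = first.lower()
        if lower_seen[key] and not upper_seen[key]:
            decisions[first] = "common word"
        elif upper_seen[key] and not lower_seen[key]:
            decisions[first] = "proper name"
        else:
            # No other occurrence in the document, or conflicting evidence.
            decisions[first] = "undecided"
    return decisions


if __name__ == "__main__":
    doc = [
        ["Bush", "met", "reporters", "today", "."],
        ["Reporters", "asked", "about", "the", "budget", "."],
        ["Later", ",", "Bush", "left", "."],
        ["The", "budget", "vote", "is", "Friday", "."],
    ]
    print(capitalization_clues(doc))
```

In this toy document, sentence-initial "Bush" also appears capitalized mid-sentence and is labelled a likely proper name, while "Reporters" and "The" are labelled common words because their lowercase forms occur elsewhere. "Later" has no other occurrence and stays undecided; a practical system would need some further fallback for such words.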

Similar documents based on text:
1.5:   Tagging Sentence Boundaries - Mikheev (2000)
0.5:   Semi-Supervised Maximum Entropy Based Approach to Acronym and.. - Pakhomov (2002)
0.4:   Exploring The Relationship Between Proper Name Anomia And.. - Kay, Hanley, Miles

BibTeX entry:

Andrei Mikheev. Document centered approach to text normalization. In Proceedings of SIGIR' http://citeseer.ist.psu.edu/406958.html

@inproceedings{ mikheev00document,
    author = "Andrei Mikheev",
    title = "Document centered approach to text normalization",
    booktitle = "Research and Development in Information Retrieval",
    pages = "136-143",
    year = "2000",
    url = "citeseer.ist.psu.edu/406958.html" }
Citations (may not include all citations):
68   A cache-based natural language model for speech recognition - Kuhn, de Mori - 1998
46   One sense per discourse - Gale, Church et al. - 1992
29   Mitre: Description of the alembic system used for muc - Aberdeen, Burger et al. - 1995
29   Language model adaptation using mixtures and an exponentiall.. - Clarkson, Robinson - 1997
20   Automatic rule induction for unknown word guessing - Mikheev - 1997
18   Adaptive multilingual sentence boundary disambiguation - Palmer, Hearst - 1997
13   Identifying unknown proper names in newswire text - Mani, MacMillan - 1995
13   Some applications of tree-based modelling to speech and lang.. - Riley - 1989
9   Eagle: An extensible architecture for general linguistic eng.. - Baldwin, Doran et al. - 1997
7   A knowledge-free method for capitalized word disambiguation - Mikheev - 1999
5   One term or two - Church - 1995
4   Nonlinear interpolation of topic models for language model a.. - Seymore, Chen et al. - 1998
1   Mary Ann Marcinkiewicz and Beatrice Santorini - Marcus - 1993



Documents on the same site (http://www.ltg.ed.ac.uk/papers/):
Periods, Capitalized Words, etc. - Mikheev (1999)
Feature Lattices and Maximum Entropy Models - Mikheev (1998)
A semantically-derived subset of English for hardware verification - Holt, Klein (1999)

CiteSeer.IST - Copyright Penn State and NEC