Abstract: In this paper we present an approach to tackle three
important problems of text normalization: sentence
boundary disambiguation, disambiguation of capitalized
words when they are used in positions where capitalization
is expected, and identification of abbreviations.
The main feature of our approach is that it
uses a minimum of pre-built resources, instead dynamically
inferring disambiguation clues from the entire
document itself. This makes it domain independent,
closely targeted to each...
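
The idea of inferring disambiguation clues from the document itself can be illustrated for the capitalized-word problem with a minimal Python sketch. It is not the paper's implementation: the function name `classify_initial_words`, the labels, and the simple count comparison are assumptions made for illustration, and the input is assumed to be already split into sentences and tokens, which sidesteps the sentence boundary problem. The sketch looks at how each sentence-initial word is written elsewhere in the same document, in positions where capitalization is not forced.

```python
from collections import Counter


def classify_initial_words(sentences):
    """Label capitalized sentence-initial words using document-wide evidence.

    `sentences` is a list of token lists (hypothetical input format).
    Returns a dict mapping each capitalized sentence-initial word to
    'proper', 'common', or 'unknown'.
    """
    capitalized_mid = Counter()  # capitalized occurrences in mid-sentence positions
    lowercase_seen = Counter()   # lowercase occurrences anywhere in the document

    # Pass 1: collect evidence from positions where the spelling is informative.
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if not tok.isalpha():
                continue  # skip punctuation and numbers
            if tok[0].islower():
                lowercase_seen[tok.lower()] += 1
            elif i > 0:
                # Capitalized although not sentence-initial: a proper-name hint.
                capitalized_mid[tok.lower()] += 1

    # Pass 2: decide the ambiguous (sentence-initial) occurrences.
    labels = {}
    for tokens in sentences:
        if not tokens or not tokens[0][:1].isupper():
            continue
        first = tokens[0]
        key = first.lower()
        if capitalized_mid[key] > lowercase_seen[key]:
            labels[first] = "proper"
        elif lowercase_seen[key] > 0:
            labels[first] = "common"
        else:
            labels[first] = "unknown"  # no evidence in this document
    return labels


if __name__ == "__main__":
    doc = [
        ["Apple", "released", "a", "phone", "."],
        ["The", "phone", "from", "Apple", "sold", "well", "."],
    ]
    print(classify_initial_words(doc))
```

Because the evidence comes only from the document being processed, the sketch needs no pre-built name list. Words with no unambiguous occurrence stay "unknown" and would need some other fallback.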
BibTeX entry:
Andrei Mikheev. Document centered approach to text normalization. In Proceedings of SIGIR' http://citeseer.ist.psu.edu/406958.html
@inproceedings{ mikheev00document,
author = "Andrei Mikheev",
title = "Document centered approach to text normalization",
booktitle = "Research and Development in Information Retrieval",
pages = "136-143",
year = "2000",
url = "citeseer.ist.psu.edu/406958.html" }
Citations (may not include all citations):
68: A cache-based natural language model for speech recognition - Kuhn, de Mori - 1998
46: One sense per discourse - Gale, Church et al. - 1992
29: Mitre: Description of the alembic system used for muc - Aberdeen, Burger et al. - 1995
29: Language model adaptation using mixtures and an exponentiall.. - Clarkson, Robinson - 1997
20: Automatic rule induction for unknown word guessing - Mikheev - 1997
18: Adaptive multilingual sentence boundary disambiguation - Palmer, Hearst - 1997
13: Identifying unknown proper names in newswire text - Mani, MacMillan - 1995
13: Some applications of tree-based modelling to speech and lang.. - Riley - 1989
9: Eagle: An extensible architecture for general linguistic eng.. - Baldwin, Doran et al. - 1997
7: A knowledge-free method for capitalized word disambiguation - Mikheev - 1999
5: One term or two - Church - 1995
4: Nonlinear interpolation of topic models for language model a.. - Seymore, Chen et al. - 1998
1: Mary Ann Marcinkiewicz and Beatrice Santorini - Marcus - 1993