Experiments on sentence boundary detection (2000) [11 citations, 0 self]

by Mark Stevenson, Robert Gaizauskas
In Proceedings of the 6th Conference on Applied Natural Language Processing (ANLP)
http://www.dcs.shef.ac.uk/~marks/publications/rmem1.ps

Abstract:

This paper explores the problem of identifying sentence boundaries in the transcriptions produced by automatic speech recognition systems. An experiment which determines the level of human performance for this task is described, as well as a memory-based computational approach to the problem.

1 The Problem

This paper addresses the problem of identifying sentence boundaries in the transcriptions produced by automatic speech recognition (ASR) systems. This is unusual in the field of text processing, which has generally dealt with well-punctuated text: some of the most commonly used texts in NLP are machine-readable versions of highly edited documents such as newspaper articles or novels. However, many types of text are not so edited, and the example on which we concentrate in this paper is the output of ASR systems. These differ from the sort of texts normally used in NLP in a number of ways: the text is generally in a single case (usually upper), unpunctuated, and may contain transcription errors. Figure 1 compares a short text in the format which would be produced by an ASR system with a fully punctuated version which includes case information. For the remainder of this paper, error-free texts such as newspaper articles or novels will be referred to as "standard text" and the output of a speech recognition system as "ASR text".
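As a rough illustration of the difference between standard text and ASR text, the Python sketch below converts punctuated, mixed-case text into upper-case, unpunctuated tokens and records after which words a sentence ended; labels of this kind could serve as reference data for a boundary detector. The helper `to_asr_text` is hypothetical and not taken from the paper. It assumes that every token ending in `.`, `!` or `?` closes a sentence (so abbreviations such as "Dr." are mislabelled), and it does not simulate transcription errors.

```python
import re

# Sentence-final punctuation, optionally followed by closing quotes or brackets.
# ASSUMPTION: every such token ends a sentence (abbreviations are mislabelled).
SENTENCE_END = re.compile(r"[.!?][\"')\]]*$")
# Characters stripped from each token: anything except letters, digits and apostrophes.
NON_WORD = re.compile(r"[^A-Za-z0-9']")


def to_asr_text(standard_text):
    """Convert punctuated, mixed-case text into single-case, unpunctuated tokens.

    Returns (tokens, labels), where labels[i] is True when a sentence boundary
    follows tokens[i] in the original text. Transcription errors are not simulated.
    """
    tokens, labels = [], []
    for raw in standard_text.split():
        word = NON_WORD.sub("", raw).upper()
        ends_sentence = bool(SENTENCE_END.search(raw))
        if not word:
            # A free-standing punctuation token such as "." marks the previous word.
            if ends_sentence and labels:
                labels[-1] = True
            continue
        tokens.append(word)
        labels.append(ends_sentence)
    return tokens, labels


if __name__ == "__main__":
    tokens, labels = to_asr_text("The cat sat. Did it move? Yes!")
    print(" ".join(tokens))  # THE CAT SAT DID IT MOVE YES
    print(labels)  # [False, False, True, False, False, True, True]
```

Once case and punctuation are gone, the only trace of sentence structure is the label list, which is exactly what a boundary detector for ASR text has to recover.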

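The abstract does not specify the features or learner settings of the memory-based approach, so the following Python sketch shows the general technique only. Memory-based learning stores every training example and labels a new case with the majority label of its nearest stored neighbours, here measured by the overlap metric (the number of mismatching features). Treating boundary detection as a yes/no decision after each word, the two-word-left/two-word-right context window, `k`, and the names `gap_features` and `MemoryBasedTagger` are all assumptions made for illustration. Dedicated memory-based learners typically add refinements such as feature weighting that are omitted here.

```python
from collections import Counter

PAD = "<PAD>"  # placeholder for positions beyond the edges of the text


def gap_features(tokens, i, left=2, right=2):
    """Describe the gap after tokens[i] by `left` words ending at i and `right` words after it.

    ASSUMPTION: a plain word window; the paper's actual features are not given here.
    """
    return tuple(
        tokens[j] if 0 <= j < len(tokens) else PAD
        for j in range(i - left + 1, i + right + 1)
    )


def overlap_distance(a, b):
    """Overlap metric: the number of feature positions whose values differ."""
    return sum(x != y for x, y in zip(a, b))


class MemoryBasedTagger:
    """k-nearest-neighbour classifier that keeps every training instance in memory."""

    def __init__(self, k=1):
        self.k = k
        self.memory = []  # list of (features, label) pairs

    def train(self, tokens, labels):
        # "Training" only stores one instance per word gap; nothing is abstracted.
        for i, label in enumerate(labels):
            self.memory.append((gap_features(tokens, i), label))

    def classify(self, feats):
        # Take the k closest stored instances (ties broken by storage order)
        # and return their majority label.
        nearest = sorted(self.memory, key=lambda m: overlap_distance(feats, m[0]))[: self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    def tag(self, tokens):
        """Return True for each word predicted to be followed by a sentence boundary."""
        return [self.classify(gap_features(tokens, i)) for i in range(len(tokens))]


if __name__ == "__main__":
    tagger = MemoryBasedTagger(k=1)
    tagger.train("THE CAT SAT THE DOG RAN".split(),
                 [False, False, True, False, False, True])
    print(tagger.tag("A CAT SAT THE BIRD RAN".split()))
    # [False, False, True, False, False, True]
```

With `k=1` the tagger simply copies the label of the single most similar stored gap; larger values of `k` smooth over noisy neighbours at the cost of blurring rare contexts.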