Speech repairs, intonational phrases and discourse markers: modeling speakers’ utterances in spoken dialogue
- Computational Linguistics, 1999
Cited by 61 (9 self)
Interactive spoken dialogue provides many new challenges for natural language understanding systems. One of the most critical challenges is simply determining the speaker’s intended utterances: both segmenting a speaker’s turn into utterances and determining the intended words in each utterance. Even assuming perfect word recognition, the latter problem is complicated by speech repairs, which arise where speakers go back and change (or repeat) something they just said. The words that are replaced or repeated are no longer part of the intended utterance, and so need to be identified. Segmenting turns and resolving repairs are strongly intertwined with a third task: identifying discourse markers. Because these tasks interact with one another, as well as with POS tagging and speech recognition, we need to address them together and early in the processing stream. This paper presents a statistical language model in which we redefine the speech recognition problem so that it includes the identification of POS tags, discourse markers, speech repairs and intonational phrases. By solving these simultaneously, we obtain better results on each task than by addressing them separately. Our model is able to identify 72% of turn-internal intonational boundaries with a precision of 71%, 97% of discourse markers with 96% precision, and detect and correct 66% of repairs with 74% precision.
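To make "correcting" a repair concrete, here is a minimal Python sketch of the final step only: once a repair has been detected, the reparandum (and any editing term such as "uh" or "I mean") is deleted to recover the intended utterance. It assumes the detections arrive as hypothetical half-open (start, end) word-index spans; it is not the paper's statistical model, which detects repairs jointly with POS tags, discourse markers and intonational phrases.

```python
from typing import List, Sequence, Tuple


def correct_repairs(words: Sequence[str], removed_spans: List[Tuple[int, int]]) -> List[str]:
    """Return the intended utterance by deleting each annotated span.

    Each span is a half-open (start, end) index range into `words` covering a
    reparandum and, optionally, its editing term. This span format is a
    hypothetical convention chosen for illustration.
    """
    drop = set()
    for start, end in removed_spans:
        if not 0 <= start <= end <= len(words):
            raise ValueError(f"span {(start, end)} is out of range")
        drop.update(range(start, end))
    return [w for i, w in enumerate(words) if i not in drop]


# Hypothetical repair: reparandum "to Bath", editing term "uh I mean", alteration "to Avon"
utterance = "send the engine to Bath uh I mean to Avon".split()
print(correct_repairs(utterance, [(3, 5), (5, 8)]))
# ['send', 'the', 'engine', 'to', 'Avon']
```

The alteration ("to Avon") is kept because it carries the speaker's intended words; only the replaced material and the editing term are discarded.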
Speech Repairs, Intonational Boundaries and Discourse Markers: Modeling Speakers
- Department of Computer Science, University of Rochester, 1997
Cited by 24 (8 self)
Peter Heeman was born October 22, 1963, and much to his dismay his parents had already moved away from Toronto. Instead he was born in London, Ontario, where he grew up on a strawberry farm. He attended the University of Waterloo, where he received a Bachelor of Mathematics with a joint degree in Pure Mathematics and Computer Science in the spring of 1987. After working two years for a software engineering company, which supposedly used artificial intelligence techniques to automate COBOL and CICS programming, Peter was ready for a change. What better way to wipe the slate clean than to go to graduate school at the University of Toronto? But not without first spending the summer in Europe. After spending two months in countries where he couldn’t speak the language, Peter became fascinated by language, and so decided to give computational linguistics a try.
A syntactic framework for speech repairs and other disruptions
- In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, 1999
Cited by 17 (1 self)
This paper presents a grammatical and processing framework for handling the repairs, hesitations, and other interruptions in natural human dialog. The proposed framework has proved adequate for a collection of human-human task-oriented dialogs, both in a full manual examination of the corpus and in tests with a parser capable of parsing some of that corpus. This parser can also correct a pre-parser speech repair identifier, resulting in a 4.8% increase in recall.
Speech repairs: A parsing perspective
- In Satellite meeting ICPHS 99, 1999
Cited by 8 (1 self)
This paper presents a grammatical and processing framework for handling speech repairs. The proposed framework has proved adequate for a collection of human-human task-oriented dialogs, both in a full manual examination of the corpus and in tests with a parser capable of parsing some of that corpus. This parser can also correct a pre-parser speech repair identifier, producing increases in recall ranging from 2% to 4.8%.
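The recall gains reported in the two repair abstracts above, like the precision figures in the first entry, rest on the standard definitions. The sketch below computes both from detection counts; all counts are hypothetical and are not the papers' data. Whether the reported gains are absolute or relative is not stated in these abstracts, so the example does not try to reproduce them.

```python
from typing import Tuple


def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN).

    Returns 0.0 for a metric whose denominator is zero.
    """
    predicted = true_pos + false_pos   # items the system flagged
    actual = true_pos + false_neg      # items present in the gold annotation
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Hypothetical counts: 100 annotated repairs; a detector flags 50, of which 40 are correct.
print(precision_recall(40, 10, 60))  # (0.8, 0.4)
# If a later stage (such as a parser) recovers 5 missed repairs without new false alarms,
# recall rises from 0.40 to 0.45.
print(precision_recall(45, 10, 55))
```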
Improving And Predicting Performance Of Statistical Language Models In Sparse Domains
- 1998
Cited by 7 (1 self)
Standard statistical language models, or n-gram models, which represent the probability of word sequences, suffer from sparse-data problems in tasks where large amounts of domain-specific text are not available. This thesis focuses on improving the estimation of domain-dependent n-gram models by using out-of-domain text data. Previous approaches for estimating language models from multi-domain data have not accounted for the characteristic variations of style and content across domains. In contrast, this thesis introduces two approaches that compensate for multi-domain differences, both representing "style" by part-of-speech (POS) sequences and "content" by the particular choice of words. First, data from multiple domains is combined using similarity weighting schemes that discriminate for content and style relevance prior to pooling multi-domain text. Second, n-gram distributions from multiple domains are combined via a POS-dependent n-gram framework that separately compensates for word and POS usage differences. Two variations are explored: explicitly transforming the out-of-domain distribution before combining it with an in-domain model, and separately estimating components of the POS-dependent n-gram model using multi-domain data. Finally, measures for analyzing and predicting the recognition performance of language models are investigated, resulting in an algorithm for predicting the performance differences associated with localized changes in a language model, given a recognition system.
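As a point of reference for combining n-gram distributions from several domains, the simplest baseline is linear interpolation of separately estimated models. The sketch below shows that baseline for unsmoothed bigrams with a single hypothetical mixing weight `lam`; it is not the thesis's method, which additionally weights data by content and style similarity and uses a POS-dependent n-gram framework.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Bigram = Tuple[str, str]


def bigram_mle(sentences: List[List[str]]) -> Dict[Bigram, float]:
    """Unsmoothed maximum-likelihood estimates of P(word | prev), with a <s> start token."""
    pair_counts: Counter = Counter()
    history_counts: Counter = Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent
        for prev, word in zip(tokens, tokens[1:]):
            pair_counts[(prev, word)] += 1
            history_counts[prev] += 1
    return {pair: n / history_counts[pair[0]] for pair, n in pair_counts.items()}


def interpolate(p_in: Dict[Bigram, float], p_out: Dict[Bigram, float],
                lam: float) -> Callable[[str, str], float]:
    """Linear interpolation: lam * P_in(w | prev) + (1 - lam) * P_out(w | prev)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")

    def prob(prev: str, word: str) -> float:
        return lam * p_in.get((prev, word), 0.0) + (1 - lam) * p_out.get((prev, word), 0.0)

    return prob


# Hypothetical toy corpora
in_domain = [["show", "flights"], ["show", "fares"]]
out_of_domain = [["show", "me"]]
p = interpolate(bigram_mle(in_domain), bigram_mle(out_of_domain), lam=0.75)
print(p("show", "flights"), p("show", "me"))  # 0.375 0.25
```

In practice each component would be smoothed before mixing, and `lam` would be tuned on held-out in-domain data rather than fixed by hand.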
Modeling Filled Pauses in Medical Dictations
- 1999
Cited by 4 (1 self)
Filled pauses (FPs) are characteristic of spontaneous speech and can present considerable problems for speech recognition because they are often misrecognized as short words. An "um" can be recognized as "thumb" or "arm" if the recognizer's language model does not adequately represent FPs. Recognition of quasi-spontaneous speech (medical dictation) is subject to this problem as well. Results from medical dictations by 21 family practice physicians show that an FP model trained on a corpus populated with FPs produces better overall results than a model trained on a corpus that excluded FPs or a corpus that had random FPs.
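The comparison above involves three training-corpus variants: FPs kept where they occur, FPs removed, and FPs placed at random. The sketch below shows one way such variants could be derived from FP-annotated text. The FP inventory (`FILLED_PAUSES`) and the random-reinsertion scheme are assumptions for illustration, not the paper's procedure.

```python
import random
from typing import List, Sequence, Set

FILLED_PAUSES: Set[str] = {"um", "uh"}  # hypothetical FP inventory


def strip_fps(tokens: Sequence[str], fps: Set[str] = FILLED_PAUSES) -> List[str]:
    """Corpus variant with every filled pause removed."""
    return [t for t in tokens if t not in fps]


def randomize_fps(tokens: Sequence[str], seed: int = 0,
                  fps: Set[str] = FILLED_PAUSES) -> List[str]:
    """Corpus variant with the same FPs reinserted at random positions."""
    rng = random.Random(seed)
    pauses = [t for t in tokens if t in fps]
    out = strip_fps(tokens, fps)
    for fp in pauses:
        out.insert(rng.randint(0, len(out)), fp)  # randint is inclusive of len(out)
    return out


dictation = "the patient um denies uh chest pain".split()  # hypothetical text
print(strip_fps(dictation))  # ['the', 'patient', 'denies', 'chest', 'pain']
print(randomize_fps(dictation))
```

The "populated with FPs" variant is simply the original annotated text; the random variant keeps the FP count fixed so that only FP placement differs between it and the original.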
LM Studies on Filled Pauses in Spontaneous Medical Dictation
- In Proc. Human Language Technology Conference/North American Chapter of the Association for Computational Linguistics Annual Meeting, 2003
Cited by 3 (0 self)
We investigate the optimal LM treatment of abundant filled pauses (FPs) in spontaneous monologues of a professional dictation task. Questions ...
Adding robustness to language models for spontaneous speech recognition
- In Proc. ISCA Workshop on Robustness Issues in Conversational Interaction
Cited by 1 (1 self)
Compared to dictation systems, recognition systems for spontaneous speech still perform rather poorly. An important weakness in these systems is the statistical language model, mainly due to the lack of large amounts of stylistically matching training data and to the occurrence of disfluencies in the recognition input. In this paper we investigate a method for improving the robustness of a spontaneous language model by flexible manipulation of the prediction context when disfluencies occur. In the case of repetitions, we obtained significantly better recognition results on a benchmark Switchboard test set.
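The abstract does not spell out how the prediction context is manipulated, so the following is only an illustrative sketch of the general idea under two hypothetical rules: filled pauses are skipped when building the n-gram history, and an immediate single-word repetition is counted once. Both rules and the `FILLED_PAUSES` set are assumptions, not the paper's method.

```python
from typing import List, Sequence, Set

FILLED_PAUSES: Set[str] = {"uh", "um"}  # hypothetical FP inventory


def clean_history(words: Sequence[str], order: int = 3,
                  fps: Set[str] = FILLED_PAUSES) -> List[str]:
    """Return the n-1 context words for an n-gram of the given order,
    skipping filled pauses and collapsing immediate single-word repetitions."""
    cleaned: List[str] = []
    for w in words:
        if w in fps:
            continue  # filled pauses are dropped from the context
        if cleaned and cleaned[-1] == w:
            continue  # "i i" is treated as a single "i"
        cleaned.append(w)
    return cleaned[-(order - 1):] if order > 1 else []


history = ["so", "i", "i", "uh"]
print(history[-2:])            # ['i', 'uh']  (plain trigram context)
print(clean_history(history))  # ['so', 'i']
```

These rules are deliberately crude: they would also collapse legitimate repetitions such as "that that", and they ignore multi-word repetitions like "I want I want", which a real system would need to handle.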

