Speech repairs, intonational phrases and discourse markers: modeling speakers’ utterances in spoken dialogue - Computational Linguistics, 1999
Cited by 61 (9 self)
Interactive spoken dialogue provides many new challenges for natural language understanding systems. One of the most critical challenges is simply determining the speaker’s intended utterances: both segmenting a speaker’s turn into utterances and determining the intended words in each utterance. Even assuming perfect word recognition, the latter problem is complicated by the occurrence of speech repairs, which occur where speakers go back and change (or repeat) something they just said. The words that are replaced or repeated are no longer part of the intended utterance, and so need to be identified. Segmenting turns and resolving repairs are strongly intertwined with a third task: identifying discourse markers. Because of the interactions, and interactions with POS tagging and speech recognition, we need to address these tasks together and early on in the processing stream. This paper presents a statistical language model in which we redefine the speech recognition problem so that it includes the identification of POS tags, discourse markers, speech repairs and intonational phrases. By solving these simultaneously, we obtain better results on each task than addressing them separately. Our model is able to identify 72% of turn-internal intonational boundaries with a precision of 71%, 97% of discourse markers with 96% precision, and detect and correct 66% of repairs with 74% precision.
Detecting and Correcting Speech Repairs, 1994
Cited by 59 (13 self)
Interactive spoken dialog provides many new challenges for spoken language systems. One of the most critical is the prevalence of speech repairs. This paper presents an algorithm that detects and corrects speech repairs based on finding the repair pattern. The repair pattern is built by finding word matches and word replacements, and identifying fragments and editing terms. Rather than using a set of prebuilt templates, we build the pattern on the fly. In a fair test, our method, when combined with a statistical model to filter possible repairs, was successful at detecting and correcting 80% of the repairs, without using prosodic information or a parser.
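To make the idea of word matches and editing terms concrete, here is a toy Python sketch. It is not the paper's algorithm: it only handles the simplest case, an exact repetition of up to `max_len` words, optionally separated by editing terms, and it ignores word replacements, fragments, and the statistical filter. The function name `correct_repetitions`, the `max_len` parameter, and the contents of `EDITING_TERMS` are all assumptions made for illustration.

```python
# Toy illustration only: exact-repetition repairs, not the full repair-pattern
# method described in the abstract. The editing-term list is hypothetical.
EDITING_TERMS = {"um", "uh", "er", "well", "okay"}


def correct_repetitions(words, max_len=3):
    """Drop a word sequence (the reparandum) when the same sequence is repeated
    right after it, along with any editing terms between the two copies."""
    out = []
    i = 0
    n = len(words)
    while i < n:
        repaired = False
        # Try longer candidate reparanda first.
        for k in range(max_len, 0, -1):
            if i + k > n:
                continue
            j = i + k
            # Skip editing terms that follow the candidate reparandum.
            while j < n and words[j] in EDITING_TERMS:
                j += 1
            # A word match: the candidate is repeated verbatim at position j.
            if j + k <= n and words[i:i + k] == words[j:j + k]:
                i = j  # discard reparandum and editing terms, keep the repeat
                repaired = True
                break
        if not repaired:
            out.append(words[i])
            i += 1
    return out


if __name__ == "__main__":
    print(correct_repetitions("to the uh to the bank".split()))
```

Editing terms that are not part of a detected repetition are left in place, since this sketch treats them only as evidence for a repair and not as something to delete on their own.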
Resolving Pronominal Reference to Abstract Entities, 2002
Cited by 33 (4 self)
Donna K. Byron, Department of Computer Science, P.O. Box 270226, Rochester, NY 14627, dbyron@cs.rochester.edu
This paper describes PHORA, a technique for resolving pronominal reference to either individual or abstract entities. It defines processes for evoking abstract referents from discourse and for resolving both demonstrative and personal pronouns. It successfully interprets 72% of test pronouns, compared to 37% for a leading technique without these features.
Intonational Boundaries, Speech Repairs and Discourse Markers: Modeling Spoken Dialog, 1997
Cited by 29 (5 self)
To understand a speaker's turn of a conversation, one needs to segment it into intonational phrases, clean up any speech repairs that might have occurred, and identify discourse markers. In this paper, we argue that these problems must be resolved together, and that they must be resolved early in the processing stream. We put forward a statistical language model that resolves these problems, does POS tagging, and can be used as the language model of a speech recognizer. We find that, by accounting for the interactions between these tasks, the performance on each task improves, as do POS tagging and perplexity.
The Computational Processing of Intonational Prominence: A Functional Prosody Perspective, 1997
Cited by 16 (2 self)
Intonational prominence, or accent, is a fundamental prosodic feature that is said to contribute to discourse meaning. This thesis outlines a new, computational theory of the discourse interpretation of prominence, from a FUNCTIONAL PROSODY perspective. Functional prosody makes the following two important assumptions: first, there is an aspect of prominence interpretation that centrally concerns discourse processes, namely the discourse focusing nature of prominence; and second, the role of prominence in language processing in general, and discourse processing in particular, is not essentially separate from the processing of other grammatical, nonprosodic information. This thesis develops a computational theory of prominence interpretation by explaining how prominence serves as an inference cue in discourse processing. Prominence signals changes in the attentional status of entities in a discourse model, while nonprominence signals that the realized entities are already in discourse fo...
Identifying Discourse Markers in Spoken Dialog, 1998
Cited by 13 (0 self)
In this paper, we present a method for identifying discourse marker usage in spontaneous speech based on machine learning. Discourse markers are denoted by special POS tags, and thus the process of POS tagging can be used to identify discourse markers. By incorporating POS tagging into language modeling, discourse markers can be identified during speech recognition, in which the timeliness of the information can be used to help predict the following words. We contrast this approach with an alternative machine learning approach proposed by Litman (1996). This paper also argues that discourse markers can be used to help the hearer predict the role that the upcoming utterance plays in the dialog. Thus discourse markers should provide valuable evidence for automatic dialog act prediction.
Introduction: Discourse markers are a linguistic device that speakers use to signal how the upcoming unit of speech or text relates to the current discourse state (Schiffrin 1987). Previous ...
Utilizing visual attention for cross-modal coreference interpretation - In Proceedings of Fifth International and Interdisciplinary Conference on Modeling and Using Context (CONTEXT-05), 2005
Cited by 13 (1 self)
In this paper, we describe an exploratory study to develop a model of visual attention that could aid automatic interpretation of exophors in situated dialog. The model is intended to support the reference resolution needs of embodied conversational agents, such as graphical avatars and robotic collaborators. The model tracks the attentional state of one dialog participant as it is represented by his visual input stream, taking into account the recency, exposure time, and visual distinctness of each viewed item. The model predicts the correct referent of 52% of referring expressions produced by speakers in human-human dialog while they were collaborating on a task in a virtual world. This accuracy is comparable with reference resolution based on calculating linguistic salience for the same data.
Discourse marker use in task-oriented spoken dialog - The Proceedings of Eurospeech 97, 1997
Cited by 12 (5 self)
Discourse markers, also known as cue words, are used extensively in human-human task-oriented dialogs to signal the structure of the discourse. Previous work showed their importance in monologues for marking discourse structure, but little attention has been paid to their importance in spoken dialog systems. This paper investigates what discourse markers signal about the upcoming speech, and when they tend to be used in task-oriented dialog. We demonstrate that there is a high correlation between specific discourse markers and specific conversational moves, between discourse marker use and adjacency pairs, and between discourse markers and the speaker’s orientation to information presented in the prior turn.
POS Tagging versus Classes in Language Modeling, 1998
Cited by 12 (1 self)
Language models for speech recognition concentrate solely on recognizing the words that were spoken. In this paper, we advocate redefining the speech recognition problem so that its goal is to find both the best sequence of words and their POS tags, and thus incorporate POS tagging. The use of POS tags allows more sophisticated generalizations than are afforded by using a class-based approach. Furthermore, if we want to incorporate speech repair and intonational phrase modeling into the language model, using POS tags rather than classes gives better performance in this task.
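The redefinition described in this abstract can be written out as a short worked formula. This is a sketch under assumed notation, not taken from the listing: A stands for the acoustic signal, W = w_1 ... w_N for the word sequence, and P = p_1 ... p_N for the POS tags. The first line applies Bayes' rule (Pr(A) is constant across candidates); the second is the chain rule over word-tag pairs.

```latex
\begin{align*}
\hat{W}, \hat{P}
  &= \arg\max_{W,P} \Pr(W, P \mid A)
   = \arg\max_{W,P} \Pr(A \mid W, P)\,\Pr(W, P) \\
\Pr(W, P)
  &= \prod_{i=1}^{N} \Pr(p_i \mid w_{1,i-1},\, p_{1,i-1})\;
                     \Pr(w_i \mid w_{1,i-1},\, p_{1,i})
\end{align*}
```

Here w_{1,i-1} abbreviates w_1 ... w_{i-1}. A conventional word-only language model corresponds to maximizing over W alone, so the joint form is what lets POS information influence word prediction.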
POS Tags and Decision Trees for Language Modeling - In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, 1999
Cited by 11 (2 self)
Language models for speech recognition concentrate solely on recognizing the words that were spoken. In this paper, we advocate redefining the speech recognition problem so that its goal is to find both the best sequence of words and their POS tags, and thus incorporate POS tagging. To use POS tags effectively, we use clustering and decision tree algorithms, which allow generalizations between POS tags and words to be effectively used in estimating the probability distributions. We show that our POS model gives a reduction in word error rate and perplexity for the Trains corpus in comparison to word and class-based approaches. By using the Wall Street Journal corpus, we show that this approach scales up when more training data is available.

