| E. Shriberg and A. Stolcke. A prosody-only decision-tree model for disfluency detection. In Proceedings of the European Conference on Speech Communication and Technology, pages 2383--2386, 1997. |
....the structure of a message and play an important role in distinguishing dialogue acts (Jurafsky et al. 1998). In addition, prosody can aid the segmentation of a long speech recording into topics and sentences (Hakkani-Tür et al. 1999) and help locate speech disfluencies for improved parsing (Shriberg et al. 1997; Stolcke et al. 1998). Prosody can also be used to explore robust dialogue strategies. The dialogue system can infer the speaker's emotion from verbal and prosodic cues in order to react properly. This is particularly useful during user-system error resolution, because a speaker tends to ....
Shriberg, E., R. Bates, and A. Stolcke (1997). A prosody-only decision-tree model for disfluency detection. In Proc. EUROSPEECH'97, Rhodes, Greece, pp. 2383--2386.
....AND CONCLUSIONS. In this paper we surveyed data obtained for the purpose of evaluating different versions of a dialogue system. It mainly dealt with the prosody of corrections in such an environment. As for the practical application of these findings, [3] argued, much in parallel to [8], that the automatic identification of special utterance types (CRE and CME in this case) on the basis of a prosodic analysis can be used for triggering the selection of different recognition modules. Using a WOZ simulation for the data acquisition phase allowed for a highly controlled task ....
Shriberg E., Bates R., Stolcke A.: A Prosody-Only Decision-Tree Model for Disfluency Detection, in 5th European Conference on Speech Communication and Technology (EUROSPEECH 97), Rhodes, Greece, ESCA, Vol. 5, pp. 2383--2386, 1997.
....repairs. Four classes of features are identified to be relevant for repair detection: acoustic-prosodic cues; word fragments; editing terms; syntactic-semantic anomalies. Until now most approaches concentrate on one of the topics above. Nakatani and Hirschberg [6] and Shriberg et al. [8], for instance, investigate acoustic-prosodic cues. Bear et al. [2] use editing terms and pattern matching as triggers .... [Footnote 2] Taking up the terminology of Heeman [4], a repair with an empty reparandum is called an abridged repair. From a psychological point of view a repair could in fact take place in the ....
Elizabeth Shriberg, Rebecca Bates, and Andreas Stolcke. A prosody-only decision-tree model for disfluency detection. In Proc. EUROSPEECH '97, volume 5, pages 2383--2386, Rhodes, Greece, September 1997.
....which are incorrect. However, clause P3 of the rule serves to prevent many of these, as disfluencies tend to occur early in an utterance. Interestingly, Shriberg et al. have shown that disfluent and fluent speech can be discriminated fairly well using prosody alone (Shriberg et al. 1997). Third, low pitch regions in Japanese often occur with agreement-seeking sentence-final particles, especially ne (you know), which is the word most frequently appearing with low pitch regions. Fourth, a low pitch region often occurs together with back-channel feedback itself. In the corpora ....
Shriberg, E., Bates, R., and Stolcke, A. (1997). A prosody-only decision-tree model for disfluency detection. In Eurospeech97, pages 2383--2386.
....a better understanding of the prosodic properties of the different DAs, which can in turn be applied in building better integrated models for natural speech corpora in general. Our approach builds on recent methodology that has achieved good success on conversational speech for a different task (Shriberg et al. 1997). The method involves construction of a large database of automatically extracted acoustic-prosodic features. In training, decision tree classifiers are inferred from the features; the trees are then applied to an unseen set of data to evaluate performance. We apply the trees to four ....
SHRIBERG, ELIZABETH, REBECCA BATES, and ANDREAS STOLCKE. 1997. A prosody-only decision-tree model for disfluency detection. EUROSPEECH-97, volume 5, 2383--2386, Rhodes, Greece.
No context found.
E. Shriberg, R. Bates, and A. Stolcke, "A prosody-only decision-tree model for disfluency detection," in Proc. Eurospeech, 1997, pp. 2383--2386.
No context found.
E. Shriberg, R. Bates, and A. Stolcke, "A prosody-only decision-tree model for disfluency detection," in Proc. Eurospeech, 1997, pp. 2383--2386.
....[table residue: Word LM, Prosody, POS LM; 56.76, 81.25, 98.10] .... posterior probability generated by the prosody model is not very high, implying the prosodic features are not sufficiently reliable to overtake the low prior probability of an IP event. Hence the final decision is always non-IP. Experiments in [18] have shown that useful prosodic cues exist at the interruption points, but the performance of the prosody model was not investigated on non-downsampled data in that research. An analysis of the results shows that most of the interruption points correctly detected by LMs are repetitions. It is ....
E. Shriberg, R. Bates, and A. Stolcke, "A prosody-only decision-tree model for disfluency detection," in Proc. Eurospeech, 1997, pp. 2383--2386.
....access to features extracted from speech up to the point of interest. Both punctuation and overlap have been discussed in the literature as correlating with prosodic cues. For example, past computational work has discussed prosodic features for sentence boundaries as well as disfluency boundaries [3, 14, 15, 6]. Past work in conversation analysis, discourse analysis, and linguistics has shown prosody to be a useful cue in turn-taking behavior [7, 8, 9, 10]. Such studies suggest a potential contribution from prosody for our tasks, but to our knowledge the tasks have not yet been explored within a ....
E. Shriberg, R. Bates, and A. Stolcke. A prosody-only decision-tree model for disfluency detection. In G. Kokkinakis, N. Fakotakis, and E. Dermatas, editors, Proc. EUROSPEECH, vol. 5, pp. 2383--2386, Rhodes, Greece, 1997.
....prosodic information for information extraction. The general framework for combining lexical and prosodic cues for tagging speech with various kinds of hidden structural information is a further development of our earlier work on detecting sentence boundaries and disfluencies in spontaneous speech [16, 14, 12, 15]. 2. PROSODIC MODELING 2.1. Data For all tasks, the prosodic model used a wide range of features that were automatically extracted from about 70 hours (700 thousand words) of the Linguistic Data Consortium (LDC) 1997 Broadcast News (BN) corpus. Sentence boundaries were automatically determined ....
....as some of the prosodic features depend on the phonetic alignment of the word models. We can thus expect the prosodic model estimates to be robust to recognition errors. 2.3. Feature Selection. We started with a large collection of features capturing the two major aspects of speech prosody, as in [12]: Duration: of pauses, final vowel and final rhymes, normalized both for phone durations and speaker statistics; Pitch: F0 patterns preceding the boundary, across the boundary, and pitch range relative to the speaker's baseline. We included features that, based on the descriptive ....
E. Shriberg, R. Bates, and A. Stolcke. A prosody-only decision-tree model for disfluency detection. In G. Kokkinakis, N. Fakotakis, and E. Dermatas, editors, Proc. EUROSPEECH, vol. 5, pp. 2383--2386, Rhodes, Greece, 1997.
....it; Deletion, DEL, 1.3, "it was he liked it"; Repair, OthDF, 1.2, "he she liked it"; Else (fluent), else, 81.8, "she liked it". Table 1. Boundary and disfluency event classes. 3. HIDDEN EVENTS. The present work builds on our previous research on modeling hidden events for the purpose of automatic detection [12, 15]. Hidden events can be viewed as tags that label the type of boundary between adjacent words. We used the sentence boundary and disfluency event classes from [15] in our models, shown in Table 1 with examples and frequencies in the corpus we used for experiments. 3.1. Prior Work. The hidden event ....
....prosodic features such as pause, duration, and pitch. Hidden events are thus suitable candidates for the kind of hidden structure needed to leverage prosody as a knowledge source for word recognition. Prosodic cues have been studied mainly for the purpose of automatic detection of disfluencies [10, 12] and sentence boundaries [8]. The correlation between hidden events and word cues has likewise been exploited, for detecting both sentence boundaries [14, 8] and disfluencies [2, 6, among others], although recent work has also shown that speech language models can be improved by incorporating ....
E. Shriberg, R. Bates, and A. Stolcke. A prosody-only decision-tree model for disfluency detection. In G. Kokkinakis, N. Fakotakis, and E. Dermatas, editors, Proc. EUROSPEECH, vol. 5, pp. 2383--2386, Rhodes, Greece, 1997.
....repair. Disfluency is thus often indicated by unfilled pauses in the editing phase. For automatic speech processing of disfluencies, these pauses have proven to be very useful. Work using decision trees to model acoustic features finds that pauses are among the best cues to disfluency detection [20, 21], because they are robustly extracted and ensure high recall. 4.2. Filled Pause Duration. In English, the vowel in the filled pauses um and uh is typically close to schwa; however, it can also carry stress, or occur further back and lower in the vowel space. In automatic speech recognition, ....
Shriberg, E., Bates, R., and Stolcke, A. 1997. A prosody-only decision-tree model for disfluency detection. In Proceedings of the 5th European Conference on Speech Communication and Technology, vol. 5, pp. 2383--2386.
....the location and extent of disfluencies (including self-repairs) so that a speaker's intended meaning can be inferred. We will refer to sentence boundaries and disfluencies collectively as our target events. Prior work on utterance boundary detection [8, 12] as well as on disfluency detection [5, 10] has addressed this problem, but not in a completely realistic framework. Previous work has assumed either a correct word sequence, or knowledge of the word boundaries. In reality, word information is not known, but has to be hypothesized using a speech recognizer. This renders word-based cues ....
....of individual events $E_i$ in $E$ are estimated, not $E$ as a whole. This is both convenient and legitimate, since the overall classification error is minimized by maximizing the posterior of each $E_i$ independently. 2.3. Prosodic Model. As in prior work on disfluency and sentence boundary detection [8, 10], we trained CART-style decision trees [2] to predict event classes from local properties at the word boundary of interest. However, we use the trained tree models not simply as classifiers that output the most likely class, but as probability estimators $P_T(E_i \mid F, W)$ to be combined with the ....
E. Shriberg, R. Bates, and A. Stolcke. A prosody-only decision-tree model for disfluency detection. In G. Kokkinakis, N. Fakotakis, and E. Dermatas, editors, Proc. EUROSPEECH, vol. 5, pp. 2383--2386, Rhodes, Greece, 1997.
....this study and in relation to other work. The general framework for combining lexical and prosodic cues for tagging speech with various kinds of hidden structural information is a further development of our earlier work on sentence segmentation and disfluency detection for spontaneous speech [10, 12, 13]. 2. Approach. Topic segmentation in the paradigm used by us and others [15] proceeds in two phases. In the first phase, the input is divided into contiguous strings of words assumed to belong to one topic each. We refer to this step as chopping. For example, in textual input, the natural units ....
....TDT paradigm. News (BN) corpus. Topic boundary information determined by human labelers was extracted from the markup accompanying the word transcripts of this corpus. We started with a large set of prosodic features capturing various durational and intonational aspects of speech prosody, as in [10]. We included features that, based on descriptive literature, we believed should reflect breaks in the temporal and intonational contour. We developed versions of such features that could be defined at each inter-word boundary, and which could be extracted by completely automatic means (no human ....
E. Shriberg, R. Bates, and A. Stolcke. A prosody-only decision-tree model for disfluency detection. In G. Kokkinakis, N. Fakotakis, and E. Dermatas, editors, Proc. EUROSPEECH, vol. 5, pp. 2383--2386, Rhodes, Greece, 1997.
....completions, where one speaker finishes the other's utterance. Finally, language modeling of segment and turn boundaries should be accompanied by an explicit modeling of the prosodic features of such events. We have started work on the combined prosodic and language modeling of hidden events [11], which we plan to extend for the purpose of N-best rescoring. ....
E. Shriberg, R. Bates, and A. Stolcke. A prosody-only decision-tree model for disfluency detection. In Proceedings 5th European Conference on Speech Communication and Technology, Rhodes, Greece, 1997.
No context found.