| Litman, D. J., Hirschberg, J., & Swerts, M. (2000). Predicting Automatic Speech Recognition Performance Using Prosodic Cues. In Proceedings of NAACL, 218-225. |
....from erroneous recognition results does not involve an interactive process with the user but a single utterance verification process inside the system. The idea is to detect the system's misrecognition on the basis of confidence score thresholds modeled with some verbal data [2, 10]. Litman et al. [8] have focused on prosodic cues to predict whether a given speech recognition result is correct or misrecognized. They have demonstrated that prosodic features can predict word recognition errors more accurately than traditional acoustic confidence score thresholds. On the other hand, several ....
D. J. Litman, J. B. Hirschberg, and M. Swerts. Predicting automatic speech recognition performance using prosodic cues. In the 1st Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-00), pp. 218--225, 2000.
.... they can better confirm or reject the user's input [11] or, when many errors have occurred, change their interaction strategy [7]. In previous research we investigated the importance of a variety of prosodic and other cues to the automatic detection of misrecognitions in spoken dialogue systems [2, 5]. The data examined was obtained from subjects performing specified train information gathering tasks with TOOT, an experimental phone-based spoken dialogue system [6]. TOOT was implemented on a platform developed at AT&T combining ASR, text-to-speech, a phone interface, a finite-state dialogue ....
D. J. Litman, J. B. Hirschberg, and M. Swerts. Predicting automatic speech recognition performance using prosodic cues. In Proceedings of the First Annual Meeting, Seattle, May 2000. North American Association for Computational Linguistics (NAACL-00).
....incorrect recognition it is .44. This increase is probably caused by the fact that the speakers used hyperarticulate speech when they noticed that the system had a problem recognizing their previous utterance. Similar findings are reported in a number of studies, such as Shriberg et al. (1992) and Litman et al. (2000). This implies that it might be beneficial to switch to a speech recognizer trained on hyperarticulate speech if there are communication problems (cf. Hirschberg et al. 1999). Acknowledgments Thanks are due to Antal van den Bosch, Olga van Herwijnen, Stephen Isard, Esther Klabbers, Elizabeth ....
Litman, D., Hirschberg, J., Swerts, M., 2000. Predicting automatic speech recognition performance using prosodic cues. In: Proceedings of the First Meeting of the North-American Chapter for Computational Linguistics (NAACL'00), Seattle, Washington.
....role prosody may play in both detecting automatic speech recognition (ASR) errors and in helping to understand user corrections of such errors. In two different corpora of human-machine interactions, we found that prosodic features can be used to detect recognition errors with considerable accuracy [2, 8, 3]: in combination with information already available to the recognizer, such as acoustic confidence scores, grammar and recognized string, they can distinguish speaker turns that are misrecognized far better than traditional methods for ASR rejection using acoustic confidence scores alone. In the ....
....explanation accounts for only 59% of misrecognized corrections in the corpus. There are still a large number of misrecognized corrections that show no perceptual evidence of hyperarticulation. In our earlier analysis of prosodic differences between correct and incorrectly recognized turns [8], we also found that misrecognized turns differed from correctly recognized turns in f0, loudness, duration, and timing, all features associated with hyperarticulation. And more misrecognitions are hyperarticulated than are correctly recognized turns. But when we excluded perceptually ....
D. J. Litman, J. B. Hirschberg, and M. Swerts. Predicting automatic speech recognition performance using prosodic cues. Procs. NAACL-00.
....the impact of using a sliding window rather than all the utterances since the last adaptation to compute predictedMisrecs. Finally, while our current focus is on predicting and adapting to problems at the (sub)dialogue level, we would like to apply our approach at the utterance level (Levow 1998; Litman, Hirschberg, & Swerts 2000; van Zanten 1999; Smith 1998; Chu-Carroll 2000). Acknowledgments We would like to thank Owen Rambow for commenting on an earlier version of this paper, and Sandra Carberry and Janyce Wiebe for their help in recruiting subjects. We especially thank the students at Columbia University, New Mexico ....
Litman, D.; Hirschberg, J.; and Swerts, M. 2000. Predicting automatic speech recognition performance using prosodic cues. In Proc. 1st Conference of the North American Chapter of the Association for Computational Linguistics (NAACL).