18 citations found.
Hirschberg, J., D. Litman, and M. Swerts (1999). Prosodic cues to recognition errors. In Proceedings 1999 IEEE Workshop on Automatic Speech Recognition and Understanding, Keystone, USA.


This paper is cited in the following contexts:
Characterizing and Processing Robot-Directed Speech - Varchavskaia, Fitzpatrick.. (2001)

....to consist of single word utterances. This hypothesis will undergo a preliminary evaluation here. Enunciated speech and vocal shaping The tendency of humans to slow down and overarticulate their utterances when they meet with misunderstanding has been reported as a problem in the ASR community [12]. Such enunciated speech degrades considerably the performance of speech recognition systems which were trained on natural speech only. If we find that human caretakers tend to address Kismet with overarticulated speech, its presence becomes a problem to be addressed by the robot's perceptual ....

....for the speech recognizer, but at the same time opens new possibilities. For an improved word learning interface, it may be possible to discriminate between natural and enunciated speech to detect instances of pronunciation teaching (this approach was taken in the ASR community, for example in [12]). On the other hand, the strategy of vocal shaping was not clearly present in the interactions, and there were few cases of mimicry. More (and better) research would determine how reliable or not these features of Kismet-directed speech may be. We plan in the future to conduct much more ....

J. Hirschberg, D. Litman, and M. Swerts. Prosodic cues to recognition errors. In ASRU.


Characterizing and Processing Robot-Directed Speech - Fitzpatrick, Varchavskaia.. (2001)

....of single word utterances. This hypothesis will undergo a preliminary evaluation in Section 4. Enunciated speech and vocal shaping The tendency of humans to slow down and overarticulate their utterances when they meet with misunderstanding has been reported as a problem in the ASR community [12]. Such enunciated speech degrades considerably the performance of speech recognition systems which were trained on natural speech only. If we find that human caretakers tend to address Kismet with overarticulated speech, its presence becomes an important issue to be addressed by the robot's ....

....for the speech recognizer, but at the same time opens new possibilities. For an improved word learning interface, it may be possible to discriminate between natural and enunciated speech to detect instances of pronunciation teaching (this approach was taken in the ASR community, for example in [12]). On the other hand, the strategy of vocal shaping was not clearly present in the interactions, and there were few cases of mimicry. Having completed this exploratory study, we now plan to follow up the results with more tightly controlled experiments specifically designed to elucidate the ....

J. Hirschberg, D. Litman, and M. Swerts. Prosodic cues to recognition errors. In ASRU, 1999.


Whence and Whither Prosody in Automatic Speech.. - Batliner, Nöth.. (2001)   (2 citations)

....that: Within the two languages, the impact of PCs is very similar for boundaries and accents. Across the two languages, there is a similar impact of PCs. The order of relevance is: first comes duration, then energy, then pause, then F0. Similar results were obtained in other studies, cf. [30, 13], especially, as far as the impact of duration features is concerned. In [4, 1] we try to interpret this ranking. Such an approach that represents the state of the art nowadays can be characterized as a sort of shot gun approach: a highly redundant feature set is used, and by that, chances ....

.... strategy is thus the same for the core linguistic phenomena boundaries and accents on the one hand, and for the paralinguistic phenomena emotion user state on the other hand: not to use prosody alone but to combine it with several other knowledge sources; as for work along comparable lines, cf. [13, 19, 20, 26, 27]. The different scores are weighted, similar to the LM weight used in speech recognition. We use an automatic procedure based on gradient descent for optimization, cf. [37]. ....

J. Hirschberg, D. Litman, and M. Swerts. Prosodic cues to recognition errors. In Proceedings of the Automatic Speech Recognition and Understanding Workshop (ASRU'99), pages 349--352, 1999.


Prosodic Modeling for Improved Speech Recognition and Understanding - Wang (2001)

....erroneous (and confusing) responses, the dialogue system can request clarification from the user, give more detailed instructions, or take more control of the interaction. Such accept/reject decisions in the confidence scoring framework can also potentially be improved by using prosodic cues (Hirschberg et al. 1999; Hirschberg et al. 2000). 1 A recognizer N-best list consists of the N top scoring sentence hypotheses for an input utterance. 1.1.3 Difficulties in Modeling Prosody Prosodic modeling for speech recognition and understanding applications is a difficult problem due to many factors (Sagisaka et ....

....to other reasons such as out of vocabulary words, noise interference, etc. The dialogue system can take proper action, such as rejecting the utterance or requesting confirmation, to signal the user of system difficulty and guide the user in error correction (Hazen et al. 2000a; Hazen et al. 2000b). Hirschberg et al. (1999, 2000) have found that prosodic features are correlated with recognition performance, and prosodic information can be used to improve the rejection of incorrect recognition hypotheses. A related issue is to detect the emotional state of a user, because humans sometimes extend their ....

[Article contains additional citation context not shown here]

Hirschberg, J., D. Litman, and M. Swerts (1999). Prosodic cues to recognition errors. In Proceedings 1999 IEEE Workshop on Automatic Speech Recognition and Understanding, Keystone, USA.


Generalizing Prosodic Prediction Of Speech Recognition Errors - Hirschberg, Litman, Swerts (2000)   (3 citations)  Self-citation (Hirschberg Litman Swerts)

.... they can better confirm or reject the user's input [11] or, when many errors have occurred, change their interaction strategy [7]. In previous research we investigated the importance of a variety of prosodic and other cues to the automatic detection of misrecognitions in spoken dialogue systems [2, 5]. The data examined was obtained from subjects performing specified train information gathering tasks with TOOT, an experimental phone-based spoken dialogue system [6]. TOOT was implemented on a platform developed at AT&T combining ASR, text-to-speech, a phone interface, a finite state dialogue ....

J. Hirschberg, D. Litman, and M. Swerts. Prosodic cues to recognition errors. In Proceedings of the Automatic Speech Recognition and Understanding Workshop (ASRU'99), 1999.


Detecting problematic turns in human-machine.. - van den Bosch.. (2001)   (1 citation)  Self-citation (Swerts)

....the sequence of system question types and the word graphs corresponding to the respective user utterances. A word graph is a lattice of word hypotheses, and we conjecture that various features which have been shown to cue communication problems (prosodic, linguistic and ASR features, see e.g. Hirschberg et al. 1999, Krahmer et al. 1999 and Swerts et al. 2000) have correlates in the word graph. The sequence of system question types is taken to model the dialogue history. Finally, 3) to gain further insight into the adequacy of various machine learning techniques for problem detection we use both RIPPER and ....

....with an accuracy of about 65%. Although this is somewhat above the baseline of 58% decision accuracy when no problems would be predicted, signalling recognition problems with word graph features and previous system question types as predictors is a hard task. As other studies suggest (e.g. Hirschberg et al. 1999), confidence scores and acoustic prosodic features could be of help. The second approach tested whether the word graph for the current user utterance and/or the recent history of system question types could be employed to predict ....

Hirschberg, J., Litman, D., Swerts, M. (1999), Prosodic cues to recognition errors, Proc. ASRU, Keystone, CO.


Detecting problematic turns in human-machine.. - van den Bosch.. (2001)   (1 citation)  Self-citation (Swerts)

....the sequence of system question types and the word graphs corresponding to the respective user utterances. A word graph is a lattice of word hypotheses, and we conjecture that various features which have been shown to cue communication problems (prosodic, linguistic and ASR features, see e.g. Hirschberg et al. 1999, Krahmer et al. 1999 and Swerts et al. 2000) have correlates in the word graph. The sequence of system question types is taken to model the dialogue history. Finally, 3) to gain further insight into the adequacy of various machine learning techniques for problem detection we use both RIPPER ....

....with an accuracy of about 65%. Although this is somewhat above the baseline of 58% decision accuracy when no problems would be predicted, signalling recognition problems with word graph features and previous system question types as predictors is a hard task. As other studies suggest (e.g. Hirschberg et al. 1999), confidence scores and acoustic prosodic features could be of help. The second approach tested whether the word graph for the current user utterance and/or the recent history of system question types could be employed to predict whether the previous user ....

Hirschberg, J., Litman, D., Swerts, M. (1999), Prosodic cues to recognition errors, Proc. ASRU, Keystone, CO.


Error Detection in Spoken Human-Machine Interaction - Krahmer, Swerts, Theune.. (2001)   (6 citations)  Self-citation (Swerts)

.... and correct recognitions (see, e.g. Bouwman, Sturm and Boves, 1999). Other research has shown that prosodic cues and lexical information from the recognized strings in addition to acoustic confidence measures may also help to distinguish erroneous utterances from correct ones (e.g. Hirschberg, Litman and Swerts, 1999), but, again, these do not completely eliminate misunderstandings. 1 It should be noted that both these strategies are only concerned with recognition errors, whereas communication problems may also be due to other factors, e.g. because the system may make wrong default assumptions. For ....

Hirschberg, J., Litman, D. & Swerts, M. (1999). Prosodic cues to recognition errors. Proceedings of the 1999 International Workshop on Automatic Speech Recognition and Understanding (ASRU) (pp. 349-352). Keystone, CO, December 1999.


On the Use of Prosody for On-line Evaluation of Spoken.. - Swerts, Krahmer (2000)   Self-citation (Swerts)

.... is probably caused by the fact that speakers use hyperarticulate speech when they notice that the system had a problem recognizing their previous utterance, thus it might be beneficial to switch to a speech recognizer trained on hyperarticulate speech if there are communication problems (cf. Hirschberg et al. 1999). Acknowledgments Thanks are due to Antal van den Bosch, Olga van Herwijnen, Esther Klabbers, Mariet Theune and Mieke Weegels. We would like to thank Elizabeth Shriberg for urging us to do the perceptual experiment. Swerts is also affiliated with UIA and with the FWO Flanders. ....

Hirschberg, J., Litman, D., Swerts, M., 1999. Prosodic cues to recognition errors. In: Proceedings of the International Workshop on Speech Recognition and Understanding (ASRU-99), Keystone, CO, USA.


The Dual of Denial: Two Uses of Disconfirmations in.. - Krahmer, Swerts.. (2002)   (1 citation)  Self-citation (Swerts)

....recognizing their previous utterance. Similar findings are reported in a number of studies, such as Shriberg et al. (1992) and Litman et al. (2000). This implies that it might be beneficial to switch to a speech recognizer trained on hyperarticulate speech if there are communication problems (cf. Hirschberg et al. 1999). Acknowledgments Thanks are due to Antal van den Bosch, Olga van Herwijnen, Stephen Isard, Esther Klabbers, Elizabeth Shriberg, Jacques Terken, and an anonymous referee, as well as to the audience of the ESCA Workshop on Dialogue and Prosody (Veldhoven, 1999). The authors are mentioned in ....

Hirschberg, J., Litman, D., Swerts, M., 1999. Prosodic cues to recognition errors. In: Proceedings of the International Workshop on Speech Recognition and Understanding (ASRU-99), Keystone, CO, USA.


Towards Adaptive Spoken Dialogue Agents - Litman (2000)   Self-citation (Litman)

.... of adaptive TOOT approached that of user-adaptable TOOT (Litman & Pan 2000). Finally, we have successfully extended our problematic situation detection methodology (Litman, Walker, Kearns 1999) to apply to two new prediction tasks (predicting misrecognized utterances in TOOT using prosody (Hirschberg, Litman, Swerts 1999), and predicting dialogues in which callers were or should have been transferred to a human operator in a call routing dialogue system (Langkilde et al. 1999)). However, we have not yet used these learned prediction rules to trigger or evaluate the utility of adaptation. Adaptation over Time ....

Hirschberg, J.; Litman, D.; and Swerts, M. 1999. Prosodic cues to recognition errors. In Proceedings of the Automatic Speech Recognition and Understanding Workshop.


Corrections In Spoken Dialogue Systems - Swerts, Litman, Hirschberg (2000)   (4 citations)  Self-citation (Hirschberg Litman Swerts)

....role prosody may play in both detecting automatic speech recognition (ASR) errors and in helping to understand user corrections of such errors. In two different corpora of human-machine interactions, we found that prosodic features can be used to detect recognition errors with considerable accuracy [2, 8, 3]: in combination with information already available to the recognizer, such as acoustic confidence scores, grammar and recognized string, they can distinguish speaker turns that are misrecognized far better than traditional methods for ASR rejection using acoustic confidence scores alone. In the ....

J. Hirschberg, D. Litman, and M. Swerts. Prosodic cues to recognition errors. Procs. ASRU-99.


Predicting Automatic Speech Recognition Performance.. - Litman, Hirschberg.. (2000)   (2 citations)  Self-citation (Hirschberg Litman Swerts)

No context found.

Julia Hirschberg, Diane Litman, and Marc Swerts. 1999. Prosodic cues to recognition errors. In Proceedings of the Automatic Speech Recognition and Understanding Workshop (ASRU'99).


Prosodic Modeling for Improved Speech Recognition and Understanding - Wang (2001)

No context found.

Hirschberg, J., D. Litman, and M. Swerts (1999). Prosodic cues to recognition errors. In Proceedings 1999 IEEE Workshop on Automatic Speech Recognition and Understanding, Keystone, USA.


Prosody and Automatic Speech Recognition - Why not yet a.. - Batliner, Nöth (2003)

No context found.

J. Hirschberg, D. Litman, and M. Swerts. Prosodic cues to recognition errors. In Proceedings of the Automatic Speech Recognition and Understanding Workshop (ASRU'99), pages 349--352, 1999.


How to Find Trouble in Communication - Batliner, Fischer, Huber, Spilker, .. (2003)   (5 citations)

No context found.

Hirschberg, J., Litman, D., and Swerts, M. (1999). Prosodic cues to recognition errors. In Proceedings of the Automatic Speech Recognition and Understanding Workshop (ASRU'99), pages 349--352.


Acoustic Models for Hyperarticulated Speech - Soltau, Waibel (2000)

No context found.

J. Hirschberg, D. Litman, and M. Swerts. Prosodic cues to recognition errors. In Proceedings of the Automatic Speech Recognition and Understanding Workshop, Keystone, USA, 1999.


Predicting Automatic Speech Recognition Performance.. - Hirschberg, Litman.. (2000)   (2 citations)

No context found.

Julia Hirschberg, Diane Litman, and Marc Swerts. 1999. Prosodic cues to recognition errors. In Proceedings of the Automatic Speech Recognition and Understanding Workshop (ASRU'99).

