Reaching over the Gap: A Review of Efforts to Link Human and Automatic Speech Recognition Research (2007)

by O Scharenborg
Venue: Speech Communications

Results 1 - 4 of 4

Comparing Human and Machine Recognition Performance on a VCV Corpus

by Odette Scharenborg, Martin Cooke
Abstract - Cited by 2 (1 self)
Listeners outperform ASR systems in every speech recognition task. However, what is not clear is where this human advantage originates. This paper investigates the role of acoustic feature representations. We test four acoustic representations (MFCCs, PLPs, Mel Filterbanks, Rate Maps), with and without ‘pitch’ information, using the same backend. The results are compared with listener results at the level of articulatory feature classification. While no acoustic feature representation reached the levels of human performance, both MFCCs and Rate Maps achieved good scores, with Rate Maps nearing human performance on the classification of voicing. Comparing the results on the most difficult articulatory features to classify showed similarities between the humans and the SVMs: e.g., ‘dental’ was by far the least well identified by both groups. Overall, adding pitch information seemed to hamper classification performance. Index Terms: human-machine comparison, acoustic feature representations, articulatory feature classification.

The Interspeech 2008 Consonant Challenge

by Martin Cooke, Odette Scharenborg
Abstract - Cited by 1 (1 self)
Listeners outperform automatic speech recognition systems at every level, including the very basic level of consonant identification. What is not clear is where the human advantage originates. Does the fault lie in the acoustic representations of speech or in the recognizer architecture, or in a lack of compatibility between the two? Many insights can be gained by carrying out a detailed human-machine comparison. The purpose of the Interspeech 2008 Consonant Challenge is to promote focused comparisons on a task involving intervocalic consonant identification in noise, with all participants using the same training and test data. This paper describes the Challenge, listener results and baseline ASR performance. Index Terms: consonant perception, VCV, human-machine performance comparisons

DETECTING OFF-TASK SPEECH

by Wei Chen
Abstract
Off-task speech is speech that strays away from an intended task. It occurs in many dialog applications, such as intelligent tutors, virtual games, health communication systems and human-robot cooperation. Off-task speech input to computers presents both challenges and opportunities for such dialog systems. On the one hand, off-task speech contains informal conversational style and potentially unbounded scope that hamper accurate speech recognition. On the other hand, an automated agent capable of detecting off-task speech could track users’ attention and thereby maintain the intended conversation by bringing a user back on task; also, knowledge of where off-task speech events are likely to occur can help the analysis of automatic speech recognition (ASR) errors. Related work has been done in confidence measures for dialog systems and detecting out-of-domain utterances. However, there is a lack of systematic study on the type of off-task speech being detected and generality of features capturing off-task speech. In addition, we know of no published research on detecting off-task speech in children’s interactions with an automated agent. The goal of this research is to fill in these blanks to provide a systematic study of off-task speech, with an emphasis on child-machine interactions. To characterize off-task speech quantitatively, we used acoustic features to capture its

In Language and Information Technologies

by Wei Chen
Abstract
Off-task speech is speech that strays away from an intended task. It occurs in many dialog applications, such as intelligent tutors, virtual games, health communication systems and human-robot cooperation. Off-task speech input to computers presents both challenges and opportunities for such dialog systems. On the one hand, off-task speech contains informal conversational style and potentially unbounded scope that hamper accurate speech recognition. On the other hand, an automated agent capable of detecting off-task speech could track users’ attention and thereby maintain the intended conversation by bringing a user back on task; also, knowledge of where off-task speech events are likely to occur can help the analysis of automatic speech recognition (ASR) errors. Related work has been done in confidence measures for dialog systems and detecting out-of-domain utterances. However, there is a lack of systematic study on the type of off-task speech being detected and generality of features capturing off-task speech. In addition, we know of no published research on detecting off-task speech in children’s interactions with an automated agent. The goal of this research is to fill in these blanks to provide a systematic study of off-task speech, with an emphasis on child-machine interactions. To characterize off-task speech quantitatively, we used acoustic features to capture its