Impact of Different Speaking Modes on EMG-based Speech Recognition
Cited by 1 (0 self)
We present our recent results on speech recognition by surface electromyography (EMG), which captures the electric potentials generated by the human articulatory muscles. This technique can be used to enable Silent Speech Interfaces, since EMG signals are generated even when people only articulate speech without producing any sound. Preliminary experiments have shown that the EMG signals created by audible and silent speech are quite distinct. In this paper we first compare various methods of initializing a silent speech EMG recognizer, showing that the performance of the recognizer varies substantially across speakers. Based on this, we analyze EMG signals from audible and silent speech, present first results on how discrepancies between these speaking modes affect EMG recognizers, and suggest areas for future work.
Index Terms: speech recognition, surface electromyography, silent speech, articulation
ESTIMATION OF FUNDAMENTAL FREQUENCY FROM SURFACE ELECTROMYOGRAPHIC DATA: EMG-TO-F0
In this paper, we present our recent studies of F0 estimation from surface electromyographic (EMG) data using a Gaussian mixture model (GMM)-based voice conversion (VC) technique, referred to as EMG-to-F0. In our approach, a support vector machine classifies individual frames as unvoiced or voiced (U/V), and the F0 contours of voiced frames are estimated by the trained GMM under the minimum mean-square error criterion. EMG-to-F0 is experimentally evaluated using three data sets from different speakers. Each data set includes almost 500 utterances. Objective experiments demonstrate that we achieve a correlation coefficient of up to 0.49 between estimated and target F0 contours with more than 84% U/V decision accuracy, although the results show large variations.
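
The GMM-based minimum mean-square error mapping mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a joint GMM over a source feature vector x (standing in for an EMG feature) and a scalar target y (standing in for F0), with parameters that are already trained. The function name gmm_mmse_map and all parameter names are hypothetical, and the SVM voicing decision is not shown.

```python
import numpy as np


def gmm_mmse_map(x, weights, mu_x, mu_y, cov_xx, cov_yx):
    """Return the MMSE estimate E[y | x] under a joint GMM over (x, y).

    Illustrative parameter layout (an assumption, not taken from the paper):
      x       : (D,)       source feature vector
      weights : (M,)       mixture weights
      mu_x    : (M, D)     per-component source means
      mu_y    : (M,)       per-component target means
      cov_xx  : (M, D, D)  per-component source covariances
      cov_yx  : (M, D)     per-component target/source cross-covariances
    """
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mu_x = np.asarray(mu_x, dtype=float)
    mu_y = np.asarray(mu_y, dtype=float)
    cov_xx = np.asarray(cov_xx, dtype=float)
    cov_yx = np.asarray(cov_yx, dtype=float)

    n_comp, dim = mu_x.shape
    log_post = np.empty(n_comp)
    cond_mean = np.empty(n_comp)
    for m in range(n_comp):
        diff = x - mu_x[m]
        solved = np.linalg.solve(cov_xx[m], diff)  # cov_xx^{-1} (x - mu_x)
        _, logdet = np.linalg.slogdet(cov_xx[m])
        # Log of weight times the Gaussian density of x under component m.
        log_post[m] = np.log(weights[m]) - 0.5 * (
            diff @ solved + logdet + dim * np.log(2.0 * np.pi)
        )
        # Conditional mean of y given x for component m.
        cond_mean[m] = mu_y[m] + cov_yx[m] @ solved

    # Normalise in the log domain to obtain component posteriors P(m | x).
    log_post -= log_post.max()
    post = np.exp(log_post)
    post /= post.sum()

    # Posterior-weighted sum of the per-component conditional means.
    return float(post @ cond_mean)
```

In a two-stage setup like the one described, such a mapping would be applied only to frames that the voicing classifier labels as voiced.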

