Session Independent Non-Audible Speech Recognition Using Surface
, 2005
Cited by 22 (18 self)
In this paper we introduce a speech recognition system based on myoelectric signals. The system handles audible and non-audible speech. Major challenges in surface electromyography based speech recognition ensue from repositioning electrodes between recording sessions, environmental temperature changes, and skin tissue properties of the speaker. In order to reduce the impact of these factors, we investigate a variety of signal normalization and model adaptation methods. An average word accuracy of 97.3% is achieved using seven EMG channels and the same electrode positions. The performance drops to 76.2% after repositioning the electrodes if no normalization or adaptation is performed. By applying our adaptation methods we manage to restore the recognition rates to 87.1%. Furthermore, we compare audibly to non-audibly spoken speech. The results suggest that large differences exist between the corresponding muscle movements. Still, our recognition system recognizes both speech manners accurately when trained on pooled data.
Hidden Markov model classification of myoelectric signals in speech
- IEEE Engineering in Medicine and Biology Magazine
, 2002
Cited by 13 (0 self)
A hidden Markov model based classifier is proposed in this paper to perform automatic speech recognition using myoelectric signals from the muscles of vocal articulation. The classifier's resilience to temporal variance is compared to a linear discriminant analysis (LDA) classifier that was used in a previous study. Speech recognition was performed, using five channels of myoelectric signals, on isolated words from a 10-word vocabulary. Temporal variance was induced by temporally misaligning data from the test set with respect to the training set. When compared to the LDA classifier, the hidden Markov model classifier demonstrated a markedly lower variation in classification error due to the temporal misalignment. Characteristics of the hidden Markov model MES classifier suggest that it would effectively complement a conventional acoustic speech recognizer in a multi-modal speech recognition system.
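The abstract does not say how the temporal misalignment was implemented. As a rough illustration only, one plausible way to induce it is to cut each test window at an offset from the annotated word onset. The function name `extract_window`, its parameters, and the zero-padding at the signal edges are all assumptions made for this sketch, not details taken from the paper.

```python
def extract_window(signal, onset, length, shift=0):
    """Return `length` samples starting at `onset + shift`.

    Hypothetical helper: samples that fall outside the recording are
    zero-padded. A non-zero `shift` misaligns the window relative to
    the annotated onset that a training window would have used.
    """
    start = onset + shift
    window = []
    for i in range(start, start + length):
        # Pad with zeros when the shifted window runs off either edge.
        window.append(signal[i] if 0 <= i < len(signal) else 0.0)
    return window


# Toy single-channel signal; real myoelectric data would have several channels.
signal = [0.0, 0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 0.0]

aligned = extract_window(signal, onset=2, length=5)           # training-style window
late = extract_window(signal, onset=2, length=5, shift=2)     # test window cut too late
early = extract_window(signal, onset=2, length=5, shift=-3)   # test window cut too early
```

A classifier that compares feature vectors position by position sees `late` and `early` as very different from `aligned`, which is the kind of sensitivity the comparison with the hidden Markov model is probing.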
Modeling Coarticulation in EMG-based Continuous Speech Recognition
- Speech Communication Journal
Cited by 6 (2 self)
This paper discusses the use of surface electromyography for automatic speech recognition. Electromyographic signals captured at the facial muscles record the activity of the human articulatory apparatus and thus allow a speech signal to be traced back even if it is spoken silently. Since speech is captured before it gets airborne, the resulting signal is not masked by ambient noise. The resulting Silent Speech Interface has the potential to overcome major limitations of conventional speech-driven interfaces: it is not prone to any environmental noise, allows confidential information to be transmitted silently, and does not disturb bystanders. We describe our new approach of phonetic feature bundling for modeling coarticulation in EMG-based speech recognition and report results on the EMG-PIT corpus, a multiple-speaker large-vocabulary database of silent and audible EMG speech recordings, which we recently collected. Our results on speaker-dependent and speaker-independent setups show that modeling the interdependence of phonetic features reduces the word error rate of the baseline system by over 33% relative. Our final system achieves 10% word error rate for the best-recognized speaker on a 101-word vocabulary task, bringing EMG-based speech recognition within a useful range for the application of silent speech interfaces.
A Spectral Mapping Method for EMG-based Recognition of Silent Speech
Cited by 5 (5 self)
This paper reports on our latest study on speech recognition based on surface electromyography (EMG). This technology allows for Silent Speech Interfaces since EMG captures the electrical potentials of the human articulatory muscles rather than the acoustic speech signal. Therefore, our technology enables speech recognition to be applied to silently mouthed speech. Earlier experiments indicate that the EMG signal is greatly impacted by the mode of speaking. In this study we analyze and compare EMG signals from audible, whispered, and silent speech. We quantify the differences and develop a spectral mapping method to compensate for these differences. Finally, we apply the spectral mapping to the front-end of our speech recognition system and show that recognition rates on silent speech improve by up to 12.3% relative.
C Sub Auditory Speech Recognition Based
Sub-vocal electromyogram/electropalatogram (EMG/EPG) signal classification is demonstrated as a method for silent speech recognition. Recorded electrode signals from the larynx and sublingual areas below the jaw are noise-filtered and transformed into features using complex dual quad tree wavelet transforms. Feature sets for six sub-vocally pronounced words are trained using a trust region scaled conjugate gradient neural network. Real-time signals for previously unseen patterns are classified into categories suitable for primitive control of graphic objects. Feature construction, recognition accuracy, and an approach for extending the technique to a variety of real-world application areas are presented.
unknown title
processing. This chapter presents these and their combination, followed by some related technologies. 3.1 SPEECH PROCESSING Modern speech technology is based on digital signal processing, probabilistic theory and search algorithms. These techniques make it possible to perform significant data reduction for coding and transmission of speech signals, speech synthesis and automatic recognition of speech, speaker or language. In this section the state of the art is presented and related to realistic military applications. 3.1.1 Speech Coding When digital systems became available, it was obvious that the transmission of digital signals was more efficient than the transmission of analogue signals. If analogue signals are transmitted under adverse conditions, it is not easy to reconstruct the received signal, because the possible signal values are not known in advance. For digital signals, discrete levels are used. This allows, within certain limits, the reconstruction of distorted signals. The first digital transmission systems were based on coding the waveform of the speech signal. This results in bit rates between 8000 and 64000 bps (bits per second). The higher the bit rate, the better the quality. Later, more advanced coding systems were used where basic properties of the speech were determined and encoded, resulting in a more efficient coding (bit rates
INTERSPEECH 2011 Impact of Different Feedback Mechanisms in EMG-based Speech Recognition
This paper reports on our recent research in the feedback effects of Silent Speech. Our technology is based on surface electromyography (EMG), which captures the electrical potentials of the human articulatory muscles rather than the acoustic speech signal. While recognition results are good for loudly articulated speech and when experienced users speak silently, novice users usually achieve far worse results when speaking silently. Since there is no acoustic feedback when speaking silently, we investigate different kinds of feedback modes: no additional feedback except the natural somatosensory feedback (like the touching of the lips), visual feedback using a mirror, and indirect acoustic feedback by speaking simultaneously with a previously recorded audio signal. In addition, we examine recorded EMG data when the subject speaks audibly and silently in a loud environment to see if the Lombard effect can be observed in Silent Speech, too. Index Terms: silent speech, electromyography, lack of acoustic feedback, EMG-based speech recognition, Lombard effect

