Results 1 - 5 of 5
Silent vs Vocalized Articulation for a Portable Ultrasound-Based Silent Speech Interface
- In Proceedings of Interspeech 2010, Makuhari, Japan, 2010
Abstract
Silent Speech Interfaces have been proposed for communication in silent conditions, or as a new means of restoring the voice of persons who have undergone a laryngectomy. To operate such a device, the user must articulate silently. Isolated word recognition tests performed with fixed and portable ultrasound-based silent speech interface equipment show that systems trained on vocalized speech exhibit reduced performance when tested on silent articulation, but that training with silently articulated speech recovers much of this loss. Index Terms: silent speech interface, ultrasound, articulation
SYNTHESIZING SPEECH FROM DOPPLER SIGNALS
Abstract
It has long been considered a desirable goal to construct an intelligible speech signal merely by observing the talker in the act of speaking. Past attempts have been based on camera observations of the talker's face, combined with statistical methods that infer the speech signal from the captured facial motion. Other methods have synthesized speech from measurements taken by electromyographs and other devices that are tethered to the talker, an undesirable setup. In this paper we present a new device for synthesizing speech from characterizations of facial motion associated with speech: a Doppler sonar. Facial movement is characterized through Doppler frequency shifts in a tone that is incident on the talker's face, and these frequency shifts are used to infer the underlying speech signal. The setup is far-field and untethered, with the sonar acting from the distance of a regular desktop microphone. Preliminary experimental evaluations are very promising: we are able to synthesize reasonable speech signals, comparable to those obtained from tethered devices such as EMGs. Index Terms: speech synthesis
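The sensing principle behind such a sonar is the classic Doppler relation: a surface moving at velocity v shifts a reflected tone of frequency f0 by roughly 2·v·f0/c. A minimal sketch of that relation (the 40 kHz carrier and the velocity value are illustrative assumptions, not figures from the paper):

```python
def doppler_shift(carrier_hz, velocity_m_s, c=343.0):
    """Frequency shift of a tone reflected off a surface moving at
    velocity_m_s (positive = toward the sensor), with c the speed of
    sound in air (m/s). Reflection doubles the one-way shift."""
    return 2.0 * velocity_m_s * carrier_hz / c

# Facial skin during speech moves on the order of centimeters per second:
shift_hz = doppler_shift(40_000.0, 0.05)  # 40 kHz tone, 5 cm/s -> ~11.7 Hz
```

Motion away from the sensor gives a negative shift, so the sign of the measured shift separates opening from closing gestures.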
Recording and Measuring of Jaw Movements using a Computer Vision System
Abstract
Helwan University
Human motion detection and analysis are important in many medical and dental clinics. Mandibular movements are very complex and difficult to detect with the naked eye, and detecting them aids proper diagnosis, treatment planning, and follow-up. Many methods are used to measure mandibular movements, but most are expensive and difficult to use in the clinic. Tracking such movements with a computer vision system may be considered one of the fundamental problems of human motion analysis, one that remains unsolved due to its inherent difficulty. Using markers, however, greatly simplifies the process as long as they are simple, cheap, and easy to use. Unlike other tracking systems, this system needs only a simple digital video camera and very simple markers, created from black-and-white images and attached with any cheap double-sided bonding tape. The proposed system is reliable and reasonably accurate, and its main advantages are simplicity and low cost compared with other methods of similar accuracy.
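Marker-based tracking of this kind typically reduces to locating a high-contrast blob in each frame. A toy sketch of the idea, for a single dark marker (the threshold and the synthetic frame are assumptions; the paper's actual detector is not described in the abstract):

```python
import numpy as np

def marker_centroid(gray, thresh=50):
    """Locate one dark marker on a light background: threshold the
    grayscale frame, then average the pixel coordinates of the dark
    pixels. Returns (x, y) in pixels, or None if no marker is found."""
    ys, xs = np.nonzero(gray < thresh)
    if xs.size == 0:
        return None
    return float(xs.mean()), float(ys.mean())

# Synthetic 100x100 white frame with a 10x10 dark marker:
frame = np.full((100, 100), 255, dtype=np.uint8)
frame[40:50, 60:70] = 0
x, y = marker_centroid(frame)  # -> (64.5, 44.5)
```

Tracking several markers would additionally require separating the dark pixels into connected blobs, but the per-marker centroid step is the same.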
Real-time Control of a DNN-based Articulatory Synthesizer for Silent Speech Conversion: a pilot study
Abstract
This article presents a pilot study on the real-time control of an articulatory synthesizer based on a deep neural network (DNN), in the context of a silent speech interface. The underlying hypothesis is that a silent speaker could benefit from real-time audio feedback to regulate his/her own production. In this study, we use 3D electromagnetic articulography (EMA) to capture speech articulation, a DNN to convert EMA to spectral trajectories in real time, and a standard vocoder excited by white noise for audio synthesis. As shown by recent literature on silent speech, the articulatory-to-acoustic modeling process must be adapted to account for possible inconsistencies between the initial training phase and practical usage conditions. In this study, we focus on different sensor setups across sessions (for the same speaker). Model adaptation is performed by cascading another neural network onto the DNN used for articulatory-to-acoustic mapping. The intelligibility of the synthetic speech signal converted in real time is evaluated using both objective and perceptual measurements. Index Terms: articulatory speech synthesis, deep neural networks, EMA, silent speech
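The cascaded-adaptation idea can be illustrated in miniature: keep a pretrained articulatory-to-acoustic mapping frozen and learn a small front-end that maps new-session sensor data back into the old input space. The sketch below stands in for the paper's DNN with a fixed random network and uses an affine front-end fitted by least squares; all dimensions and the sensor-distortion model are assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the pretrained EMA-to-spectrum mapping (frozen after training).
W1 = rng.standard_normal((12, 32)) * 0.3
W2 = rng.standard_normal((32, 25)) * 0.3
def ema_to_spectrum(x):
    return np.tanh(x @ W1) @ W2  # frames x 12 EMA dims -> frames x 25 spectral dims

# A new recording session perturbs sensor placement; model it here as an
# unknown affine distortion of the articulatory features.
A = np.eye(12) + 0.1 * rng.standard_normal((12, 12))
b = 0.05 * rng.standard_normal(12)
X_old = rng.standard_normal((200, 12))  # adaptation frames, original session
X_new = X_old @ A + b                   # the same frames as seen in the new session

# Cascade a front-end before the frozen model: fit it so new-session input
# is mapped back into the pretrained model's input space.
X_aug = np.hstack([X_new, np.ones((len(X_new), 1))])
front_end, *_ = np.linalg.lstsq(X_aug, X_old, rcond=None)

def adapted_ema_to_spectrum(x_new):
    x_aug = np.hstack([x_new, np.ones((len(x_new), 1))])
    return ema_to_spectrum(x_aug @ front_end)
```

Because the frozen mapping is untouched, only the small front-end needs to be re-estimated from a short adaptation recording at the start of each session.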
unknown title
Abstract
The development of a continuous visual speech recognizer for a silent speech interface has been investigated using a visual speech corpus of ultrasound and video images of the tongue and lips. By using high-speed visual data and tied-state cross-word triphone HMMs, and by including syntactic information via domain-specific language models, word-level recognition accuracy as high as 72% was achieved on visual speech. Using the Julius system, it was also found that recognition should be possible in near real time.
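Word-level accuracy in HMM toolkits such as HTK and Julius is conventionally (N − S − D − I)/N over the reference words, so insertions are penalized alongside substitutions and deletions. For instance (the error counts below are made-up numbers chosen to land at the 72% level, not figures from the paper):

```python
def word_accuracy(n_ref, subs, dels, ins):
    """HTK-style word accuracy: (N - S - D - I) / N, where N is the
    number of reference words and S, D, I count substitution, deletion,
    and insertion errors. Heavy insertion can push the value below 0."""
    return (n_ref - subs - dels - ins) / n_ref

acc = word_accuracy(100, 18, 6, 4)  # -> 0.72
```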