Visuo-Phonetic Decoding using Multi-Stream and Context-Dependent Models for an Ultrasound-based Silent Speech Interface (2009)

by T Hueber, G Chollet, B Denby, G Dreyfus, M Stone
Venue: Interspeech
Results 1 - 5 of 5

Silent vs Vocalized Articulation for a Portable Ultrasound-Based Silent Speech Interface

by Victoria-M. Florescu, Lise Crevier-Buchman, Bruce Denby, Thomas Hueber, Simon, Claire Pillot-Loiseau, Pierre Roussel - In Proceedings of Interspeech 2010, Makuhari, Japan, 2010
"... Silent Speech Interfaces have been proposed for communication in silent conditions or as a new means of restoring the voice of persons who have undergone a laryngectomy. To operate such a device, the user must articulate silently. Isolated word recognition tests performed with fixed and portable ult ..."
Abstract - Cited by 3 (2 self)
Silent Speech Interfaces have been proposed for communication in silent conditions or as a new means of restoring the voice of persons who have undergone a laryngectomy. To operate such a device, the user must articulate silently. Isolated word recognition tests performed with fixed and portable ultrasound-based silent speech interface equipment show that systems trained on vocalized speech exhibit reduced performance when tested on silent articulation, but that training with silently articulated speech allows much of this loss to be recovered. Index Terms: silent speech interface, ultrasound, articulation
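The reported degradation is essentially a train/test condition mismatch. The sketch below only illustrates the four train/test combinations of the two articulation conditions; the nearest-centroid classifier, the synthetic fixed-length features, and the global shift standing in for silent articulation are all assumptions for illustration, not the paper's HMM recognizer or ultrasound/video features.

# Minimal, self-contained sketch of a matched/mismatched evaluation.
# A nearest-centroid classifier over synthetic vectors stands in for the word
# recognizer; the "silent" condition is simulated by a global feature shift.
import numpy as np

rng = np.random.default_rng(0)
WORDS, DIM, N = 10, 20, 30                         # vocabulary, feature dim, tokens/word

def make_condition(shift):
    """Synthetic (features, labels); `shift` mimics vocalized-vs-silent differences."""
    X = np.vstack([rng.normal(w, 1.0, (N, DIM)) + shift for w in range(WORDS)])
    y = np.repeat(np.arange(WORDS), N)
    return X, y

def train(X, y):
    """'Training' here is just one centroid per word."""
    return np.vstack([X[y == w].mean(axis=0) for w in range(WORDS)])

def accuracy(centroids, X, y):
    pred = np.argmin(((X[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    return (pred == y).mean()

data = {"vocalized": make_condition(0.0), "silent": make_condition(1.5)}
for train_cond, (Xtr, ytr) in data.items():
    model = train(Xtr, ytr)
    for test_cond, (Xte, yte) in data.items():
        print(f"train={train_cond:9s} test={test_cond:9s} "
              f"accuracy={accuracy(model, Xte, yte):.2f}")

Training and testing in the same (matched) condition scores high, while the shifted (mismatched) condition collapses, which is the qualitative effect the abstract describes.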

Citation Context

...ng of the tongue and lips using a portable ultrasound machine and a video camera is proposed. Although a promising continuous-speech phone recognition rate of 70% on an English corpus was reported in [3], [4], two critical experimental issues remain to be addressed: the speaker's head remained immobilized in an acquisition system fixed to a table. Clearly, a practicable SSI will have to be portable...

SYNTHESIZING SPEECH FROM DOPPLER SIGNALS

by Arthur R. Toth, Kaustubh Kalgaonkar, Bhiksha Raj, Tony Ezzat
"... It has long been considered a desirable goal to be able to construct an intelligible speech signal merely by observing the talker in the act of speaking. Past methods at performing this have been based on camera-based observations of the talker’s face, combined with statistical methods that infer th ..."
Abstract
It has long been considered a desirable goal to be able to construct an intelligible speech signal merely by observing the talker in the act of speaking. Past methods for doing this have been based on camera-based observations of the talker’s face, combined with statistical methods that infer the speech signal from the facial motion captured by the camera. Other methods have included synthesis of speech from measurements taken by electromyographs and other devices that are tethered to the talker – an undesirable setup. In this paper we present a new device for synthesizing speech from characterizations of facial motion associated with speech – a Doppler sonar. Facial movement is characterized through Doppler frequency shifts in a tone that is incident on the talker’s face. These frequency shifts are used to infer the underlying speech signal. The setup is far-field and untethered, with the sonar acting from the distance of a regular desktop microphone. Preliminary experimental evaluations show that the mechanism is very promising – we are able to synthesize reasonable speech signals, comparable to those obtained from tethered devices such as EMGs. Index Terms — Speech synthesis
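The core signal-processing idea is that a moving reflector imposes a Doppler shift f_d = 2*v*fc/c on the returned tone. The self-contained sketch below is not from the paper: the carrier frequency, sample rate, and the simulated facial motion are arbitrary choices, and it simply demodulates a simulated return around the carrier to recover the shift.

# Rough sketch of Doppler-shift recovery from a reflected tone.
import numpy as np

fs, fc = 96_000, 40_000                      # sample rate and carrier (Hz), illustrative
c = 343.0                                    # speed of sound (m/s)
t = np.arange(int(fs * 0.5)) / fs            # 0.5 s of signal

# Surface oscillating at 5 Hz with 2 mm amplitude (a crude stand-in for facial motion).
displacement = 0.002 * np.sin(2 * np.pi * 5 * t)
velocity = np.gradient(displacement, 1 / fs)

# Analytic (complex) model of the return: the round trip adds 2*displacement of path,
# i.e. a phase of 2*pi*fc*2*d/c.  A real microphone signal would first be band-pass
# filtered around fc and converted to analytic form (e.g. with a Hilbert transform).
rx = np.exp(1j * (2 * np.pi * fc * t - 2 * np.pi * fc * 2 * displacement / c))

# Demodulate to baseband; the instantaneous frequency of the result is the Doppler shift.
baseband = rx * np.exp(-2j * np.pi * fc * t)
doppler = np.diff(np.unwrap(np.angle(baseband))) * fs / (2 * np.pi)

expected = 2 * velocity * fc / c             # textbook Doppler relation
print(f"measured peak shift ~{np.abs(doppler).max():.1f} Hz, "
      f"expected ~{np.abs(expected).max():.1f} Hz")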

Citation Context

...ing the talker in the act of speaking. Commonly, the act of observing a talker has been interpreted as one of capturing images of the talker’s face. The video may then be decoded into a speech signal [1]. More commonly, observation of speech has been performed with tethered devices, such as electromyographs (EMG) [2], electromagnetic articulographs (EMA), or even sensors to detect brain activity....

Recording and Measuring of Jaw Movements using a Computer Vision System

by Mahmoud Sedky Adly, Aliaa A. A. Youssif, Ahmed Sharaf Eldin
"... Information Helwan University Human motion detection and analysis are important in many medical and dental clinics. Mandibular movements are very complex and difficult to detect by naked eyes. Detecting mandibular movements will aid in proper diagnosis, treatment planning and follow up. Many methods ..."
Abstract
Human motion detection and analysis are important in many medical and dental clinics. Mandibular movements are very complex and difficult to detect with the naked eye. Detecting mandibular movements will aid in proper diagnosis, treatment planning and follow-up. Many methods are used for measuring mandibular movements; however, most of them share the drawbacks of being very expensive and difficult to use in the clinic. Using computer vision systems to track such movements may be considered one of the fundamental problems of human motion analysis, one that remains unsolved due to its inherent difficulty. However, using markers can greatly simplify the process as long as they are simple, cheap and easy to use. Unlike other tracking systems, this system needs only a simple digital video camera and very simple markers created from black-and-white images, which can be attached with any cheap double-sided bonding tape. The proposed system is considered reliable and has reasonable accuracy. Its main advantages are simplicity and low cost compared with other methods of the same accuracy.
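As a rough illustration of the marker-tracking idea (not the authors' implementation), the sketch below detects two dark synthetic markers per frame with OpenCV and reports the vertical distance between a fixed reference marker and a chin marker. With a real camera, the frames would come from cv2.VideoCapture and the markers would be the printed black-and-white patterns described above; all sizes and positions here are made up.

# Toy marker tracking: threshold dark markers and measure their vertical separation.
import cv2
import numpy as np

H, W, MARKER = 240, 320, 12                  # frame size and marker half-size (px)

def synthetic_frame(jaw_offset_px):
    """White frame with a fixed reference marker and a chin marker displaced vertically."""
    frame = np.full((H, W), 255, np.uint8)
    for (cy, cx) in [(60, 160), (160 + jaw_offset_px, 160)]:
        frame[cy - MARKER:cy + MARKER, cx - MARKER:cx + MARKER] = 0
    return frame

def marker_centroids(frame):
    """Threshold dark markers and return their centroids, sorted top to bottom."""
    _, binary = cv2.threshold(frame, 128, 255, cv2.THRESH_BINARY_INV)
    n, _, _, centroids = cv2.connectedComponentsWithStats(binary)
    pts = centroids[1:n]                     # label 0 is the background
    return pts[np.argsort(pts[:, 1])]        # sort by y (row) coordinate

# Simulated jaw opening/closing; the tracked quantity is the vertical distance
# between the reference marker and the chin marker, in pixels.
for i in range(0, 50, 10):
    offset = int(20 * np.sin(2 * np.pi * i / 50))
    ref, chin = marker_centroids(synthetic_frame(offset))
    print(f"frame {i:2d}: jaw opening = {chin[1] - ref[1]:.1f} px")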

Citation Context

... return. We can then calculate a distance from the time and the known speed of sound. Unfortunately it had the disadvantage of being inaccurate and extremely sensitive to environmental conditions [11, 12, 13]. All of the previous methods, excluding the graphical method, share the features of being very expensive and difficult to use in common clinical scenarios. This work offers a simple, low cos...
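For reference, the round-trip time-of-flight relation the excerpt refers to is distance = speed_of_sound * time / 2; the numbers below are only illustrative.

# Time-of-flight ranging: the pulse travels to the target and back.
speed_of_sound = 343.0   # m/s in air at ~20 C; its temperature dependence is one
                         # reason such measurements are environment-sensitive
round_trip_s = 0.001     # 1 ms echo delay (example value)
distance_m = speed_of_sound * round_trip_s / 2
print(distance_m)        # ~0.17 m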

Real-time Control of a DNN-based Articulatory Synthesizer for Silent Speech Conversion: a pilot study

by Florent Bocquelet, Thomas Hueber, Laurent Girin, Christophe Savariaux, Blaise Yvert
"... This article presents a pilot study on the real-time control of an articulatory synthesizer based on deep neural network (DNN), in the context of silent speech interface. The underlying hypothesis is that a silent speaker could benefit from real-time audio feedback to regulate his/her own production ..."
Abstract
This article presents a pilot study on the real-time control of an articulatory synthesizer based on a deep neural network (DNN), in the context of a silent speech interface. The underlying hypothesis is that a silent speaker could benefit from real-time audio feedback to regulate his/her own production. In this study, we use 3D electromagnetic articulography (EMA) to capture speech articulation, a DNN to convert EMA to spectral trajectories in real time, and a standard vocoder excited by white noise for audio synthesis. As shown by recent literature on silent speech, adaptation of the articulatory-to-acoustic modeling process is needed to account for possible inconsistencies between the initial training phase and practical usage conditions. In this study, we focus on different sensor setups across sessions (for the same speaker). Model adaptation is performed by cascading another neural network to the DNN used for articulatory-to-acoustic mapping. The intelligibility of the synthetic speech signal converted in real time is evaluated using both objective and perceptual measurements. Index Terms: articulatory speech synthesis, deep neural networks, EMA, silent speech
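A schematic sketch of the cascaded-adaptation idea follows, written in PyTorch (the paper does not prescribe a framework). The layer sizes, the placement of the small adaptation network in front of the frozen mapping network, and the random training batch are illustrative assumptions, not details taken from the paper.

# Sketch: frozen articulatory-to-acoustic DNN with a small cascaded adaptation net.
import torch
import torch.nn as nn

EMA_DIM, SPEC_DIM = 18, 25                 # e.g. 6 coils x 3 coords, 25 spectral coeffs (assumed)

mapping = nn.Sequential(                   # trained once on the reference session
    nn.Linear(EMA_DIM, 256), nn.Tanh(),
    nn.Linear(256, 256), nn.Tanh(),
    nn.Linear(256, SPEC_DIM),
)

adapter = nn.Sequential(                   # small net trained on the new session only
    nn.Linear(EMA_DIM, 64), nn.Tanh(),
    nn.Linear(64, EMA_DIM),
)

for p in mapping.parameters():             # keep the articulatory-to-acoustic model fixed
    p.requires_grad = False

optimizer = torch.optim.Adam(adapter.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# One adaptation step on a (here random) batch of aligned new-session EMA frames and
# target spectral frames; real data would come from a short recorded adaptation set.
ema = torch.randn(32, EMA_DIM)
target_spec = torch.randn(32, SPEC_DIM)
pred = mapping(adapter(ema))               # cascade: adapter -> frozen mapping DNN
loss = loss_fn(pred, target_spec)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"adaptation loss: {loss.item():.3f}")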

Citation Context

... microphone [5]. Several studies have addressed the problem of ‘silent speech recognition’, i.e. word sequence identification from silent articulation, under different modalities including ultrasound [6], sEMG [7], NAM [8], or PEMA [9]. Other studies have addressed the problem of ‘silent speech conversion’, i.e. direct reconstruction of a synthetic speech signal from silent articulation, without any ...

unknown title

by Jun Cai, Thomas Hueber, Bruce Denby, Gérard Chollet, Pierre Roussel, Gérard Dreyfus, Lise Crevier-Buchman
"... The development of a continuous visual speech recognizer for a silent speech interface has been investigated using a visual speech corpus of ultrasound and video images of the tongue and lips. By using high-speed visual data and tied-state cross-word triphone HMMs, and including syntactic informatio ..."
Abstract
The development of a continuous visual speech recognizer for a silent speech interface has been investigated using a visual speech corpus of ultrasound and video images of the tongue and lips. By using high-speed visual data and tied-state cross-word triphone HMMs, and including syntactic information via domain-specific language models, word-level recognition accuracy as high as 72% was achieved on visual speech. Using the Julius system, it was also found that recognition should be possible in near real time.
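For context, the appeal of tied-state triphones is parameter sharing: the number of distinct cross-word triphone models grows cubically with the phone set, so states are shared across acoustically similar contexts. The toy sketch below only counts how much a deliberately crude tying rule shrinks the inventory; real systems cluster states with phonetic decision trees rather than the broad-class rule used here.

# Toy illustration of triphone explosion and state tying.
from itertools import product

phones = ["p", "b", "t", "d", "k", "g", "m", "n", "s", "z", "f", "v",
          "a", "e", "i", "o", "u"]                     # illustrative phone set
broad = {p: "vowel" if p in "aeiou" else "consonant" for p in phones}

# Every left-center-right combination is a distinct (cross-word) triphone model.
triphones = [f"{l}-{c}+{r}" for l, c, r in product(phones, repeat=3)]

# Crude tying: share a state group among triphones with the same centre phone and
# the same broad classes of left/right context.
tied = {f"{broad[l]}-{c}+{broad[r]}" for l, c, r in product(phones, repeat=3)}

print(f"{len(phones)} phones -> {len(triphones)} untied triphones "
      f"-> {len(tied)} tied state groups")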