H. Glotin, D. Vergyri, C. Neti, G. Potamianos, and J. Luettin, "Weighting schemes for audio-visual fusion in speech recognition," Proc. Int. Conf. Acoust. Speech Signal Process., 2001.
....from this histogram, and it allows an estimate of the probability for the signal to be clean enough. We name this probability Voicing Index which is derived from the harmonicity index R. First tests of the use of the Voicing Index for the fusion in audio visual speech recognition are reported in [8]. 4. Comparing the Criteria For the comparison we used white noise, noise recorded in a car at 120 km/h and, from the NOISEX database, babble noise and two types of factory noise. We mixed these different types of noise to the audio signal at 12 SNR levels ranging from 12dB to clean speech. In ....
H. Glotin, D. Vergyri, C. Neti, G. Potamianos, and J. Luettin, "Weighting schemes for audio-visual fusion in speech recognition," to appear in Proc. ICASSP 2001.
....harmonicity index R, Voicing Index. Similar to the previous criteria the Voicing Index was evaluated only in those segments where the pause was not amongst the 4 most probable phonemes. First tests of the use of the Voicing Index for the fusion in audio visual speech recognition are reported in [36]. 4.2.2 Evaluation of time constant audio stream weights After the definition of the various criteria to be used in the estimation of the reliability of the audio stream the questions at hand are: How sensitive are the recognition results to variations of the fusion parameter # and how ....
H. Glotin, D. Vergyri, C. Neti, G. Potamianos, and J. Luettin, "Weighting schemes for audio-visual fusion in speech recognition," in Proc. of ICASSP 2001.
....from the harmonicity index R, Voicing Index. Similar to the previous criteria the Voicing Index was only evaluated in segments where the pause was not amongst the 4 most probable phonemes. First tests of the use of the Voicing Index for the fusion in audio visual speech recognition are reported in [35]. 4.2.4 Evaluation of the criteria After the definition of the criteria to estimate the reliability of the audio stream the questions at hand are: How sensitive are the recognition results to variations of the fusion parameter how consistent are the values of the criteria over different noise ....
H. Glotin, D. Vergyri, C. Neti, G. Potamianos, and J. Luettin, "Weighting schemes for audio-visual fusion in speech recognition," in Proc. of ICASSP 2001.
....integration and the degree of asynchrony to be made. We show how these models can be trained jointly using maximum likelihood training and report results for a large vocabulary continuous audio visual speech database. Related work based on feature fusion and decision fusion is presented in [2] and [3], respectively. 2. DATABASE AND RECOGNITION TASK All experiments have been performed on a continuous, large vocabulary, speaker independent database that has been collected at IBM Thomas J. Watson Research Center [2, 4]. The database consists of full face frontal video and audio of 290 subjects, ....
H. Glotin, D. Vergyri, C. Neti, G. Potamianos, and J. Luettin, "Weighting schemes for audio-visual fusion in speech recognition," in Proc. Int. Conf. on Acoustics, Speech and Signal Processing, 2001.
....Decision fusion techniques combine classification decisions based on single modality observations, typically by appropriately weighting their respective log likelihoods. In this paper, we exclusively consider feature fusion. Decision fusion techniques are presented in two accompanying papers [8] [9]. To eliminate duplication, visual feature extraction, the audiovisual database, and the experimental framework, common to all three papers, are described in most detail here. Specifically, in this work, we propose the use of linear discriminant analysis (LDA) [10] to discriminantly reduce the ....
....utterances) from 239 subjects, used for HMM parameter estimation. The held out set of close to 5 hours of data (2,277 utterances) from 25 additional subjects, used for roughly optimizing the language model (LM) weight and the word insertion penalty during lattice rescoring, as well as, in [8] [9], for training parameters relevant to audio visual decision fusion. Finally, the 2.5 hour test set (1,038 utterances) from the 26 remaining subjects is provided for HMM evaluation. Two audio conditions are considered: The original database clean audio (19.5 dB SNR) and a degraded one, where the ....
[Article contains additional citation context not shown here]
H. Glotin, D. Vergyri, C. Neti, G. Potamianos, and J. Luettin, "Weighting schemes for audio-visual fusion in speech recognition," Proc. Int. Conf. Acoust. Speech Signal Process., 2001.
H. Glotin et al., "Weighting schemes for audio-visual fusion in speech recognition," in IEEE Proc. ICASSP, vol. 1, pp. 165-168, 2001.
H. Glotin, D. Vergyri, C. Neti, G. Potamianos, and J. Luettin, "Weighting schemes for audio-visual fusion in speech recognition," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, vol. 1, 2001, pp. 165-168.