Audio-Visual Speech Recognition using Red Exclusion and Neural Networks
- Journal of Research and Practice in Information Technology
, 2003
Cited by 13 (4 self)
Automatic speech recognition (ASR) performs well under restricted conditions, but performance degrades in noisy environments. Audio-Visual Speech Recognition (AVSR) combats this by incorporating a visual signal into the recognition. This paper briefly reviews the contribution of psycholinguistics to this endeavour and the recent advances in machine AVSR. An important first step in AVSR is feature extraction from the mouth region, and a technique developed by the authors is briefly presented. This paper examines how useful this extraction technique, in combination with several integration architectures, is at the given task, demonstrates that vision does in fact assist speech recognition when used in a linguistically guided fashion, and gives insight into remaining issues.
Recognition of visual speech elements using adaptively boosted HMM
- International Machine Vision and Image Processing Conference, © 2007 IEEE, DOI 10.1109/IMVIP.2007.35
, 2004
Cited by 8 (1 self)
© 2006 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
Profile view lip reading
- In IEEE International Conference on Acoustics, Speech and Signal Processing, number 4
, 2007
Cited by 7 (0 self)
In this paper, we introduce profile view (PV) lip reading, a scheme for speaker-dependent isolated word speech recognition. We provide historic motivation for PV from the importance of profile images in facial animation for lip reading, and we present feature extraction schemes for PV as well as for the traditional frontal view (FV) approach. We compare lip reading results for PV and FV, which demonstrate a significant improvement for PV over FV. We show improvement in speech recognition with the integration of Audio and Visual features. We also found it advantageous to process the visual features over a longer duration than the duration marked by the endpoints of the speech utterance. Index Terms — Speechreading, Visual feature extraction, Audiovisual speech recognition, Profile view
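The last finding above, processing visual features over a longer span than the audio endpoints, can be illustrated with a minimal sketch. The function name, the frame-index representation, and the symmetric margin are all assumptions made for illustration; the abstract does not say how much extra context the authors used or how they chose it.

```python
def extend_visual_window(start_frame, end_frame, total_frames, margin_frames):
    """Widen an utterance's frame range by a margin on each side.

    start_frame and end_frame are the inclusive video-frame indices that
    correspond to the audio endpoints. margin_frames is a hypothetical
    tuning parameter, not a value taken from the paper. The result is
    clamped so it never leaves the recorded clip.
    """
    first = max(0, start_frame - margin_frames)
    last = min(total_frames - 1, end_frame + margin_frames)
    return first, last


# Usage: select the widened range of per-frame visual feature vectors.
visual_features = [[float(i)] for i in range(100)]  # placeholder features
first, last = extend_visual_window(10, 20, len(visual_features), 5)
window = visual_features[first:last + 1]
```

Clamping to the clip boundaries keeps the sketch safe for utterances that start or end near the edge of a recording.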
Vowel Recognition using Neural Networks
Cited by 1 (0 self)
Speech recognition techniques have developed dramatically in recent years. Nevertheless, errors caused by environmental noise are still a serious problem in recognition. Algorithms that detect and follow the motion of the lips have been widely used to improve the performance of speech recognition algorithms. This paper presents a novel technique to recognize vowels. Lip features extracted by a combined method are used as input parameters to a neural network system for recognition. The accuracy of the proposed method is verified by using it to recognize the 6 main Farsi vowels. Key words: Vowel recognition, visual features, neural networks
A Two-Channel Training Algorithm for Hidden Markov Model and Its Application to Lip Reading
- EURASIP Journal on Applied Signal Processing 2005:9, 1382–1399, © 2005 Hindawi Publishing Corporation
, 2004
The hidden Markov model (HMM) has been a popular mathematical approach for sequence classification tasks such as speech recognition since the 1980s. In this paper, a novel two-channel training strategy is proposed for discriminative training of HMMs. For the proposed training strategy, a novel separable-distance function that measures the difference between a pair of training samples is adopted as the criterion function. The symbol emission matrix of an HMM is split into two channels: a static channel to maintain the validity of the HMM and a dynamic channel that is modified to maximize the separable distance. The parameters of the two-channel HMM are estimated by iterative application of expectation-maximization (EM) operations. As an example of the application of the novel approach, a hierarchical speaker-dependent visual speech recognition system is trained using the two-channel HMMs. Results of experiments on identifying a group of confusable visemes indicate that the proposed approach is able to increase the recognition accuracy by an average of 20% compared with conventional HMMs that are trained with the Baum-Welch estimation.
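To make the idea of a pairwise criterion concrete, here is a minimal sketch of scoring two symbol sequences under one discrete HMM with the standard scaled forward algorithm. The abstract does not define the separable distance, so the log-likelihood difference used below is an assumption for illustration only, and the function names and toy parameters are hypothetical. The two-channel split of the emission matrix and the EM updates are not shown.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | HMM) using the scaled forward algorithm.

    obs   : list of symbol indices
    start : start[s] is the initial probability of state s
    trans : trans[r][s] is the probability of moving from state r to s
    emit  : emit[s][k] is the probability that state s emits symbol k
    """
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states))
                * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik


def separable_distance(true_obs, confusable_obs, start, trans, emit):
    """ASSUMED form of the criterion: how much better the model scores a
    sample of its own class than a sample of a confusable class."""
    return (forward_log_likelihood(true_obs, start, trans, emit)
            - forward_log_likelihood(confusable_obs, start, trans, emit))


# Toy two-state, two-symbol model (values chosen only for illustration).
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
d = separable_distance([0, 0], [1, 1], start, trans, emit)
```

Under this assumed definition, a discriminative trainer would adjust the emission probabilities to make `d` larger for confusable pairs, whereas Baum-Welch maximizes only the likelihood of each class's own samples.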
A Multifaceted Investigation into Speech Reading
Speech reading is the act of speech perception using both acoustic and visual information. This is something we all do as humans, and it can be utilised by machines to improve traditional speech recognition systems. We have been following a line of research that has grown from a simple audio-visual speech recognition system into what is now a multifaceted investigation into speech reading. This paper gives an overview of our feature extraction technique, red exclusion, and its analysis using neural networks, and then looks at several neural network integration architectures for speech reading.
Lip Contour Extraction Based on Support Vector Machine
This paper is supported by NSFC, No. 10674013. Lip contour extraction is a useful technique for obtaining the mouth shape in an image, and is one of the most important techniques for human-machine interface applications such as lip reading and speech recognition. In this paper, a new method to extract the lip contour from video is proposed, based on the fact that lip color and skin color vary across different color spaces. We first extract frames from digital video; then we classify the face into lip and non-lip areas with a Support Vector Machine. Finally, we obtain some parameters from the lip area to reconstruct the lip contour. The experimental results show that the proposed method is accurate and robust.
A Novel Visual Feature Extraction and Its Application in Vowel Recognition
Speech recognition techniques have developed dramatically in recent years. Nevertheless, errors caused by environmental noise are still a serious problem in recognition. Algorithms that detect and follow the motion of the lips have been widely used to improve the performance of speech recognition algorithms. This paper presents a novel technique to recognize vowels. Lip features extracted by a combined method are used as input parameters to a neural network system for recognition. The accuracy of the proposed method is verified by using it to recognize the 6 main Farsi vowels.