| Benoit Maison, Chalapathy Neti, and Andrew Senior, "Audio-visual speaker recognition for video broadcast news: some fusion techniques," in IEEE Multimedia Signal Processing (MMSP99), Denmark, September 1999. |
.... intent to speak, humans use visual speech cues to better understand speech [2], and there has been some success in integrating visual cues into ASR systems [3, 4, 5]. Joint processing of audio and visual information has also been used successfully for speaker change, speaker identification, etc. [6, 7]. In this paper, we propose to use the visual channel to establish speech intent. One can argue that rather than using the visual channel, a user can explicitly address the device with a unique name to establish intent. This has a couple of obvious problems. As the number of devices ....