Audio-Visual Integration In Multimodal Communication
- Proc. IEEE, 1998
Cited by 54 (5 self)
In this paper, we review recent research that examines audio-visual integration in multimodal communication. The topics include bimodality in human speech, human and automated lip-reading, facial animation, lip synchronization, joint audio-video coding, and bimodal speaker verification. We also study the enabling technologies for these research topics, including automatic facial feature tracking and audio-to-visual mapping. Recent progress in audio-visual research shows that joint processing of audio and video provides advantages that are not available when the audio and video are processed independently.
Keywords: Multimedia communication, Speech processing, Speech communication, Video signal processing, Image analysis
1. Introduction
Multimedia is more than simply the combination of various forms of data: text, speech, audio, music, images, graphics, and video. When we discuss multimedia signal processing, it is the integration and interaction among these different media types t...
Combining Evidence in Personal Identity Verification Systems
- Pattern Recognition Letters, 1997
Cited by 42 (0 self)
A methodology for fusing multiple instances of biometric data to improve the performance of a personal identity verification system is developed. The fusion problem is formulated in the framework of Bayesian estimation theory. The effect of different fusion strategies on the error probability is analysed theoretically. The proposed methodology is then demonstrated on the problem of personal identity verification using multiple facial images. Experimental studies on the M2VTS database confirm the predicted improvements in performance. A reduction in error rates of up to 40% is achieved. The performance gains are initially monotonic, but they tend to saturate after the first few observations have been integrated. It is also shown that fusion based on a rank order statistic, i.e. the median, is robust to outliers.
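As a rough illustration of why a rank order statistic can be more robust than averaging, the following minimal sketch fuses several match scores for one identity claim with either the mean or the median. It is not the paper's method: the function names, the score values, and the 0.7 threshold are all hypothetical, chosen only to show the effect of a single outlier.

```python
from statistics import mean, median

def fuse_scores(scores, rule="mean"):
    """Combine several match scores for one identity claim into a single score."""
    if not scores:
        raise ValueError("need at least one score")
    if rule == "mean":
        return mean(scores)
    if rule == "median":
        return median(scores)
    raise ValueError(f"unknown rule: {rule}")

def verify(scores, threshold, rule="mean"):
    """Accept the claim when the fused score reaches the threshold."""
    return fuse_scores(scores, rule) >= threshold

# Hypothetical scores from five facial images of a genuine client.
# The last image is an outlier (for example, a badly localised face).
scores = [0.82, 0.79, 0.85, 0.81, 0.05]

print(verify(scores, threshold=0.7, rule="mean"))    # the outlier drags the mean down
print(verify(scores, threshold=0.7, rule="median"))  # the median ignores the outlier
```

With these made-up numbers the single bad observation is enough to make mean fusion reject a genuine claim, while median fusion still accepts it.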
Audio-visual automatic speech recognition: An overview
- Issues in Visual and Audio-visual Speech Processing, 2004
Cited by 41 (0 self)
We have made significant progress in automatic speech recognition (ASR) for well-defined applications like dictation and medium vocabulary transaction processing tasks in relatively controlled environments. However, ASR performance has yet to reach the level required for speech to become a truly pervasive user interface. Indeed, even in “clean” acoustic environments, and for a variety of tasks, state-of-the-art ASR system
Multi-Modal Identity Verification Using Expert Fusion
- Information Fusion, 2000
Cited by 40 (0 self)
The contribution of this paper is to compare paradigms from the classes of parametric and non-parametric techniques for solving the decision fusion problem encountered in the design of a multi-modal biometric identity verification system. The system under consideration is built of d modalities in parallel, each delivering as output a scalar number, called a score, stating how well the claimed identity is verified. A decision fusion module receiving the d scores as input has to take a binary decision: accept or reject the claimed identity. We have solved this fusion problem using parametric and non-parametric classifiers. The performance of all these fusion modules has been evaluated and compared with other approaches on a multi-modal database containing both vocal and visual biometric modalities.
Keywords: Multi-modal identity verification, biometrics, decision fusion.
1 Introduction
The automatic verification of a person is more and...
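The decision fusion module described in this abstract maps d scores to a binary accept/reject decision. The sketch below contrasts one parametric fuser (a weighted sum with a bias) and one non-parametric fuser (k-nearest-neighbour voting). These two are stand-ins picked for illustration; the abstract does not say which classifiers the authors used, and the weights, bias, training vectors, and function names are all hypothetical.

```python
import math

def linear_fusion(scores, weights, bias):
    """Parametric fuser: accept when the weighted sum of the d scores plus a bias is positive."""
    return sum(w * s for w, s in zip(weights, scores)) + bias > 0

def knn_fusion(scores, training, k=3):
    """Non-parametric fuser: majority vote among the k nearest labelled score vectors."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], scores))[:k]
    accepts = sum(1 for _, is_genuine in nearest if is_genuine)
    return accepts * 2 > len(nearest)

# Hypothetical training set with d = 2: (voice score, face score) -> genuine claim?
training = [
    ((0.9, 0.8), True), ((0.8, 0.9), True), ((0.7, 0.85), True),
    ((0.2, 0.3), False), ((0.3, 0.1), False), ((0.4, 0.2), False),
]

genuine_claim = (0.75, 0.8)
impostor_claim = (0.25, 0.2)

for claim in (genuine_claim, impostor_claim):
    print(claim,
          linear_fusion(claim, weights=(0.5, 0.5), bias=-0.5),
          knn_fusion(claim, training))
```

In practice the weights and bias of the parametric fuser would be estimated from training data rather than set by hand, and k would be chosen on a validation set.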
Automatic Person Verification Using Speech and Face Information
- 2003
Cited by 23 (7 self)
Identity verification systems are an important part of our everyday life. A typical example is the Automatic Teller Machine (ATM), which employs a simple identity verification scheme: the user is asked to enter their secret password after inserting their ATM card; if the password matches the one prescribed to the card, the user is allowed access to their bank account. This scheme suffers from a major drawback: only the validity of the combination of a certain possession (the ATM card) and certain knowledge (the password) is verified. The ATM card can be lost or stolen, and the password can be compromised. Thus new verification methods have emerged, where the password has either been replaced by, or used in addition to, biometrics such as the person's speech, face image or fingerprints. Apart from the ATM example described above, biometrics can be applied to other areas, such as telephone and internet-based banking, airline reservations and check-in, as well as forensic work and law enforcement applications. Biometric systems
Information Fusion and Person Verification Using Speech & Face Information
- 2002
Cited by 18 (5 self)
This report provides an overview of important concepts in the field of information fusion, followed by a review of literature pertaining to audio-visual person identification & verification. Several recent adaptive and non-adaptive techniques for reaching the verification decision (i.e., to accept or reject the claimant), based on audio and visual information, are evaluated in clean and noisy conditions on a common database using a text-independent setup. It is shown that in clean conditions all the non-adaptive approaches provide similar performance; in noisy conditions they exhibit deterioration in their performance. It is also shown that current adaptive approaches are either inadequate or utilize restrictive assumptions. A new category of classifiers is then introduced, where the decision surface is fixed but constructed to take into account the effects of noisy conditions, providing a good trade-off between performance in clean and noisy conditions. Keywords: multi-modal, fusion, person recognition, person verification, face recognition, face verification, speaker recognition, speaker verification, noise resistance, adaptability.
Fusion of Audio and Video Information for Multi Modal Person Authentication
- Pattern Recognition Letters, 1997
Cited by 16 (1 self)
This paper addresses the issue of how these scores can be represented and conciliated into a single opinion on the authenticity level of the user's identity claim.
A Contribution to Multi-Modal Identity Verification Using Decision Fusion
- Department of, 1999
Cited by 11 (1 self)
The contribution of this paper is to compare paradigms from the classes of parametric and non-parametric techniques for solving the decision fusion problem encountered in the design of a multi-modal biometric identity verification system. The system under consideration is built of d modalities in parallel, each delivering as output a scalar number, called a score, stating how well the claimed identity is verified. A decision fusion module receiving the d scores as input has to take a binary decision: accept or reject the claimed identity. We have solved this fusion problem using parametric and non-parametric classifiers. The performance of all these fusion modules has been evaluated and compared with other approaches on a multi-modal database containing both vocal and visual biometric modalities.
Keywords: Multi-modal identity verification, biometrics, decision fusion.
1 Introduction
The automatic verification of a person is more and...
A Multi-Level Data Fusion Approach for Gradually Upgrading the Performances of Identity Verification Systems
- 1999
Cited by 10 (3 self)
The aim of this paper is to propose a strategy that uses data fusion at three different levels to gradually improve the performance of an identity verification system. In a first step, temporal data fusion can be used to combine multiple instances of a single (mono-modal) expert to reduce its measurement variance. If system performance after this first step is not good enough to satisfy the end-user's needs, one can improve it in a second step by fusing the results of multiple experts working on the same (biometric) modality. For this approach to work, it is assumed that the respective classification errors of the different experts are (at least partially) de-correlated. Finally, if the verification system's performance after this second step is still not good enough, one will be forced to move on to the third step, in which performance can be improved by using multiple experts working on different (biometric) modalities. To be useful, however, these experts have to be chosen in such a way ...
Word-Dependent Acoustic-Labial Weights In HMM-Based Speech Recognition
- Proc. AVSP, Rhodes, 1997
Cited by 10 (0 self)
This paper describes a novel approach for weighting the contribution of the acoustic and visual sources of information in a bimodal connected speech recognition system. We consider that a different acoustic-labial weight is attached to each recognition unit. The values of the weighting vector are optimised in order to minimise the error rate on a learning set. Experiments are performed on a two-speaker audio-visual database, composed of connected letters, with two different acoustic-labial speech recognition systems. For both speakers and both systems, the weight optimisation allows us to increase the recognition rate of our bimodal system.
1 INTRODUCTION
In normal conditions, the acoustic signal contains more information about the oral message or the speaker's identity than the visual information about the lips. Nevertheless, these two sources of information are not redundant: taking labial features into account may lead to an improvement of speech processing systems [8, 7, 3, 15, 9]...
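One common way to attach a separate acoustic-labial weight to each recognition unit is a weighted sum of the per-stream log-likelihoods. The sketch below assumes that form; the abstract does not give the paper's actual combination rule, and the unit labels, log-likelihood values, weights, and function names are hypothetical. The optimisation of the weights on a learning set is not shown.

```python
def fused_log_likelihood(log_p_audio, log_p_visual, weight):
    """Weighted combination of the two streams; weight is the acoustic share in [0, 1]."""
    return weight * log_p_audio + (1.0 - weight) * log_p_visual

def recognise(audio_scores, visual_scores, weights):
    """Return the unit with the highest fused score, using a separate weight per unit."""
    return max(
        audio_scores,
        key=lambda unit: fused_log_likelihood(
            audio_scores[unit], visual_scores[unit], weights[unit]
        ),
    )

# Hypothetical log-likelihoods for two acoustically confusable letters.
audio_scores = {"b": -10.0, "d": -9.5}
visual_scores = {"b": -4.0, "d": -8.0}

# Hypothetical per-unit weights; in the paper's setting these would be tuned
# to minimise the error rate on a learning set.
unit_weights = {"b": 0.3, "d": 0.6}
audio_only = {"b": 1.0, "d": 1.0}

print(recognise(audio_scores, visual_scores, audio_only))    # acoustic stream alone
print(recognise(audio_scores, visual_scores, unit_weights))  # with labial evidence
```

With these made-up values the acoustic stream alone prefers "d", while the fused score prefers "b" because the lip evidence for "b" is much stronger.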

