Results 1 - 10 of 182
Support vector machines for speech recognition
- Proceedings of the International Conference on Spoken Language Processing, 1998
Cited by 117 (2 self)
Statistical techniques based on hidden Markov models (HMMs) with Gaussian emission densities have dominated the signal processing and pattern recognition literature for the past 20 years. However, HMMs trained using maximum likelihood techniques suffer from an inability to learn discriminative information and are prone to overfitting and over-parameterization. Recent work in machine learning has focused on models, such as the support vector machine (SVM), that automatically control generalization and parameterization as part of the overall optimization process. In this paper, we show that SVMs provide a significant improvement in performance on a static pattern classification task based on the Deterding vowel data. We also describe an application of SVMs to large vocabulary speech recognition, and demonstrate an improvement in error rate on a continuous alphadigit task (OGI Alphadigits) and a large vocabulary conversational speech task (Switchboard). Issues related to the development and optimization of an SVM/HMM hybrid system are discussed.
A Review of Algorithms for Audio Fingerprinting
- In Workshop on Multimedia Signal Processing, 2002
Cited by 84 (2 self)
An audio fingerprint is a content-based compact signature that summarizes an audio recording. Audio Fingerprinting technologies have recently attracted attention since they allow the monitoring of audio independently of its format and without the need of meta-data or watermark embedding. The different approaches to fingerprinting are usually described with different rationales and terminology depending on the background: Pattern matching, Multimedia (Music) Information Retrieval or Cryptography (Robust Hashing). In this paper, we review different techniques mapping functional parts to blocks of a unified framework.
A review of audio fingerprinting
- Journal of VLSI Signal Processing Systems, 2005
Cited by 65 (1 self)
An audio fingerprint is a compact content-based signature that summarizes an audio recording. Audio Fingerprinting technologies have attracted attention since they allow the identification of audio independently of its format and without the need of meta-data or watermark embedding. Other uses of fingerprinting include: integrity verification, watermark support and content-based audio retrieval. The different approaches to fingerprinting have been described with different rationales and terminology: Pattern matching, Multimedia (Music) Information Retrieval or Cryptography (Robust Hashing). In this paper, we review different techniques, describing their functional blocks as parts of a common, unified framework.
User Authentication via Adapted Statistical Models of Face Images
- 2006
Cited by 61 (10 self)
It has been previously demonstrated that systems based on local features and relatively complex statistical models, namely, one-dimensional (1-D) hidden Markov models (HMMs) and pseudo-two-dimensional (2-D) HMMs, are suitable for face recognition. Recently, a simpler statistical model, namely, the Gaussian mixture model (GMM), was also shown to perform well. In much of the literature devoted to these models, the experiments were performed with controlled images (manual face localization, controlled lighting, background, pose, etc.). However, a practical recognition system has to be robust to more challenging conditions. In this article we evaluate, on the relatively difficult BANCA database, the performance, robustness and complexity of GMM and HMM-based approaches, using both manual and automatic face localization. We extend the GMM approach through the use of local features with embedded positional information, increasing performance without sacrificing its low complexity. Furthermore, we show that the traditionally used maximum likelihood (ML) training approach has problems estimating robust model parameters when there are only a few training images available. Considerably more precise models can be obtained through the use of maximum a posteriori probability (MAP) training. We also show that face recognition techniques which obtain good performance on manually located faces do not necessarily obtain good performance on automatically located faces, indicating that recognition techniques must be designed from the ground up to handle imperfect localization. Finally, we show that while the pseudo-2-D HMM approach has the best overall performance, authentication time on current hardware makes it impractical. The best tradeoff in terms of authentication ...
Heterogeneous acoustic measurements and multiple classifiers for speech recognition
- 1998
Identity Verification Using Speech And Face Information
- Digital Signal Processing, 2004
Cited by 46 (2 self)
This article first provides an overview of important concepts in the field of information fusion, followed by a review of milestones in audio-visual person identification and verification. Several recent adaptive and non-adaptive techniques for reaching the verification decision (i.e., to accept or reject the claimant), based on speech and face information, are then evaluated in clean and noisy audio conditions on a common database; it is shown that in clean conditions most of the non-adaptive approaches provide similar performance and in noisy conditions most exhibit a severe deterioration in performance; it is also shown that current adaptive approaches are either inadequate or utilize restrictive assumptions. A new category of classifiers is then introduced, where the decision boundary is fixed but constructed to take into account how the distributions of opinions are likely to change due to noisy conditions; compared to a previously proposed adaptive approach, the proposed classifiers do not make a direct assumption about the type of noise that causes the mismatch between training and testing conditions.
Spectral subband centroid features for speech recognition
- in Proc. IEEE ICASSP, 1998
Cited by 44 (6 self)
Cepstral coefficients derived either through linear prediction (LP) analysis or from a filter bank are perhaps the most commonly used features in currently available speech recognition systems. In this paper, we propose spectral subband centroids as new features and use them as a supplement to cepstral features for speech recognition. We show that these features have properties similar to formant frequencies and that they are quite robust to noise. Recognition results are reported in the paper justifying the usefulness of these features as supplementary features.
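The abstract above does not state how a subband centroid is computed. One common reading, assumed here and not confirmed by this listing, is the power-weighted mean frequency inside each subband. The Python sketch below illustrates that reading only; the function name `subband_centroids`, its parameters, the `gamma` exponent, and the midpoint fallback for empty bands are all hypothetical choices made for illustration.

```python
def subband_centroids(freqs, power, band_edges, gamma=1.0):
    """Return one centroid frequency per subband (illustrative sketch).

    freqs      -- frequency of each spectrum bin, in Hz
    power      -- power spectrum value at each bin
    band_edges -- ascending subband boundaries in Hz; band i spans
                  [band_edges[i], band_edges[i + 1])
    gamma      -- exponent applied to the power before weighting
                  (an assumed tuning knob, not taken from the abstract)
    """
    centroids = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        num = 0.0  # sum of frequency * weight inside the band
        den = 0.0  # sum of weights inside the band
        for f, p in zip(freqs, power):
            if lo <= f < hi:
                w = p ** gamma
                num += f * w
                den += w
        # A band with no energy has no defined centroid; as an assumption,
        # fall back to the band midpoint.
        centroids.append(num / den if den > 0 else (lo + hi) / 2.0)
    return centroids
```

With bins at 100, 200, 300 and 400 Hz, powers 1, 1, 0 and 2, and band edges 0, 250 and 500 Hz, the two centroids are 150 Hz and 400 Hz: each centroid is pulled toward where the energy in its band sits, which is consistent with the formant-like behavior the abstract mentions.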
Identifying Distinctive Subsequences in Multivariate Time Series by Clustering
- Proc. ACM SIGKDD, 1999
Cited by 42 (4 self)
Most time series comparison algorithms attempt to discover what the members of a set of time series have in common. We investigate a different problem, determining what distinguishes time series in that set from other time series obtained from the same source. In both cases the goal is to identify shared patterns, though in the latter case those patterns must be distinctive as well. An efficient incremental algorithm for identifying distinctive subsequences in multivariate, real-valued time series is described and evaluated with data from two very different sources: the response of a set of bandpass filters to human speech and the sensors of a mobile robot.
Information Fusion and Person Verification Using Speech & Face Information
- 2002
Cited by 40 (4 self)
This report provides an overview of important concepts in the field of information fusion, followed by a review of literature pertaining to audio-visual person identification & verification. Several recent adaptive and non-adaptive techniques for reaching the verification decision (i.e., to accept or reject the claimant), based on audio and visual information, are evaluated in clean and noisy conditions on a common database using a text-independent setup. It is shown that in clean conditions all the non-adaptive approaches provide similar performance; in noisy conditions they exhibit deterioration in their performance. It is also shown that current adaptive approaches are either inadequate or utilize restrictive assumptions. A new category of classifiers is then introduced, where the decision surface is fixed but constructed to take into account the effects of noisy conditions, providing a good trade-off between performance in clean and noisy conditions. NOTE: This report has been superseded by [48].
Automatic Person Verification Using Speech and Face Information
- 2003
Cited by 37 (7 self)
Identity verification systems are an important part of our everyday life. A typical example is the Automatic Teller Machine (ATM) which employs a simple identity verification scheme: the user is asked to enter their secret password after inserting their ATM card; if the password matches the one prescribed to the card, the user is allowed access to their bank account. This scheme suffers from a major drawback: only the validity of the combination of a certain possession (the ATM card) and certain knowledge (the password) is verified. The ATM card can be lost or stolen, and the password can be compromised. Thus new verification methods have emerged, where the password has either been replaced by, or used in addition to, biometrics such as the person's speech, face image or fingerprints. Apart from the ATM example described above, biometrics can be applied to other areas, such as telephone & internet based banking, airline reservations & check-in, as well as forensic work and law enforcement applications. Biometric systems