
Discriminative classifiers with adaptive kernels for noise robust speech recognition, CSL (2010)

by M Gales, F Flego

Results 1 - 10 of 12

Covariance Modelling for Noise-Robust Speech Recognition

by R. C. Van Dalen, M. J. F. Gales
Cited by 5 (5 self)
Model compensation is a standard way of improving speech recognisers’ robustness to noise. Most model compensation techniques produce diagonal covariances. However, this fails to handle any changes in the feature correlations due to the noise. This paper presents a scheme that allows full-covariance matrices to be estimated. One problem is that full covariance matrix estimation will be more sensitive to approximations, and those for the dynamic parameters are known to be crude. In this paper a linear transformation of a window of consecutive frames is used as the basis for dynamic parameter compensation. A second problem is that the resulting full covariance matrices slow down decoding. This is addressed by using predictive linear transforms that decorrelate the feature space, so that the decoder can then use diagonal covariance matrices. On a noise-corrupted Resource Management task, the proposed scheme outperformed the standard VTS compensation scheme.

Structured log linear models for noise robust speech recognition

by S.-X. (Austin) Zhang, M. J. F. Gales - Signal Processing Letters, IEEE, 2010
Cited by 4 (3 self)
The use of discriminative models for structured classification tasks, such as automatic speech recognition, is becoming increasingly popular. The major contribution of this work is a large margin structured log-linear model for noise robust continuous ASR. An important aspect of log-linear models is the form of the features. The features used in our structured log-linear model are derived from generative kernels. This provides an elegant way of combining generative and discriminative models to handle time-varying data. Additionally, since the features are based on the generative models, model-based compensation can be easily performed for noise robustness. Third, the designed joint feature space can be decomposed at the arc level. This allows efficient decoding and training with lattices, which is important for any larger vocabulary extensions. Previous work in this area is extended in two important directions. First, instead of using CML training, which is commonly used for discriminative models, this paper describes efficient large margin training for sentence-level log-linear models based on lattices. Depending on the nature of the joint feature space and labels, we have proved that this form of model is closely related to structured SVMs and multiclass SVMs. Second, efficient lattice-based classification of continuous data is also performed, incorporating a joint feature space. This novel model combines generative kernels, discriminative models, efficient lattice-based large margin training and model-based noise compensation. It is evaluated on a noise corrupted continuous digit task: AURORA 2.0. Results on AURORA 2 demonstrate that modelling the structure information yields significant improvements.

Structured Support Vector Machines for Noise Robust Continuous Speech Recognition

by Shi-xiong Zhang, M. J. F. Gales
Cited by 2 (2 self)
The use of discriminative models is an interesting alternative to generative models for speech recognition. This paper examines one form of these models, structured support vector machines (SVMs), for noise robust speech recognition. One important aspect of structured SVMs is the form of the joint feature space. In this work features based on generative models are used, which allows model-based compensation schemes to be applied to yield robust joint features. However, these features require the segmentation of frames into words, or subwords, to be specified. In previous work this segmentation was obtained using generative models. Here the segmentations are refined using the parameters of the structured SVM. A Viterbi-like scheme for obtaining “optimal” segmentations, and modifications to the training algorithm to allow them to be efficiently used, are described. The performance of the approach is evaluated on a noise corrupted continuous digit task: AURORA 2. Index Terms: speech recognition, structural SVMs, optimal alignment, large margin, log linear model

Improving Reverberant VTS for Hands-free Robust Speech Recognition

by Y.-Q. Wang, M. J. F. Gales
Cited by 1 (1 self)
Model-based approaches to handling additive background noise and channel distortion, such as Vector Taylor Series (VTS), have been intensively studied and extended in a number of ways. In previous work, VTS has been extended to handle both reverberant and background noise, yielding the Reverberant VTS (RVTS) scheme. In this work, rather than assuming the observation vector is generated by the reverberation of a sequence of background noise corrupted speech vectors, as in RVTS, the observation vector is modelled as a superposition of the background noise and the reverberation of clean speech. This yields a new compensation scheme, RVTS Joint (RVTSJ), which allows an easy formulation for joint estimation of both additive and reverberation noise parameters. These two compensation schemes were evaluated and compared on a simulated reverberant noise corrupted AURORA4 task. Both yielded large gains over the VTS baseline system, with RVTSJ outperforming the previous RVTS scheme.

Model-Based Approaches to Handling Uncertainty

by M. J. F. Gales
A powerful approach for handling uncertainty in observations is to modify the statistical model of the data to appropriately reflect this uncertainty. For the task of noise robust speech recognition, this requires modifying an underlying “clean” acoustic model to be representative of speech in a particular target acoustic environment. This chapter describes the underlying concepts of model-based noise compensation for robust speech recognition and how it can be applied to standard systems. The chapter will then consider important practical issues. These include: i) acoustic environment noise parameter estimation; ii) efficient acoustic model compensation and likelihood calculation; and iii) adaptive training to handle multi-style training data. The chapter will conclude by discussing the limitations of the current approaches and research options to address them.
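
To illustrate the idea of modifying a clean acoustic model to match a target environment, here is a minimal sketch of first-order vector Taylor series (VTS) compensation for a single scalar Gaussian in the log-spectral domain. It assumes the commonly used mismatch function y = x + log(1 + exp(n - x)) with no channel term. The function name `vts_compensate` and the example values are hypothetical and are not taken from the chapter, which may formulate the compensation differently (for example in the cepstral domain with dynamic parameters).

```python
import math


def vts_compensate(mu_x, var_x, mu_n, var_n):
    """First-order VTS compensation of one scalar log-spectral Gaussian.

    Assumed mismatch function (no channel distortion):
        y = x + log(1 + exp(n - x))
    where x is clean speech and n is additive noise, both Gaussian.
    Returns (mean, variance) of the Gaussian approximating y.
    """
    d = mu_n - mu_x

    # log(1 + exp(d)) computed without overflow for large |d|.
    if d > 0:
        offset = d + math.log1p(math.exp(-d))
    else:
        offset = math.log1p(math.exp(d))

    # Mean: the mismatch function evaluated at the expansion point.
    mu_y = mu_x + offset

    # Jacobian dy/dx at the expansion point; dy/dn is then 1 - g.
    if d > 0:
        e = math.exp(-d)
        g = e / (1.0 + e)
    else:
        g = 1.0 / (1.0 + math.exp(d))

    # Variance: linearised propagation of independent speech and noise.
    var_y = g * g * var_x + (1.0 - g) ** 2 * var_n
    return mu_y, var_y


if __name__ == "__main__":
    # Hypothetical values: speech well above the noise floor.
    print(vts_compensate(mu_x=10.0, var_x=1.0, mu_n=-10.0, var_n=4.0))
    # Hypothetical values: noise dominating the speech.
    print(vts_compensate(mu_x=-10.0, var_x=1.0, mu_n=10.0, var_n=4.0))
```

When the speech mean lies far above the noise mean the compensated Gaussian stays close to the clean one, and when the noise dominates it approaches the noise Gaussian, which is the qualitative behaviour expected of this kind of compensation.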

noise-robust speech recognition

by R. C. Van Dalen, M. J. F. Gales, 2010
Model compensation techniques for noise-robust speech recognition approximate the corrupted speech distribution. This work introduces a sampling method that, given speech and noise distributions and a mismatch function, in the limit calculates the corrupted speech likelihood exactly. For this, it transforms the integral in the likelihood expression, and then applies sequential importance resampling. Though it is too slow to compensate a speech recognition system, it enables a more fine-grained assessment of compensation techniques, based on the KL divergences to the ideal compensation for individual components. The KL divergence appears to predict the word error rate well. This technique also makes it possible to evaluate the impact of approximations that compensation schemes make. For example, this work examines the influence of the assumption that the corrupted speech distribution is Gaussian and diagonalising that Gaussian’s covariance. It also assesses the impact of a common approximation to the mismatch function for VTS compensation, namely setting the

Department of Engineering 1

by R. C. Van Dalen, A. Ragni, M. J. F. Gales, 2012
with continuous rational kernels using the expectation semiring

Extending Noise Robust Structured Support Vector Machines to Larger Vocabulary Tasks

by Shi-xiong Zhang, M. J. F. Gales
This paper describes a structured SVM framework suitable for noise-robust medium/large vocabulary speech recognition. Several theoretical and practical extensions to previous work on small vocabulary tasks are detailed. The joint feature space based on word models is extended to allow context-dependent triphone models to be used. Interpreting the structured SVM as a large margin log-linear model illustrates that there is an implicit assumption that the prior of the discriminative parameter is a zero mean Gaussian. However, depending on the definition of the likelihood feature space, a nonzero prior may be more appropriate. A general Gaussian prior is incorporated into the large margin training criterion in a form that allows the cutting plane algorithm to be directly applied. To further speed up the training process, a 1-slack algorithm, caching of competing hypotheses and parallelization strategies are also proposed. The performance of structured SVMs is evaluated on a noise corrupted medium vocabulary speech recognition task: AURORA 4.

Structured Log Linear Models for Noise Robust Speech Recognition

by Shi-xiong Zhang, Anton Ragni, Mark Gales
The use of discriminative models for structured classification tasks, such as speech recognition, is becoming increasingly popular. This paper examines the use of structured log-linear models for noise robust speech recognition. An important aspect of log-linear models is the form of the features. By using generative models to derive the features, state-of-the-art model-based compensation schemes can be used to make the system robust to noise. Previous work in this area is extended in two important directions. First, a large margin training of sentence-level log-linear models is proposed for ASR. This form of model is shown to be similar to the recently proposed structured SVM. Second, based on the designed joint features, efficient lattice-based training and decoding are performed. This novel model combines generative kernels, discriminative models, efficient lattice-based large margin training and model-based noise compensation. It is evaluated on a noise corrupted continuous digit task: AURORA 2.0.

Importance Sampling to Compute Likelihoods of Noise-Corrupted Speech

by R. C. Van Dalen, M. J. F. Gales
One way of making speech recognisers more robust to noise is model compensation. Rather than enhancing the incoming observations, model compensation techniques modify a recogniser’s state-conditional distributions so they model the speech in the target environment. Because the interaction between speech and noise is non-linear, even for Gaussian speech and noise the corrupted speech distribution has no closed form. Thus, model compensation methods approximate it with a parametric distribution, such as a Gaussian or a mixture of Gaussians. The impact of this approximation has never been quantified. This paper therefore introduces a non-parametric method to compute the likelihood of a corrupted speech observation. It uses sampling and, given speech and noise distributions and a mismatch function, is exact in the limit. It therefore gives a theoretical bound for model compensation. Though computing the likelihood is computationally expensive, the novel method enables a performance comparison based on the criterion that model compensation methods aim to minimise: the KL divergence to the ideal compensation. It gives the point where the Kullback-Leibler (KL) divergence is zero. This paper examines the performance of various compensation methods, such as vector Taylor series (VTS) and data-driven parallel model combination (DPMC). It shows that more accurate modelling than Gaussian-for-Gaussian compensation improves the performance of speech recognition.
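
The idea of a sampling-based likelihood that becomes exact in the limit can be sketched in one dimension. The example below is a simplification under stated assumptions, not the paper's method: it uses plain Monte Carlo over the noise rather than sequential importance resampling, a scalar log-spectral mismatch function y = log(exp(x) + exp(n)), and Gaussian speech and noise. All function names and parameter values are hypothetical. A midpoint-rule quadrature is included only as a sanity check on the estimate.

```python
import math
import random


def normal_pdf(v, mean, std):
    """Density of a scalar Gaussian."""
    z = (v - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def _integrand(y, n, mu_x, sd_x):
    """p_x(x(y, n)) * |dx/dy| for the assumed mismatch y = log(e^x + e^n)."""
    diff = -math.expm1(n - y)          # 1 - exp(n - y), positive when n < y
    x = y + math.log(diff)             # clean speech value implied by (y, n)
    px = normal_pdf(x, mu_x, sd_x)
    return 0.0 if px == 0.0 else px / diff   # Jacobian dx/dy = 1 / diff


def corrupted_likelihood_mc(y, mu_x, sd_x, mu_n, sd_n,
                            num_samples=20000, seed=0):
    """Monte Carlo estimate of p(y): average the integrand over noise draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        n = rng.gauss(mu_n, sd_n)
        if n < y:                      # noise above y leaves no valid speech
            total += _integrand(y, n, mu_x, sd_x)
    return total / num_samples


def corrupted_likelihood_quadrature(y, mu_x, sd_x, mu_n, sd_n, steps=20000):
    """Midpoint-rule reference for the same integral over the noise."""
    lo = mu_n - 8.0 * sd_n
    hi = min(y, mu_n + 8.0 * sd_n)
    if hi <= lo:
        return 0.0
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        n = lo + (i + 0.5) * h
        total += normal_pdf(n, mu_n, sd_n) * _integrand(y, n, mu_x, sd_x)
    return total * h


if __name__ == "__main__":
    # Hypothetical speech and noise parameters.
    args = dict(y=2.0, mu_x=2.0, sd_x=1.0, mu_n=0.0, sd_n=0.5)
    print("Monte Carlo:", corrupted_likelihood_mc(**args))
    print("Quadrature: ", corrupted_likelihood_quadrature(**args))
```

In one dimension the quadrature is the cheaper choice. Sampling is of interest because grid-based integration becomes impractical for higher-dimensional feature vectors.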