Results 1 - 10 of 17
Povey, D., Woodland, P.C.: Minimum phone error and I-smoothing for improved discriminative training
In: Proc. ICASSP, 2002
"... In this paper we introduce the Minimum Phone Error (MPE) and Minimum Word Error (MWE) criteria for the discriminative train-ing of HMM systems. The MPE/MWE criteria are smoothed ap-proximations to the phone or word error rate respectively. We also discuss I-smoothing which is a novel technique for s ..."
Abstract
-
Cited by 240 (9 self)
- Add to MetaCart
(Show Context)
Abstract: In this paper we introduce the Minimum Phone Error (MPE) and Minimum Word Error (MWE) criteria for the discriminative training of HMM systems. The MPE/MWE criteria are smoothed approximations to the phone or word error rate respectively. We also discuss I-smoothing, which is a novel technique for smoothing discriminative training criteria using statistics for maximum likelihood estimation (MLE). Experiments have been performed on the Switchboard/Call Home corpora of telephone conversations with up to 265 hours of training data. It is shown that for the maximum mutual information estimation (MMIE) criterion, I-smoothing reduces the word error rate (WER) by 0.4% absolute over the MMIE baseline. The combination of MPE and I-smoothing gives an improvement of 1% over MMIE and a total reduction in WER of 4.8% absolute over the original MLE system.
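The abstract describes the MPE criterion only informally. As a point of reference, a common way to write an MPE-style objective in lattice-based discriminative training (a sketch in standard notation, not quoted from this paper) is

F_{\mathrm{MPE}}(\lambda) = \sum_{r=1}^{R} \frac{\sum_{s} p_\lambda(O_r \mid s)^{\kappa}\, P(s)\, A(s, s_r)}{\sum_{s'} p_\lambda(O_r \mid s')^{\kappa}\, P(s')}

where O_r is the r-th training utterance, s ranges over hypothesized sentences (in practice, lattice paths), s_r is the reference, \kappa is an acoustic scaling factor, and A(s, s_r) is the raw phone accuracy of s against the reference; MWE replaces A with a word-level accuracy. On this reading, I-smoothing amounts to adding \tau points' worth of maximum-likelihood statistics to each Gaussian's numerator statistics before the parameter update, pulling the discriminative estimate toward the MLE one when discriminative counts are small.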
Comparison of Discriminative Training Criteria and Optimization Methods for Speech Recognition
2001
"... The aim of this work is to build up a common framework for a class of discriminative training criteria and optimization methods for continuous speech recognition. A unified discriminative criterion based on likelihood ratios of correct and competing models with optional smoothing is presented. The u ..."
Abstract
-
Cited by 60 (8 self)
- Add to MetaCart
(Show Context)
Abstract: The aim of this work is to build up a common framework for a class of discriminative training criteria and optimization methods for continuous speech recognition. A unified discriminative criterion based on likelihood ratios of correct and competing models with optional smoothing is presented. The unified criterion leads to particular criteria through the choice of competing word sequences and the choice of smoothing. Analytic and experimental comparisons are presented for both the maximum mutual information (MMI) and the minimum classification error (MCE) criterion together with the optimization methods gradient descent (GD) and extended Baum (EB) algorithm. A tree-search-based restricted recognition method using word graphs is presented, so as to reduce the computational complexity of large vocabulary discriminative training. Moreover, for MCE training, a method using word graphs for efficient calculation of discriminative statistics is introduced. Experiments were performed for continuous speech recognition using the ARPA Wall Street Journal (WSJ) corpus with a vocabulary of 5k words and for the recognition of continuously spoken digit strings using both the TI digit string corpus for American English digits and the SieTill corpus for telephone-line recorded German digits. For the MMI criterion, neither analytical nor experimental results indicate significant differences between EB and GD optimization. For acoustic models of low complexity, MCE training gave significantly better results than MMI training. The recognition results for large vocabulary MMI training on the WSJ corpus show a significant dependence on the context length of the language model used for training. Best results were obtained using a unigram language model for MMI training. No significant co...
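As a sketch of what such a unified criterion can look like (an assumed general form, not the paper's exact notation), one can write

F(\lambda) = \sum_{r=1}^{R} f\left( \log \frac{P(W_r)\, p_\lambda(O_r \mid W_r)^{\beta}}{\sum_{W \in \mathcal{M}_r} P(W)\, p_\lambda(O_r \mid W)^{\beta}} \right)

where W_r is the spoken word sequence of utterance r, \mathcal{M}_r is the set of competing word sequences, \beta is a scaling exponent, and f is a smoothing function. Choosing f(x) = x with \mathcal{M}_r containing all hypotheses gives an MMI-type criterion; choosing a sigmoid for f and excluding W_r from \mathcal{M}_r gives an MCE-type criterion, which is how one unified expression can cover both families discussed in the abstract.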
Comparison Of Optimization Methods For Discriminative Training Criteria
In: Proc. EUROSPEECH’97, 1997
"... In this work we compare two parameter optimization techniques for discriminative training using the MMI criterion: the extended Baum-Welch (EBW) algorithm and the generalized probabilistic descent (GPD) method. Using Gaussian emission densities we found special expressions for the step sizes in GPD, ..."
Abstract
-
Cited by 11 (4 self)
- Add to MetaCart
(Show Context)
Abstract: In this work we compare two parameter optimization techniques for discriminative training using the MMI criterion: the extended Baum-Welch (EBW) algorithm and the generalized probabilistic descent (GPD) method. Using Gaussian emission densities we found special expressions for the step sizes in GPD, leading to re-estimation formulae very similar to those derived for the EBW algorithm. Results were produced for both the TI digit string and the SieTill corpus for continuously spoken American English and German digit strings. The results for the two techniques do not show significant differences. These experimental results support the strong link between EBW and GPD, as expected from the analytic comparison.
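For context, the extended Baum-Welch mean update for a single Gaussian m is usually written in the following standard form (assumed here for orientation, not copied from this paper); the paper's observation is that GPD with suitably chosen step sizes leads to re-estimation formulae of essentially the same shape:

\hat{\mu}_m = \frac{\theta_m^{\mathrm{num}}(O) - \theta_m^{\mathrm{den}}(O) + D_m\, \mu_m}{\gamma_m^{\mathrm{num}} - \gamma_m^{\mathrm{den}} + D_m}

where \theta^{\mathrm{num}}, \theta^{\mathrm{den}} are first-order statistics of the observations accumulated from the reference (numerator) and competing (denominator) hypotheses, \gamma^{\mathrm{num}}, \gamma^{\mathrm{den}} the corresponding occupancies, \mu_m the current mean, and D_m a per-Gaussian constant chosen large enough to keep the variance update positive.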
Discriminative Training on Language Model
"... a lot of problems, including speech recognition, handwriting, Chinese pinyin-input etc. In recognition, statistical language model, such as trigram, is used to provide adequate information to predict the probabilities of hypothesized word sequences. The traditional method relying on distribution est ..."
Abstract
-
Cited by 10 (0 self)
- Add to MetaCart
Abstract: ... a lot of problems, including speech recognition, handwriting, Chinese pinyin input, etc. In recognition, a statistical language model, such as a trigram, is used to provide adequate information to predict the probabilities of hypothesized word sequences. Traditional methods relying on distribution estimation are sub-optimal when the assumed distribution form is not the true one, and "optimality" in distribution estimation does not automatically translate into "optimality" in classifier design. This paper proposes a discriminative training method to minimize the error rate of the recognizer rather than estimate the distribution of the training data. Furthermore, the lexicon is also optimized to minimize the error rate of the decoder through discriminative training. Compared to the traditional LM building method, our system achieves approximately a 5%-25% recognition error reduction with discriminative training of the language model.
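The abstract contrasts distribution estimation with direct error minimization. Schematically (my paraphrase of that contrast, not the paper's notation), the discriminative objective replaces maximum-likelihood LM estimation with something like

\hat{\Lambda} = \arg\min_{\Lambda} \sum_{r=1}^{R} \ell\big(\mathrm{Err}(\hat{W}(O_r;\Lambda),\, W_r)\big),
\qquad \hat{W}(O_r;\Lambda) = \arg\max_{W} \big[\log p(O_r \mid W) + \log p_{\Lambda}(W)\big]

where \Lambda denotes the language-model (and lexicon) parameters, W_r the reference transcription, \mathrm{Err} the error count of the decoded hypothesis against the reference, and \ell a smooth surrogate loss that makes the error count trainable.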
On Deformable Models for Visual Pattern Recognition
2002
"... This paper reviews model-based methods for non-rigid shape recognition. These methods model, match and classif non-rigid shapes, which aregefIxq#x problematic for conventational algentati using rigg models. Issues including ..."
Abstract
-
Cited by 8 (2 self)
- Add to MetaCart
Abstract: This paper reviews model-based methods for non-rigid shape recognition. These methods model, match and classify non-rigid shapes, which are problematic for conventional algorithms using rigid models. Issues including ...
A Combination of Discriminative and Maximum Likelihood Techniques for Noise Robust Speech Recognition
In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 1998
"... In this paper, we study how discriminative and Maximum Likelihood (ML) techniques should be combined in order to maximize the recognition accuracy of a speaker-independent Automatic Speech Recognition (ASR) system that includes speaker adaptation. We compare two training approaches for speaker-indep ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
(Show Context)
Abstract: In this paper, we study how discriminative and Maximum Likelihood (ML) techniques should be combined in order to maximize the recognition accuracy of a speaker-independent Automatic Speech Recognition (ASR) system that includes speaker adaptation. We compare two training approaches for the speaker-independent case and examine how well they perform together with four different speaker adaptation schemes. In a noise robust connected digit recognition task we show that the Minimum Classification Error (MCE) training approach for speaker-independent modelling together with the Bayesian speaker adaptation scheme provides the highest classification accuracy over the whole lifespan of an ASR system. With the MCE training we are capable of reducing the recognition errors by 30% over the ML approach in the speaker-independent case. With the Bayesian speaker adaptation scheme we can further reduce the error rates by 62% using as few as five adaptation utterances.
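For orientation, Bayesian (MAP) adaptation of a Gaussian mean from a small amount of adaptation data is commonly written as below; this is the standard textbook form, assumed here rather than taken from the paper:

\hat{\mu} = \frac{\tau\, \mu_{\mathrm{SI}} + \sum_{t} \gamma(t)\, o_t}{\tau + \sum_{t} \gamma(t)}

where \mu_{\mathrm{SI}} is the speaker-independent (prior) mean, o_t are the adaptation observations, \gamma(t) their occupation probabilities for the Gaussian, and \tau a prior weight. With only a handful of adaptation utterances the estimate stays close to the speaker-independent model and moves toward the speaker's data as evidence accumulates, which is why such schemes behave gracefully over the "lifespan" of a deployed system.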
Minimum Classification Error Training in Example Based Speech and Pattern Recognition Using Sparse Weight Matrices
2009
"... The Minimum Classification Error (MCE) criterion is a wellknown criterion in pattern classification systems. The aim of MCE training is to minimize the resulting classification error when trying to classify a new data set. Usually, these classification systems use some form of statistical model to d ..."
Abstract
- Add to MetaCart
Abstract: The Minimum Classification Error (MCE) criterion is a well-known criterion in pattern classification systems. The aim of MCE training is to minimize the resulting classification error when classifying a new data set. Usually, these classification systems use some form of statistical model to describe the data. These systems usually do not work very well when this underlying model is incorrect. Speech recognition systems traditionally use Hidden Markov Models (HMMs) with Gaussian (or Gaussian mixture) probability density functions as their basic model. It is well known that these models make some assumptions that are not correct. In example-based approaches, these statistical models are absent and are replaced by the pure data. The absence of statistical models has created the need for parameters to model the data space accurately. For this work, we use the MCE criterion to create a system that is able to work together with this example-based approach. Moreover, we extend the locally scaled distance measure with sparse, block-diagonal weight matrices, resulting in a better model for the data space while avoiding the computational load caused by using full matrices. We illustrate the approach with some example experiments on databases from pattern recognition and with speech recognition.
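The "locally scaled distance measure with sparse, block-diagonal weight matrices" can be pictured with a generic block-diagonal Mahalanobis-style distance (a sketch of the idea, assumed rather than quoted from the paper):

d_{W}(x, y) = (x - y)^{\top} W (x - y) = \sum_{b=1}^{B} (x_b - y_b)^{\top} W_b\, (x_b - y_b), \qquad W = \mathrm{diag}(W_1, \ldots, W_B)

Here the feature vector is partitioned into B blocks x_1, \ldots, x_B, and only the small per-block matrices W_b are trained (in this work, under the MCE criterion) while off-block entries stay zero, keeping both the parameter count and the cost of each distance evaluation far below that of a full matrix.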
A Survey of Discriminative and Connectionist Methods for Speech Processing
2002
"... Discriminative speech processing techniques attempt to compute the maximum a posterior probability of some speech event, such as a particular phoneme being spoken, given the observed data. Non-discriminative techniques compute the likelihood of the observed data assuming an event. Non-discriminative ..."
Abstract
- Add to MetaCart
(Show Context)
Abstract: Discriminative speech processing techniques attempt to compute the maximum a posteriori probability of some speech event, such as a particular phoneme being spoken, given the observed data. Non-discriminative techniques compute the likelihood of the observed data assuming an event. Non-discriminative methods such as simple HMMs (hidden Markov models) achieved success despite their lack of discriminative modelling. This survey will look at enhancements to HMM models which have improved their discrimination ability and hence their overall performance. This survey also reviews alternative discriminative methods, namely connectionist methods such as ANNs (artificial neural networks). We will also draw comparisons between discriminative HMMs and connectionist models, showing that connectionist models can be viewed as a generalisation of discriminative HMMs.
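The distinction drawn in this abstract is simply the two sides of Bayes' rule for a speech event w (e.g., a phoneme) given observations O:

P(w \mid O) = \frac{p(O \mid w)\, P(w)}{\sum_{w'} p(O \mid w')\, P(w')}

Generative (non-discriminative) training fits the class-conditional likelihoods p(O \mid w) for each event separately, whereas discriminative training targets the posterior P(w \mid O) directly, so that probability mass assigned to the correct event is explicitly traded off against its competitors.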
An empirical analysis of training protocols for probabilistic gene finders
2004