Maximum Likelihood Linear Transformations for HMM-Based Speech Recognition (1998)
| Venue: | Computer Speech and Language |
| Citations: | 275 - 44 self |
BibTeX
@ARTICLE{Gales98maximumlikelihood,
author = {M.J.F. Gales},
title = {Maximum Likelihood Linear Transformations for HMM-Based Speech Recognition},
journal = {Computer Speech and Language},
year = {1998},
volume = {12},
pages = {75--98}
}
Abstract
This paper examines the application of linear transformations for speaker and environmental adaptation in an HMM-based speech recognition system. In particular, transformations that are trained in a maximum likelihood sense on adaptation data are investigated. Other than in the form of a simple bias, strict linear feature-space transformations are inappropriate in this case. Hence, only model-based linear transforms are considered. The paper compares the two possible forms of model-based transforms: (i) unconstrained, where any combination of mean and variance transform may be used, and (ii) constrained, which requires the variance transform to have the same form as the mean transform (sometimes referred to as feature-space transforms). Re-estimation formulae for all appropriate cases of transform are given. This includes a new and efficient "full" variance transform and the extension of the constrained model-space transform from the simple diagonal case to the full or block-diagonal case. The constrained and unconstrained transforms are evaluated in terms of computational cost, recognition time efficiency, and use for speaker adaptive training. The recognition performance of the two model-space transforms on a large vocabulary speech recognition task using incremental adaptation is investigated. In addition, initial experiments using the constrained model-space transform for speaker adaptive training are detailed.
Footnote 1: The author is now at the IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA.
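The abstract notes that constrained model-space transforms are sometimes referred to as feature-space transforms. The following is a minimal sketch of why that naming is reasonable, not the paper's re-estimation procedure. It assumes a single 2-dimensional Gaussian and a constrained transform of the form mean' = A'·mean - b', cov' = A'·cov·A'ᵀ (a sign convention chosen here for illustration). All function names and numeric values are hypothetical. Under these assumptions, the log-likelihood of an observation under the adapted Gaussian should equal the log-likelihood of the transformed observation A'⁻¹(o + b') under the original Gaussian, plus the log-determinant of A'⁻¹.

```python
import math

# Small 2x2 linear algebra helpers (standard library only).
def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(m):
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]

def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inverse(m):
    d = det(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def gauss_logpdf(x, mean, cov):
    """Log-density of a 2-D Gaussian with full covariance."""
    diff = [x[0] - mean[0], x[1] - mean[1]]
    sol = matvec(inverse(cov), diff)
    maha = diff[0] * sol[0] + diff[1] * sol[1]
    return -0.5 * (2 * math.log(2 * math.pi) + math.log(det(cov)) + maha)

def model_space_loglik(obs, mean, cov, a_model, b_model):
    """Adapt the Gaussian: mean' = A' mean - b', cov' = A' cov A'^T."""
    new_mean = [m - b for m, b in zip(matvec(a_model, mean), b_model)]
    new_cov = matmul(matmul(a_model, cov), transpose(a_model))
    return gauss_logpdf(obs, new_mean, new_cov)

def feature_space_loglik(obs, mean, cov, a_model, b_model):
    """Transform the observation instead and add the Jacobian term."""
    a_feat = inverse(a_model)                      # A = A'^{-1}
    shifted = [o + b for o, b in zip(obs, b_model)]
    new_obs = matvec(a_feat, shifted)              # A'^{-1} (o + b')
    return gauss_logpdf(new_obs, mean, cov) + math.log(abs(det(a_feat)))

# Hypothetical illustrative values.
mean = [1.0, -0.5]
cov = [[2.0, 0.3], [0.3, 1.0]]
a_model = [[1.2, 0.1], [-0.2, 0.9]]
b_model = [0.4, -0.1]
obs = [0.7, 1.5]

ll_model = model_space_loglik(obs, mean, cov, a_model, b_model)
ll_feature = feature_space_loglik(obs, mean, cov, a_model, b_model)
print(ll_model, ll_feature)
```

In this sketch the two routes agree up to floating-point error. This suggests why a constrained transform can be applied once to the features, with a per-transform Jacobian term, rather than to every Gaussian in the model set.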







