@MISC{Lef03non-parametricprobability, author = {Fabrice Lef}, title = {Non-parametric probability estimation for HMM-based automatic speech recognition}, howpublished = {Computer Speech and Language}, year = {2003} }
Abstract
During the last decade, the most significant advances in the field of continuous speech recognition (CSR) have arisen from the use of hidden Markov models (HMMs) for acoustic modeling. These models address one of the major issues in CSR: the simultaneous modeling of temporal and frequency distortions in the speech signal. In an HMM, the temporal dimension is handled by a directed graph of states, and each state accounts for the local frequency distortions through a probability density function. In this study, an improvement in HMM performance is sought by introducing a very effective non-parametric probability density function estimate: the k-nearest neighbors (k-nn) estimate. First, experiments on a short-term speech spectrum identification task compare the k-nn estimate with the widespread estimate based on mixtures of Gaussian functions. Then the adaptations required to integrate the k-nn estimate into an HMM-based recognition system are developed. An optimal training protocol is obtained by introducing membership coefficients into the HMM parameters. The membership coefficients measure the degree of association between a reference acoustic vector and an HMM state. The training procedure applies the expectation-maximization (EM) algorithm to the estimation of the membership coefficients, and its convergence is shown under the maximum likelihood criterion. This study leads to the development of a baseline k-nn/HMM recognition system, which is evaluated on the TIMIT speech database. Finally, further improvements of the k-nn/HMM system are sought by introducing temporal information into the representation space (delta coefficients) and by adapting the references (mainly gender modeling and contextual modeling).
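For readers unfamiliar with the k-nn density estimate mentioned in the abstract, the following is a minimal sketch of the textbook estimator p(x) ≈ k / (N · V), where V is the volume of the smallest ball centred on x that contains k of the N reference vectors. It is an illustration only: the function name `knn_density`, the Euclidean distance, and the plain list-of-vectors representation are assumptions made here, and the record does not say that the paper uses this exact variant or how it combines the estimate with the membership coefficients.

```python
import math

def knn_density(x, references, k):
    """Textbook k-nn density estimate: p(x) ~= k / (N * V).

    Hypothetical helper for illustration; not taken from the paper.
    V is the volume of the smallest ball centred on x that contains
    k reference vectors (Euclidean distance is assumed).
    """
    n = len(references)
    d = len(x)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of references")
    # Distance from x to every reference vector, sorted ascending.
    distances = sorted(math.dist(x, ref) for ref in references)
    radius = distances[k - 1]  # distance to the k-th nearest neighbour
    if radius == 0.0:
        return math.inf  # degenerate case: x coincides with k references
    # Volume of a d-dimensional ball with that radius.
    volume = math.pi ** (d / 2) / math.gamma(d / 2 + 1) * radius ** d
    return k / (n * volume)
```

For example, with the four one-dimensional references `[[0.0], [1.0], [2.0], [3.0]]`, calling `knn_density([1.5], refs, 2)` finds a ball of radius 0.5 (length 1) holding two references, so the estimate is 2 / (4 × 1). Unlike a Gaussian mixture, nothing is fitted in advance: the estimate is computed directly from the stored reference vectors at query time.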