Scalable Distributed Speech Recognition Using Multi-Frame GMM-Based Block Quantization
Citations: 5 (2 self)
BibTeX
@MISC{Paliwal_scalabledistributed,
author = {Kuldip K. Paliwal and Stephen So},
title = {Scalable Distributed Speech Recognition Using Multi-Frame GMM-Based Block Quantization},
year = {}
}
Abstract
In this paper, we propose the use of the multi-frame Gaussian mixture model (GMM)-based block quantizer for coding Mel frequency-warped cepstral coefficient (MFCC) features in distributed speech recognition (DSR) applications. This coding scheme exploits intraframe correlation via the Karhunen-Loève transform (KLT) and interframe correlation via the joint processing of adjacent frames, while retaining the computational simplicity of scalar quantization. The proposed coder is bitrate scalable: the bitrate can be adjusted without retraining the quantizers. Static parameters such as the probability density function (PDF) model and the KLT orthogonal matrices are stored at the encoder and decoder, and bit allocations are calculated on the fly without intensive processing. The coding scheme is evaluated on the Aurora-2 database in a DSR framework. It achieves high recognition performance at low bitrates, with a word error rate (WER) of 2.5% at 800 bps, which is less than 1% degradation from the baseline word recognition accuracy, and degrades gracefully to a WER of 7% at 300 bps.
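
The bitrate scalability described in the abstract can be illustrated with a minimal sketch of on-the-fly bit allocation. This is not the authors' allocation rule: it substitutes a simple greedy scheme based on the high-resolution approximation that each extra bit reduces a component's distortion by a factor of four. The eigenvalues, the frame rate of 100 frames per second, the block size of two frames, and the function names `bits_per_block` and `greedy_bit_allocation` are all assumptions made for illustration and do not come from the paper.

```python
def bits_per_block(bitrate_bps, frames_per_second, frames_per_block):
    """Bit budget for one block of jointly coded frames."""
    return bitrate_bps * frames_per_block // frames_per_second


def greedy_bit_allocation(variances, total_bits):
    """Assign bits one at a time to the component with the largest
    remaining distortion, using D_i ~ var_i * 2**(-2 * b_i)."""
    bits = [0] * len(variances)
    for _ in range(total_bits):
        distortions = [v * 2.0 ** (-2 * b) for v, b in zip(variances, bits)]
        worst = distortions.index(max(distortions))
        bits[worst] += 1
    return bits


# Hypothetical variances of the KLT coefficients (eigenvalues of the
# covariance matrix of one Gaussian cluster). These stay fixed, so no
# retraining is needed when the bitrate changes.
eigenvalues = [16.0, 4.0, 1.0, 0.25]

for bitrate in (300, 800):
    # Assumed: 100 frames per second, 2 frames coded jointly per block.
    budget = bits_per_block(bitrate, frames_per_second=100, frames_per_block=2)
    allocation = greedy_bit_allocation(eigenvalues, budget)
    print(f"{bitrate} bps -> {budget} bits/block -> {allocation}")
```

Only the bit budget changes between the two bitrates. The stored statistics (here, the eigenvalues) are reused as they are, which is the property that lets such a coder change rate without retraining.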







