On binary and ratio time-frequency masks for robust speech recognition (2004)
| Venue: | ICSLP, Jeju, Korea |
| Citations: | 11 - 4 self |
BibTeX
@INPROCEEDINGS{Srinivasan04onbinary,
author = {Soundararajan Srinivasan},
title = {On binary and ratio time-frequency masks for robust speech recognition},
booktitle = {ICSLP, Jeju, Korea},
year = {2004},
pages = {2541--2544}
}
Abstract
A time-varying Wiener filter extracts the speech signal from a noisy mixture using the a priori signal-to-noise ratio in a local time-frequency unit. We estimate this ratio using a binaural processor and derive a ratio time-frequency mask. This mask is used to extract the speech signal, which is then fed to a conventional speech recognizer operating in the cepstral domain. We compare the performance of this system with a missing data recognizer that operates in the spectral domain using the time-frequency units dominated by speech. For use by the missing data recognizer, the same processor is used to estimate an ideal time-frequency binary mask, which selects the speech signal if it is stronger than the interference in a local time-frequency unit. We find that the performance of the missing data recognizer is better on a small vocabulary recognition task but the performance of the conventional recognizer is substantially better when the vocabulary size is larger.
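The two masks described in the abstract can be illustrated with a minimal sketch. It assumes the speech and interference energies in each time-frequency unit are already known, whereas the paper estimates them with a binaural processor that is not reproduced here. The function names and the toy energy values are hypothetical and chosen only for illustration. The ratio mask uses the standard Wiener gain SNR / (1 + SNR), which equals S / (S + N) for speech energy S and noise energy N; the binary mask is 1 where speech is strictly stronger than the interference.

```python
def ratio_mask(speech_energy, noise_energy):
    """Wiener-style gain for one time-frequency unit.

    With a priori SNR = S / N, the gain SNR / (1 + SNR) simplifies to S / (S + N).
    """
    total = speech_energy + noise_energy
    if total == 0.0:
        return 0.0  # silent unit: nothing to pass through (illustrative choice)
    return speech_energy / total


def binary_mask(speech_energy, noise_energy):
    """1 if speech is stronger than the interference in the unit, else 0."""
    return 1 if speech_energy > noise_energy else 0


def build_mask(mask_fn, speech, noise):
    """Apply a per-unit mask function over a [time][frequency] grid of energies."""
    return [
        [mask_fn(s, n) for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech, noise)
    ]


# Toy energies (hypothetical): 2 time frames x 2 frequency channels.
speech = [[4.0, 1.0], [0.0, 9.0]]
noise = [[1.0, 1.0], [2.0, 3.0]]

ratio = build_mask(ratio_mask, speech, noise)
binary = build_mask(binary_mask, speech, noise)
```

In this sketch the ratio mask yields a soft gain between 0 and 1 for every unit, the kind of mask used to resynthesize speech for the conventional recognizer, while the binary mask only labels units as reliable or unreliable, which is the information a missing data recognizer consumes.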







