A BINAURAL MODEL AS A FRONT-END FOR ISOLATED WORD RECOGNITION
Abstract:
Small-vocabulary isolated word speech recognition can be implemented on relatively small hardware. Although the recognition problem is more or less solved in noise-free situations, general application is hindered by the dramatic decrease in performance in noisy environments, especially for hands-free applications. In this paper a binaural front-end for speech recognition is presented. This binaural model, which was originally developed at Ruhr-University of Bochum in Germany, allows for an effective reduction of interfering noises of any kind. Concurrent speech signals can be suppressed as well as stationary noises. The original model was designed as a precise computer model of the human binaural auditory system and can explain a variety of psycho-acoustical phenomena. Beyond those abilities, the model offers sharp directional selectivity that is superior to that obtained with directional microphones. We simplified this sophisticated model by adapting it to the specific task, and we use the peak position and the peak level of the binaural activity pattern in each frequency band as parameters for pattern matching. The performance was evaluated in the form of recognition rates for a variety of different noisy environments. The results show that the binaural front-end leads to a significant improvement in recognition rates, corresponding to an enhancement of over 20 dB in SNR in most cases.
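The two per-band parameters named in the abstract, peak position and peak level, can be illustrated with a minimal sketch. It assumes a plain normalized interaural cross-correlation stands in for the binaural activity pattern; the original model is more elaborate, and this is not the authors' implementation. The function name `peak_features` and the parameter `max_lag` are hypothetical, and the inputs are assumed to be the left-ear and right-ear signals of one frequency band.

```python
import numpy as np

def peak_features(left, right, max_lag):
    """Return (peak_lag, peak_level) for one frequency band.

    Assumption: a normalized cross-correlation over +/- max_lag samples
    is used as a simple stand-in for the binaural activity pattern.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    norm = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2))
    if norm == 0.0:
        # Silent band: no meaningful peak.
        return 0, 0.0
    n = len(left)
    lags = range(-max_lag, max_lag + 1)
    values = []
    for lag in lags:
        if lag >= 0:
            # Positive lag: the right signal trails the left by `lag` samples.
            c = np.dot(left[:n - lag], right[lag:])
        else:
            # Negative lag: the left signal trails the right.
            c = np.dot(left[-lag:], right[:n + lag])
        values.append(c / norm)
    best = int(np.argmax(values))
    return lags[best], float(values[best])

# Usage: a noise burst that reaches the right ear three samples late.
rng = np.random.default_rng(0)
s = rng.standard_normal(256)
delayed = np.concatenate([np.zeros(3), s[:-3]])
lag, level = peak_features(s, delayed, max_lag=8)
```

In a front-end of this kind, one such pair per frequency band would form the feature vector passed to the pattern matcher; how the bands are obtained and how the pairs are matched are outside this sketch.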

