When speech is corrupted by noise, the performance of automatic speech recognition systems degrades significantly. Many algorithms have been proposed that compensate for the negative effects of noise on speech and greatly improve recognition accuracy. However, these methods assume that the corrupting noise is stationary, and they fail when it is not. A promising new group of compensation algorithms has recently emerged that does not place this restriction on the noise characteristics. These methods operate on the notion that noise affects different frequency bands of speech differently, depending on the relative energies of the speech and the noise at each time-frequency location. In a spectrographic display of noisy speech, regions of low SNR will be more corrupt than regions of high SNR. Low-SNR regions of a spectrogram are considered to be “missing” or “unreliable” and are removed from the spectrogram. Noise compensation is carried out either by estimating the missing regions from the remaining regions in some manner prior to recognition, or by performing recognition directly on the incomplete spectrograms. These techniques clearly require a “spectrographic mask” that accurately labels the reliable and unreliable regions of a spectrogram. Currently, there are no good techniques for accurately estimating such a mask. The methods that have been used so far rely on assumptions about the interfering noise such as