Results 1–10 of 30
Audio Denoising by Time-Frequency Block Thresholding
, 2007
Abstract
Cited by 35 (5 self)
Removing noise from audio signals requires a non-diagonal processing of time-frequency coefficients to avoid producing “musical noise”. State-of-the-art algorithms perform a parameterized filtering of spectrogram coefficients with empirically fixed parameters. A block thresholding estimation procedure is introduced, which adjusts all parameters adaptively to the signal's properties by minimizing a Stein estimation of the risk. Numerical experiments demonstrate the performance and robustness of this procedure through objective and subjective evaluations.
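The block attenuation rule behind this approach can be sketched as follows; this is a minimal illustration in which the block size and the threshold parameter `lam` are fixed by hand, whereas the paper chooses them adaptively by minimizing the Stein risk estimate:

```python
def block_threshold(stft_block, noise_var, lam=2.0):
    """Attenuate one block of time-frequency (STFT) coefficients.

    A single attenuation factor is applied to the whole block, driven by
    the block's average energy relative to the noise variance.  `lam` is
    fixed here for illustration; the paper adapts it (and the block
    shape) by minimizing a Stein estimate of the risk.
    """
    avg_energy = sum(abs(c) ** 2 for c in stft_block) / len(stft_block)
    # Shrink toward zero when the block looks like pure noise,
    # keep it almost unchanged when signal energy dominates.
    a = max(1.0 - lam * noise_var / avg_energy, 0.0)
    return [a * c for c in stft_block]

# A noise-level block is zeroed; a strong-signal block is mostly kept.
quiet = block_threshold([0.1 + 0.1j] * 8, noise_var=1.0)
loud = block_threshold([10.0 + 0.0j] * 8, noise_var=1.0)
```

Grouping coefficients into blocks is what makes the processing non-diagonal: neighbouring coefficients share one attenuation factor instead of being thresholded independently, which suppresses the isolated residual peaks heard as musical noise.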
Speech enhancement using a noncausal a priori SNR estimator
 Signal Processing Letters, IEEE
, 2004
Abstract
Cited by 20 (1 self)
Abstract—In this letter, we propose a noncausal estimator for the a priori signal-to-noise ratio (SNR), and a corresponding noncausal speech enhancement algorithm. In contrast to the decision-directed estimator of Ephraim and Malah, the noncausal estimator is capable of discriminating between speech onsets and noise irregularities. Onsets of speech are better preserved, while a further reduction of musical noise is achieved. Experimental results show that the noncausal estimator yields a higher improvement in the segmental SNR, lower log-spectral distortion, and better Perceptual Evaluation of Speech Quality scores (PESQ, ITU-T P.862). Index Terms—Parameter estimation, speech enhancement, time-frequency analysis.
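For reference, the decision-directed a priori SNR estimator of Ephraim and Malah that this letter contrasts with can be sketched for a single frequency bin (variable names are illustrative):

```python
def decision_directed_snr(prev_amp2, noisy_power, noise_var, alpha=0.98):
    """Decision-directed a priori SNR estimate for one bin.

    prev_amp2   -- squared clean-amplitude estimate from the previous frame
    noisy_power -- |Y|^2 of the current frame
    noise_var   -- current noise variance estimate for this bin
    """
    gamma = noisy_power / noise_var       # a posteriori SNR
    ml_term = max(gamma - 1.0, 0.0)       # instantaneous ML estimate
    # Heavy smoothing (alpha near 1) suppresses musical noise but makes
    # the estimate lag behind speech onsets -- the causality limitation
    # that motivates the noncausal estimator proposed in the letter.
    return alpha * prev_amp2 / noise_var + (1.0 - alpha) * ml_term
```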
Speech spectral modeling and enhancement based on autoregressive conditional heteroscedasticity models
, 2006
Abstract
Cited by 10 (2 self)
In this paper, we develop and evaluate speech enhancement algorithms, which are based on super-Gaussian generalized autoregressive conditional heteroscedasticity (GARCH) models in the short-time Fourier transform (STFT) domain. We consider three different statistical models, two fidelity criteria, and two approaches for the estimation of the variances of the STFT coefficients. The statistical model is either Gaussian, Gamma, or Laplacian; the fidelity criteria include minimum mean-squared error (MMSE) of the STFT coefficients and MMSE of the log-spectral amplitude (LSA); the spectral variance is estimated based on either the proposed GARCH models or the decision-directed method of Ephraim and Malah. We show that estimating the variance by the GARCH modeling method yields lower log-spectral distortion and higher Perceptual Evaluation of Speech Quality scores (PESQ, ITU-T P.862) than by using the decision-directed method, whether the presumed statistical model is Gaussian, Gamma, or Laplacian, and whether the fidelity criterion is MMSE of the STFT coefficients or MMSE of the LSA. Furthermore, while a Gaussian model is inferior to the super-Gaussian models when using the decision-directed method, the Gaussian model is superior when using the GARCH modeling method.
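The core of such models is a conditional-variance recursion. A GARCH(1,1) sketch for a single STFT bin is shown below; the coefficient values are illustrative only, not those estimated in the paper, and a + b < 1 is assumed so the unconditional variance exists:

```python
def garch_variance_track(innovations, omega=0.01, a=0.1, b=0.85):
    """GARCH(1,1) conditional-variance recursion for one STFT bin:

        var_t = omega + a * |e_{t-1}|^2 + b * var_{t-1}

    Large past innovations |e| inflate the predicted variance, which is
    how the model captures volatility clustering in speech spectra.
    Requires a + b < 1 so the unconditional variance omega/(1 - a - b)
    exists; it is used here as the starting value.
    """
    var = omega / (1.0 - a - b)
    out = []
    for e in innovations:
        out.append(var)
        var = omega + a * abs(e) ** 2 + b * var
    return out
```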
Tracking of nonstationary noise based on data-driven recursive noise power estimation
 IEEE Trans. Audio, Speech and Language Processing
, 2008
Abstract
Cited by 7 (3 self)
Abstract—This paper considers estimation of the noise spectral variance from speech signals contaminated by highly nonstationary noise sources. The method can accurately track fast changes in noise power level (up to about 10 dB/s). In each time frame, for each frequency bin, the noise variance estimate is updated recursively with the minimum mean-square error (MMSE) estimate of the current noise power. A time- and frequency-dependent smoothing parameter is used, which is varied according to an estimate of speech presence probability. In this way, the amount of speech power leaking into the noise estimates is kept low. For the estimation of the noise power, a spectral gain function is used, which is found by an iterative data-driven training method. The proposed noise tracking method is tested on various stationary and nonstationary noise sources, for a wide range of signal-to-noise ratios, and compared with two state-of-the-art methods. When used in a speech enhancement system, improvements in segmental signal-to-noise ratio of more than 1 dB can be obtained for the most nonstationary noise sources at high noise levels. Index Terms—Discrete Fourier transform (DFT)-based speech enhancement, minimum mean-square error (MMSE) estimation, noise spectrum estimation, noise tracking.
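The recursive update with a speech-presence-dependent smoothing parameter can be sketched as follows; this simplified version feeds in the raw noisy periodogram, whereas the paper uses an MMSE estimate of the noise power obtained with a data-driven trained gain function:

```python
def update_noise_estimate(noise_est, noisy_power, p_speech,
                          alpha_min=0.3, alpha_max=0.99):
    """One recursive noise-power update for a single frequency bin.

    p_speech is the estimated speech presence probability.  When speech
    is likely, the smoothing factor approaches alpha_max and the noise
    estimate is nearly frozen, so little speech power leaks in; when
    speech is absent, the estimate tracks the observed power quickly.
    alpha_min/alpha_max are illustrative values, not from the paper.
    """
    alpha = alpha_min + (alpha_max - alpha_min) * p_speech
    return alpha * noise_est + (1.0 - alpha) * noisy_power
```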
Recent advancements in speech enhancement
, 2006
Abstract
Cited by 7 (1 self)
Speech enhancement is a long-standing problem with numerous applications ranging from hearing aids, to coding and automatic recognition of speech signals. In this survey paper we focus on enhancement from a single microphone, and assume that the noise is additive and statistically independent of the signal. We present the principles that guide researchers working in this area, and provide a detailed design example. The example focuses on minimum mean-square error estimation of the clean signal's log-spectral magnitude. This approach has attracted significant attention in the past twenty years. We also describe the principles of a Monte Carlo simulation approach for speech enhancement.
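The design example referred to, MMSE estimation of the log-spectral amplitude, uses the well-known Ephraim–Malah LSA gain G(ξ, γ) = ξ/(1+ξ) · exp(½ ∫_v^∞ e^{−t}/t dt) with v = ξγ/(1+ξ). A sketch with a crude numerical exponential integral (in practice a library exponential-integral routine would be used):

```python
import math

def _exp_integral(v, steps=20000, span=30.0):
    """Trapezoid approximation of the exponential integral
    E1(v) = integral of exp(-t)/t from v to infinity, truncated at
    v + span (the tail beyond that is negligible for span ~30)."""
    h = span / steps
    total = 0.0
    for i in range(steps + 1):
        t = v + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-t) / t
    return total * h

def lsa_gain(xi, gamma):
    """MMSE log-spectral amplitude gain for a priori SNR xi and
    a posteriori SNR gamma."""
    v = xi * gamma / (1.0 + xi)
    return xi / (1.0 + xi) * math.exp(0.5 * _exp_integral(v))
```

Since exp(½·E1(v)) ≥ 1, the LSA gain always lies above the Wiener gain ξ/(1+ξ), approaching it as v grows large.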
Spectral magnitude minimum mean-square error estimation using binary and continuous gain functions
 IEEE Trans. Audio, Speech, Lang. Process
, 2012
Abstract
Cited by 6 (2 self)
Abstract—Recently, binary mask techniques have been proposed as a tool for retrieving a target speech signal from a noisy observation. A binary gain function is applied to time–frequency tiles of the noisy observation in order to suppress noise-dominated and retain target-dominated time–frequency regions. When implemented using discrete Fourier transform (DFT) techniques, the binary mask techniques can be seen as a special case of the broader class of DFT-based speech enhancement algorithms, for which the applied gain function is not constrained to be binary. In this context, we develop and compare binary mask techniques to state-of-the-art continuous gain techniques. We derive spectral magnitude minimum mean-square error binary gain estimators; the binary gain estimators turn out to be simple functions of the continuous gain estimators. We show that the optimal binary estimators are closely related to a range of existing, heuristically developed, binary gain estimators. The derived binary gain estimators perform better than existing binary gain estimators in simulation experiments with speech signals contaminated by several different noise sources, as measured by speech quality and intelligibility measures. However, even the best binary mask method is significantly outperformed by state-of-the-art continuous gain estimators. The instrumental intelligibility results are confirmed in an intelligibility listening test.
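The two gain families under comparison can be sketched per time–frequency bin; shown here are the heuristic 0 dB binary mask and the continuous Wiener gain, not the magnitude-MMSE-optimal binary estimators the paper derives:

```python
def binary_gain(xi):
    """Heuristic binary mask: keep the bin only when the local SNR
    exceeds 0 dB, i.e. the a priori SNR xi is greater than 1."""
    return 1.0 if xi > 1.0 else 0.0

def wiener_gain(xi):
    """Continuous counterpart: the Wiener gain xi / (1 + xi)."""
    return xi / (1.0 + xi)

# Near the 0 dB boundary the binary mask makes a hard keep/discard
# decision, while the continuous gain attenuates gradually.
pairs = [(binary_gain(x), wiener_gain(x)) for x in (0.5, 1.0, 2.0)]
```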
Joint dereverberation and residual echo suppression of speech signals in noisy environments
 EURASIP Journal on Advances in Signal Processing, 2012, 2012:11, http://asp.eurasipjournals.com/content/2012/1/11
Abstract
Cited by 5 (0 self)
Abstract—Hands-free devices are often used in a noisy and reverberant environment. Therefore, the received microphone signal does not only contain the desired near-end speech signal but also interferences such as room reverberation that is caused by the near-end source, background noise, and a far-end echo signal that results from the acoustic coupling between the loudspeaker and the microphone. These interferences degrade the fidelity and intelligibility of near-end speech. In the last two decades, postfilters have been developed that can be used in conjunction with a single-microphone acoustic echo canceller to enhance the near-end speech. In previous works, spectral enhancement techniques have been used to suppress residual echo and background noise for single-microphone acoustic echo cancellers. However, dereverberation of the near-end speech was not addressed in this context. Recently, practically feasible spectral enhancement ...
Cepstral normalisation and the signal to noise ratio spectrum in automatic speech recognition.
Abstract
Cited by 3 (3 self)
Cepstral normalisation in automatic speech recognition is investigated in the context of robustness to additive noise. In this paper, it is argued that such normalisation leads naturally to a speech feature based on signal-to-noise ratio rather than absolute energy (or power). Explicit calculation of this SNR-cepstrum by means of a noise estimate is shown to have theoretical and practical advantages over the usual (energy-based) cepstrum. The relationship between the SNR-cepstrum and the articulation index, known in psychoacoustics, is discussed. Experiments are presented suggesting that the combination of the SNR-cepstrum with the well-known perceptual linear prediction method can be beneficial in noisy environments.
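Cepstral mean normalisation itself is a one-line operation. The sketch below shows the mechanics: a constant present in every frame, such as a fixed channel gain that becomes an additive term in the log/cepstral domain, ends up in the per-coefficient mean and is removed, which is what pushes the normalised feature toward a relative (SNR-like) quantity rather than an absolute energy:

```python
def cepstral_mean_normalise(frames):
    """Subtract the per-coefficient mean over all frames.

    `frames` is a list of equal-length cepstral vectors.  An additive
    constant shared by all frames (e.g. a fixed channel/gain term in
    the log domain) ends up in the mean and is cancelled exactly.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[i] for f in frames) / n for i in range(dim)]
    return [[f[i] - means[i] for i in range(dim)] for f in frames]

# Adding a constant offset to every frame leaves the output unchanged.
base = [[1.0, 2.0], [3.0, 4.0]]
shifted = [[c + 7.0 for c in f] for f in base]
```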
SNR FEATURES FOR AUTOMATIC SPEECH RECOGNITION
, 2009
Abstract
Cited by 3 (3 self)
When combined with cepstral normalisation techniques, the features normally used in Automatic Speech Recognition are based on Signal to Noise Ratio (SNR). We show that calculating SNR from the outset, rather than relying on cepstral normalisation to produce it, gives features with a number of practical and mathematical advantages over power-spectral-based ones. In a detailed analysis, we derive Maximum Likelihood and Maximum a Posteriori estimates for SNR-based features, and show that they can outperform more conventional ones, especially when subsequently combined with cepstral variance normalisation. We further show anecdotal evidence that SNR-based features lend themselves well to noise estimates based on low-energy envelope tracking.
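A minimal SNR-based feature in this spirit can be sketched per bin; this is only a maximum-likelihood-style illustration (observed power relative to a noise estimate), and the floor constant is an arbitrary choice, not a value from the paper:

```python
import math

def ml_snr_feature(noisy_power, noise_var, floor=1e-3):
    """Log of a simple ML-style SNR estimate for one bin:
    (|Y|^2 - noise) / noise, floored to keep the logarithm defined.
    Built directly from SNR rather than absolute power, so a common
    gain applied to both the observation and the noise estimate
    cancels out."""
    snr = max(noisy_power / noise_var - 1.0, floor)
    return math.log(snr)
```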
Estimators of the Magnitude-Squared Spectrum and Methods for Incorporating SNR Uncertainty
Abstract
Cited by 2 (0 self)
Abstract—Statistical estimators of the magnitude-squared spectrum are derived based on the assumption that the magnitude-squared spectrum of the noisy speech signal can be computed as the sum of the (clean) signal and noise magnitude-squared spectra. Maximum a posteriori (MAP) and minimum mean-square error (MMSE) estimators are derived based on a Gaussian statistical model. The gain function of the MAP estimator was found to be identical to the gain function used in the ideal binary mask (IdBM) that is widely used in computational auditory scene analysis (CASA). As such, it was binary and assumed the value of 1 if the local SNR exceeded 0 dB, and the value of 0 otherwise. By modeling the local instantaneous SNR as an F-distributed random variable, soft masking methods were derived incorporating SNR uncertainty. In particular, the soft masking method which weighted the noisy magnitude-squared spectrum by the a priori probability that the local SNR exceeds 0 dB was shown to be identical to the Wiener gain function. Results indicated that the proposed estimators yielded significantly better speech quality than the conventional MMSE spectral power estimators, in terms of lower residual noise and lower speech distortion.
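The additive magnitude-squared assumption and the resulting MAP binary gain can be sketched per bin as follows (illustrative names; the Gaussian-model derivations and the F-distribution-based soft masks are not reproduced here):

```python
def ms_spectrum_estimate(noisy_power, noise_var):
    """Clean magnitude-squared spectrum under the additive assumption
    |Y|^2 = |X|^2 + |N|^2: subtract the noise power, floored at zero."""
    return max(noisy_power - noise_var, 0.0)

def map_binary_gain(noisy_power, noise_var):
    """Binary gain as described in the abstract: 1 when the implied
    local SNR (|Y|^2 - noise) / noise exceeds 0 dB, i.e. when the
    noisy power exceeds twice the noise power; 0 otherwise."""
    return 1.0 if noisy_power > 2.0 * noise_var else 0.0
```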