Results 1–10 of 17
Exemplar-based sparse representations for noise robust automatic speech recognition
, 2010
Probabilistic modeling paradigms for audio source separation
 In Machine Audition: Principles, Algorithms and Systems. IGI Global
, 2010
Abstract

Cited by 25 (14 self)
Most sound scenes result from the superposition of several sources, which can be separately perceived and analyzed by human listeners. Source separation aims to provide machine listeners with similar skills by extracting the sounds of individual sources from a given scene. Existing separation systems operate either by emulating the human auditory system or by inferring the parameters of probabilistic sound models. In this chapter, we focus on the latter approach and provide a joint overview of established and recent models, including independent component analysis, local time-frequency models and spectral template-based models. We show that most models are instances of one of the following two general paradigms: linear modeling or variance modeling. We compare the merits of either paradigm and report objective performance figures. We conclude by discussing promising combinations of probabilistic priors and inference algorithms that could form the basis of future state-of-the-art systems.
Discrimination of speech and non-linguistic vocalizations by non-negative matrix factorization
 in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP ’10)
, 2010
Abstract

Cited by 10 (6 self)
We introduce features based on Non-Negative Matrix Factorization (NMF) for discrimination of speech and non-linguistic vocalizations such as laughter or breathing, which is a crucial task in recognition of spontaneous speech. NMF has been successfully used in speech-related tasks such as denoising and speaker separation. While existing approaches use it as a preprocessing step for conventional speech recognizers, we aim at directly classifying the output of the NMF algorithm. To this end, we propose a feature extraction procedure based on a supervised variant of NMF, considering two different algorithms. Applying our approach to a spontaneous speech corpus, we show that the addition of NMF features to an MFCC-based classifier increases the mean recall of speech and non-linguistic vocalizations by over 2.5% absolute, and in particular the recall of laughter by 6.6% absolute. The improvement is significant at a level of 0.4%.
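The supervised variant described above fixes a pretrained dictionary and estimates only the activations, which then serve as frame-wise features. A minimal sketch of that activation step, under the usual KL-divergence multiplicative-update formulation (the function and variable names here are illustrative, not from the paper):

```python
import numpy as np

def nmf_activations(V, W, n_iter=200, eps=1e-9):
    """Estimate non-negative activations H for a fixed, pretrained dictionary W
    so that V ~= W @ H under the generalized KL divergence, using the standard
    multiplicative update. V: magnitude spectrogram (freq x frames),
    W: dictionary (freq x atoms). Returns H (atoms x frames)."""
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1]))          # random positive init
    for _ in range(n_iter):
        # Multiplicative update keeps H non-negative by construction.
        H *= (W.T @ (V / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
    return H
```

Each column of the returned H would then be the NMF feature vector for one time frame, optionally normalized before being appended to MFCC features.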
Nonnegative matrix factorization as noise-robust feature extractor for speech recognition
 in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP ’10)
, 2010
Abstract

Cited by 8 (4 self)
We introduce a novel approach for noise-robust feature extraction in speech recognition, based on nonnegative matrix factorization (NMF). While NMF has previously been used for speech denoising and speaker separation, we directly extract time-varying features from the NMF output. To this end we extend basic unsupervised NMF to a hybrid supervised/unsupervised algorithm. We present a Dynamic Bayesian Network (DBN) architecture that can exploit these features in a Tandem manner together with the maximum likelihood phoneme estimate of a bidirectional long short-term memory (BLSTM) recurrent neural network. We show that the addition of NMF features to spelling recognition systems can increase word accuracy by up to 7% absolute in a noisy car environment.
Active-set Newton algorithm for overcomplete nonnegative representations of audio
 IEEE Transactions on Audio, Speech, and Language Processing
, 2013
Abstract

Cited by 8 (8 self)
Abstract—This paper proposes a computationally efficient algorithm for estimating the nonnegative weights of linear combinations of the atoms of large-scale audio dictionaries, so that the generalized Kullback-Leibler divergence between an audio observation and the model is minimized. This linear model has been found useful in many audio signal processing tasks, but the existing algorithms are computationally slow when a large number of atoms is used. The proposed algorithm is based on iteratively updating a set of active atoms, with the weights updated using the Newton method and the step size estimated such that the weights remain nonnegative. Algorithm convergence evaluations on representing audio spectra that are mixtures of two speakers show that with all the tested dictionary sizes the proposed method reaches a much lower value of the divergence than can be obtained by conventional algorithms, and is up to 8 times faster. A source separation evaluation revealed that when using large dictionaries, the proposed method produces better separation quality in less time. Index Terms—acoustic signal analysis, audio source separation, supervised source separation, nonnegative matrix factorization, Newton algorithm, convex optimization, sparse coding, sparse representation
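The cost function named in this abstract, the generalized Kullback-Leibler divergence between an observation and its linear model, has a simple closed form. A small sketch of it (function name and eps handling are illustrative, not from the paper):

```python
import numpy as np

def gen_kl(x, y, eps=1e-12):
    """Generalized Kullback-Leibler divergence
        D(x || y) = sum_i ( x_i * log(x_i / y_i) - x_i + y_i ),
    the quantity minimized between an observed spectrum x and a model
    y = W @ h with nonnegative weights h. Nonnegative; zero iff x == y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(x * np.log((x + eps) / (y + eps)) - x + y))
```

Unlike the normalized KL divergence, this form does not require x and y to sum to one, which is why it is the usual choice for magnitude spectra.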
SPECTRAL COVARIANCE IN PRIOR DISTRIBUTIONS OF NONNEGATIVE MATRIX FACTORIZATION BASED SPEECH SEPARATION
Abstract

Cited by 5 (1 self)
This paper proposes an algorithm for modeling the covariance of the spectrum in the prior distributions of nonnegative matrix factorization (NMF) based sound source separation. Supervised NMF estimates a set of spectrum basis vectors for each source, and then represents a mixture signal using them. When the exact characteristics of the sources are not known in advance, it is advantageous to train prior distributions of spectra instead of fixed spectra. Since the frequency bands in natural sound sources are strongly correlated, we model the distributions with full-covariance Gaussian distributions. Algorithms for training and applying the distributions are presented. The proposed methods produce better separation quality than the reference methods. Demonstration signals are available at www.cs.tut.fi/~tuomasv.
Adaptation of speaker-specific bases in nonnegative matrix factorization for single channel speech-music separation
 in: Annual Conference of the International Speech Communication Association (INTERSPEECH)
Abstract

Cited by 3 (2 self)
This paper introduces a speaker adaptation algorithm for nonnegative matrix factorization (NMF) models. The proposed adaptation algorithm is a combination of Bayesian and subspace model adaptation. The adapted model is used to separate the speech signal from a background music signal in a single recording. Training speech data from multiple speakers is used with NMF to train a set of basis vectors as a general model for speech signals. The probabilistic interpretation of NMF is used to achieve Bayesian adaptation, adjusting the general model with respect to the actual properties of the speech signal that is observed in the mixed signal. The Bayesian-adapted model is adapted again by a linear transform, which changes the subspace that the Bayesian-adapted model spans to better match the speech signal in the mixed signal. The experimental results show that combining Bayesian with linear transform adaptation improves the separation results.
MODEL ORDER SELECTION FOR NONNEGATIVE MATRIX FACTORIZATION WITH APPLICATION TO SPEECH ENHANCEMENT
Abstract

Cited by 1 (1 self)
This report deals with the application of nonnegative matrix factorization (NMF) in speech processing. A Bayesian NMF is used to find the optimal number of basis vectors for the speech signal. The result is validated by performing a speech enhancement task for a set of different numbers of basis vectors. The algorithm performance is measured with the Source to Distortion Ratio (SDR), which represents the overall quality of speech. The results show that for medium input SNRs, 60 basis vectors for each speaker are sufficient to model the speech spectrogram. NMF produced better SDR results than a recently developed version of the Spectral Subtraction algorithm. The window length was found to have a great effect on the results, but zero padding did not influence the results.
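The SDR metric used for validation above admits a simple energy-ratio form. A minimal sketch (this is the basic definition, not the report's exact evaluation toolkit):

```python
import numpy as np

def sdr_db(reference, estimate):
    """Source-to-Distortion Ratio in dB: energy of the clean reference signal
    divided by the energy of the estimation error, on a log scale.
    Higher values indicate better overall enhancement quality."""
    reference = np.asarray(reference, dtype=float)
    error = reference - np.asarray(estimate, dtype=float)
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))
```

For example, an estimate that is the reference scaled by 0.9 leaves 10% residual amplitude, i.e. 1% residual energy, giving an SDR of 20 dB.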
Speech Enhancement Using Nonnegative Matrix Factorization and Hidden Markov Models, thesis for the degree of Doctor of Philosophy, Mohammadiha, Nasser
Abstract
Reducing interference noise in a noisy speech recording has been a challenging task for many years, yet has a variety of applications, for example, in hands-free mobile communications, in speech recognition, and in hearing aids. Traditional single-channel noise reduction schemes, such as Wiener filtering, do not work satisfactorily in the presence of non-stationary background noise. Alternatively, supervised approaches, where the noise type is known in advance, lead to higher-quality enhanced speech signals. This dissertation proposes supervised and unsupervised single-channel noise reduction algorithms. We consider two classes of methods for this purpose: approaches based on nonnegative matrix factorization (NMF) and methods based on hidden Markov models (HMM). The contributions of this dissertation can be divided into three main (overlapping) parts. First, we propose NMF-based enhancement approaches that use temporal dependencies of the speech signals. In a standard NMF, the important temporal correlations between consecutive short-time frames are ignored. We propose both continuous and discrete state-space nonnegative dynamical models. These approaches are used to describe the dynamics of the NMF coefficients or activations. We derive optimal minimum mean squared error (MMSE) or linear MMSE estimates of the speech signal using the probabilistic formulations of NMF. Our experiments show that using temporal dynamics in the NMF-based denoising systems improves the performance greatly. Additionally, this dissertation proposes an approach to learn the noise basis matrix online from the noisy observations. This relaxes the assumption of an a priori specified noise type and enables us to use the NMF-based denoising method in an unsupervised manner. Our experiments show that the proposed approach with online noise basis learning considerably outperforms state-of-the-art methods in different noise conditions.
Second, this thesis proposes two methods for NMF-based separation of sources with similar dictionaries. We suggest a nonnegative HMM (NHMM) for babble noise that is derived from a speech HMM. In this approach, speech and babble signals share the same basis vectors, whereas the activations of the basis vectors differ for the two signals over time. We derive an MMSE estimator for the clean speech signal using the proposed NHMM. The objective evaluations and the performed subjective listening test show that the proposed babble model and the final noise reduction algorithm outperform the conventional methods noticeably. Moreover, the dissertation proposes another solution to separate a desired source from a mixture with arbitrarily low artifacts. Third, an HMM-based algorithm to enhance the speech spectra using super-Gaussian priors is proposed. Our experiments show that speech discrete Fourier transform (DFT) coefficients have super-Gaussian rather than Gaussian distributions even if we limit the speech data to come from a specific phoneme. We derive a new MMSE estimator for the speech spectra that uses super-Gaussian priors. The results of our evaluations using the developed noise reduction algorithm support the super-Gaussianity hypothesis.
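Several of the NMF-based enhancement methods listed above share a common baseline structure: factorize the noisy magnitude spectrogram against stacked speech and noise dictionaries, then apply a Wiener-like soft mask. A minimal sketch of that baseline, under standard KL multiplicative updates (all names are illustrative; this is the generic supervised NMF denoising scheme, not the thesis implementation):

```python
import numpy as np

def soft_mask_enhance(Y, Ws, Wn, n_iter=200, eps=1e-9):
    """Denoise a noisy magnitude spectrogram Y (freq x frames) given
    pretrained speech (Ws) and noise (Wn) dictionaries. Fits activations
    for the stacked dictionary with KL multiplicative updates, then keeps
    the speech share of each time-frequency bin via a soft mask."""
    W = np.hstack([Ws, Wn])                       # stacked dictionary
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], Y.shape[1]))
    for _ in range(n_iter):                       # KL multiplicative updates
        H *= (W.T @ (Y / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
    S = Ws @ H[:Ws.shape[1]]                      # speech part of the model
    N = Wn @ H[Ws.shape[1]:]                      # noise part of the model
    return (S / (S + N + eps)) * Y                # Wiener-like soft mask
```

Because the mask lies in [0, 1], the estimate never exceeds the noisy magnitude in any bin; the temporal-dynamics and online-learning contributions of the thesis refine the activation estimates H beyond this frame-independent baseline.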