Results 1-10 of 42
Exemplar-based sparse representations for noise robust automatic speech recognition
, 2010
Mixtures of Gamma Priors for Non-Negative Matrix Factorization Based Speech Separation
 in 8th International Conference on Independent Component Analysis and Signal Separation (ICA
, 2009
Cited by 17 (5 self)
Abstract. This paper deals with audio source separation using supervised non-negative matrix factorization (NMF). We propose a prior model based on mixtures of Gamma distributions for each sound class, whose hyperparameters are trained on a training corpus. This formulation allows the spectral basis vectors of the sound sources to be adapted during actual operation, when the exact characteristics of the sources are not known in advance. Simulations were conducted using a random mixture of two speakers. Even without adaptation, the mixture model outperformed basic NMF, and adaptation further improved the separation quality slightly. Audio demonstrations are available at www.cs.tut.fi/~tuomasv.
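A minimal sketch of how a mixture-of-Gamma prior could score a candidate spectral basis vector. The layout (each mixture component is a product of independent Gamma distributions, one per frequency bin, with its own mixture weight) is an assumption for illustration; the function names are hypothetical, and the hyperparameter training and adaptation described in the abstract are not shown.

```python
import math


def gamma_logpdf(x, shape, rate):
    """Log-density of Gamma(shape, rate) at x > 0 (rate = inverse scale)."""
    return (shape * math.log(rate) + (shape - 1.0) * math.log(x)
            - rate * x - math.lgamma(shape))


def basis_log_prior(w, weights, shapes, rates):
    """Log-prior of a basis vector w under a mixture of product-Gamma components.

    Hypothetical layout: component c has mixture weight weights[c] and
    per-frequency parameters shapes[c][f], rates[c][f].
    """
    terms = []
    for pi_c, shapes_c, rates_c in zip(weights, shapes, rates):
        # log pi_c plus the sum of per-bin Gamma log-densities
        log_comp = math.log(pi_c) + sum(
            gamma_logpdf(x, a, b) for x, a, b in zip(w, shapes_c, rates_c))
        terms.append(log_comp)
    # log-sum-exp over components for numerical stability
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


# Example: two components, one favouring low-frequency energy, one high.
weights = [0.5, 0.5]
shapes = [[2.0, 2.0], [2.0, 2.0]]
rates = [[1.0, 4.0], [4.0, 1.0]]
print(basis_log_prior([2.0, 0.5], weights, shapes, rates))
```

Working in the log domain with log-sum-exp keeps the mixture evaluation stable when basis vectors have many frequency bins and the per-component log-densities become large and negative.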
Transcribing Multi-instrument Polyphonic Music with Hierarchical Eigeninstruments
 in Sig. Process
, 2011
Cited by 16 (2 self)
Abstract—This paper presents a general probabilistic model for transcribing single-channel music recordings containing multiple polyphonic instrument sources. The system requires no prior knowledge of the instruments present in the mixture (other than their number), although it can benefit from information about instrument type if available. In contrast to many existing polyphonic transcription systems, our approach explicitly models the individual instruments and is thereby able to assign detected notes to their respective sources. We use training instruments to learn a set of linear manifolds in model parameter space, which are then used during transcription to constrain the properties of models fit to the target mixture. This leads to a hierarchical mixture-of-subspaces design that makes it possible to supply the system with prior knowledge at different levels of abstraction. The proposed technique is evaluated on both recorded and synthesized mixtures containing two, three, four, and five instruments each. We compare our approach to another multi-instrument transcription system and to a baseline NMF algorithm, in terms of transcription both with source assignment (i.e., detected pitches must be associated with the correct instrument) and without it. For two-instrument mixtures evaluated with source assignment, we obtain average frame-level F-measures of up to 0.52 in the completely blind transcription setting (i.e., no prior knowledge of the instruments in the mixture) and up to 0.67 if we assume knowledge of the basic instrument types. For transcription without source assignment, these numbers rise to 0.76 and 0.83, respectively. Index Terms—Music, polyphonic transcription, NMF, subspace, eigeninstruments
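The gap between the two evaluation modes can be illustrated with a frame-level F-measure computed over sets of active notes. This is a minimal sketch assuming exact matching of (frame, pitch) or (frame, pitch, instrument) tuples; the event encoding, the toy data, and the function names are hypothetical and are not the paper's evaluation code.

```python
def f_measure(reference, estimate):
    """Frame-level F-measure between two sets of hashable note events."""
    tp = len(reference & estimate)  # events present in both sets
    if tp == 0:
        return 0.0
    precision = tp / len(estimate)
    recall = tp / len(reference)
    return 2 * precision * recall / (precision + recall)


def drop_source(events):
    """Project (frame, pitch, instrument) events onto (frame, pitch)."""
    return {(frame, pitch) for frame, pitch, _ in events}


reference = {(0, 60, "piano"), (0, 64, "violin"), (1, 60, "piano"), (1, 67, "violin")}
estimate = {(0, 60, "piano"), (0, 64, "piano"), (1, 60, "violin"), (1, 67, "violin")}

with_source = f_measure(reference, estimate)  # 2 of 4 events match -> 0.5
without_source = f_measure(drop_source(reference), drop_source(estimate))  # -> 1.0
print(with_source, without_source)
```

Under source assignment, every correctly pitched note carrying the wrong instrument label counts as both a false positive and a miss, so scores without assignment can only be equal or higher on the same output.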
A nonparametric Bayesian multipitch analyzer based on infinite latent harmonic allocation
IEEE Trans. on ASLP
Cited by 13 (3 self)
Abstract—The statistical multipitch analyzer described in this paper estimates multiple fundamental frequencies (F0s) in polyphonic music audio signals produced by pitched instruments. It is based on hierarchical nonparametric Bayesian models that can deal with uncertainty in unknown random variables such as model complexities (e.g., the number of F0s and the number of harmonic partials), model parameters (e.g., the values of the F0s and the relative weights of the harmonic partials), and hyperparameters (i.e., prior knowledge about complexities and parameters). Using these models, we propose a statistical method called infinite latent harmonic allocation (iLHA). To avoid model-complexity control, we allow the observed spectra to contain an unbounded number of sound sources (F0s), each of which may contain an unbounded number of harmonic partials. More specifically, to model a set of time-sliced spectra, we formulate nested infinite Gaussian mixture models based on hierarchical and generalized Dirichlet processes. To avoid manual tuning of influential hyperparameters, we place noninformative hyperprior distributions on them in a hierarchical manner. For efficient Bayesian inference, we use a modern technique called collapsed variational Bayes. In comparative experiments using audio recordings of piano and guitar solo performances, iLHA yielded promising results, and we found that there is room for improvement through modeling of temporal continuity and spectral smoothness. Index Terms—Bayesian nonparametrics, Dirichlet process, infinite latent harmonic allocation (iLHA), multipitch analysis.
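The idea of allowing an unbounded number of sound sources can be illustrated with the stick-breaking construction of Dirichlet-process weights. This sketch draws a truncated set of weights for hypothetical F0 components. It is only the textbook GEM construction, not the hierarchical and generalized Dirichlet processes or the collapsed variational Bayes inference used by iLHA, and `alpha` and `truncation` are illustrative parameters.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Draw the first `truncation` weights of a GEM(alpha) sequence.

    Returns the weights and the stick mass left over for all later components.
    """
    weights = []
    remaining = 1.0
    for _ in range(truncation):
        v = rng.betavariate(1.0, alpha)  # fraction of the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights, remaining


rng = random.Random(0)
weights, leftover = stick_breaking_weights(alpha=2.0, truncation=20, rng=rng)
print([round(w, 3) for w in weights[:5]], round(leftover, 4))
```

Because each break takes a Beta(1, alpha) fraction of what is left, a smaller `alpha` concentrates mass on the first few components, while a larger one spreads it across many; the leftover mass stands in for the infinitely many components beyond the truncation.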
Gamma Markov random fields for audio source modeling
, 2010
Cited by 9 (0 self)
Abstract—In many audio processing tasks, such as source separation, denoising, or compression, it is crucial to construct realistic and flexible models that capture the physical properties of audio signals. This can be accomplished in the Bayesian framework through the use of appropriate prior distributions. In this paper, we describe a class of prior models called Gamma Markov random fields (GMRFs) to model the sparsity and the local dependency of the energies (i.e., variances) of time–frequency expansion coefficients. A GMRF model describes a non-normalised joint distribution over unobserved variance variables where, given the field, the actual source coefficients are independent. Our construction ensures a positive coupling between the variance variables, so that signal energy changes smoothly along both axes, capturing temporal and spectral continuity. The coupling strength is controlled by a set of hyperparameters. Inference on the overall model is convenient because of the conditional conjugacy of all of the variables in the model, but automatic optimization of hyperparameters is crucial to obtain better fits. The marginal likelihood of the model is not available because of the intractable normalizing constant of GMRFs. In this paper, we optimize the hyperparameters of our GMRF-based audio model using contrastive divergence and compare this method to alternatives such as score matching and pseudo-likelihood maximization where applicable. We present the performance of the GMRF models in denoising and single-channel source separation problems in completely blind scenarios, where all the hyperparameters are jointly estimated given only audio data. Index Terms—Audio modeling, contrastive divergence, denoising, Gibbs sampling, Markov random fields, pseudo-likelihood, score matching, single-channel source separation.
Online algorithms for Nonnegative Matrix Factorization with the Itakura-Saito divergence
 In: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA
, 2011
Cited by 8 (0 self)
Nonnegative matrix factorization (NMF) is now a common tool for audio source separation. When learning NMF on large audio databases, one major drawback is that each dictionary update has time complexity O(FKN) (where F × N is the size of the input power spectrogram and K is the number of basis spectra), which rules out its application to signals longer than an hour. We provide an online algorithm with a complexity of O(FK) in time and memory for dictionary updates. We show on audio simulations that the online approach is faster for short audio signals and makes it possible to analyze audio signals several hours long.
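For reference, the batch setting that the online algorithm improves on can be sketched with multiplicative updates for Itakura-Saito NMF. This plain-Python sketch is not the paper's online algorithm: it recomputes the full F × N product W H at every update, which is where the O(FKN) cost comes from. The square-root exponent on the update ratios follows our understanding of the majorization-minimization analysis of β-divergence NMF for β = 0 (the IS case); the function names, toy data, and random initialization are assumptions.

```python
import math
import random


def matmul(A, B):
    """Product of list-of-lists matrices A (m x k) and B (k x n)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def is_divergence(V, Vhat):
    """Itakura-Saito divergence: sum of v/vh - log(v/vh) - 1 over all entries."""
    return sum(v / vh - math.log(v / vh) - 1.0
               for rv, rvh in zip(V, Vhat) for v, vh in zip(rv, rvh))


def _ratios(V, Vhat):
    """Return V * Vhat^-2 and Vhat^-1, elementwise."""
    num = [[v / (vh * vh) for v, vh in zip(rv, rvh)] for rv, rvh in zip(V, Vhat)]
    den = [[1.0 / vh for vh in rvh] for rvh in Vhat]
    return num, den


def is_nmf(V, K, n_iter=200, seed=0):
    """Batch IS-NMF of a strictly positive F x N matrix V into W (F x K), H (K x N)."""
    rng = random.Random(seed)
    F, N = len(V), len(V[0])
    W = [[rng.uniform(0.5, 1.5) for _ in range(K)] for _ in range(F)]
    H = [[rng.uniform(0.5, 1.5) for _ in range(N)] for _ in range(K)]
    for _ in range(n_iter):
        # Activations: H <- H * sqrt((W^T (V Vhat^-2)) / (W^T Vhat^-1))
        num, den = _ratios(V, matmul(W, H))  # O(FKN) to form W H
        Wt = transpose(W)
        up, down = matmul(Wt, num), matmul(Wt, den)
        H = [[h * math.sqrt(u / d) for h, u, d in zip(rh, ru, rd)]
             for rh, ru, rd in zip(H, up, down)]
        # Dictionary: W <- W * sqrt(((V Vhat^-2) H^T) / (Vhat^-1 H^T))
        num, den = _ratios(V, matmul(W, H))
        Ht = transpose(H)
        up, down = matmul(num, Ht), matmul(den, Ht)
        W = [[w * math.sqrt(u / d) for w, u, d in zip(rw, ru, rd)]
             for rw, ru, rd in zip(W, up, down)]
    return W, H


W_true = [[1.0, 0.1], [0.2, 1.0], [0.5, 0.5]]
H_true = [[1.0, 2.0, 0.5, 1.0], [0.3, 1.0, 2.0, 0.5]]
V = matmul(W_true, H_true)  # toy strictly positive "power spectrogram"
W0, H0 = is_nmf(V, K=2, n_iter=0)  # the random initialization only
W, H = is_nmf(V, K=2, n_iter=200)
print(is_divergence(V, matmul(W0, H0)), is_divergence(V, matmul(W, H)))
```

Every pass touches all F × N entries of the spectrogram, so the per-update cost grows with signal length N; an online scheme instead updates the dictionary from small batches of frames.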
Majorization-minimization algorithm for smooth Itakura-Saito nonnegative matrix factorization
 in ICASSP
, 2011
Cited by 6 (1 self)
Nonnegative matrix factorization (NMF) with the Itakura-Saito divergence has proven efficient for audio source separation and music transcription, where the signal power spectrogram is factored into a “dictionary” matrix times an “activation” matrix. Given the nature of audio signals, the activation coefficients are expected to exhibit smoothness along time frames. This may be enforced by penalizing the NMF objective function with an extra term reflecting the smoothness of the activation coefficients. We propose a novel regularization term that addresses some deficiencies of our previous work and leads to an efficient implementation using a majorization-minimization procedure. Index Terms — Nonnegative matrix factorization (NMF), Itakura-Saito divergence, regularization by smoothness, audio signal representation, single-channel source separation.
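To make the idea of penalizing the objective concrete, here is a minimal sketch of an IS-NMF cost with an added smoothness term on the rows of the activation matrix H. The specific penalty (the IS divergence between consecutive activations, weighted by a hypothetical parameter `lam`) is an illustrative assumption, not necessarily the regularization term proposed in the paper, and the majorization-minimization updates themselves are not shown.

```python
import math


def d_is(x, y):
    """Scalar Itakura-Saito divergence d_IS(x | y) for x, y > 0."""
    return x / y - math.log(x / y) - 1.0


def smoothness_penalty(H):
    """Sum over rows k and frames n >= 1 of d_IS(H[k][n-1] | H[k][n]).

    Assumed penalty: zero for constant rows, growing with frame-to-frame jumps.
    """
    return sum(d_is(row[n - 1], row[n]) for row in H for n in range(1, len(row)))


def penalized_cost(V, W, H, lam):
    """D_IS(V | W H) + lam * smoothness_penalty(H), with list-of-lists matrices."""
    K = len(H)
    fit = 0.0
    for f, row_v in enumerate(V):
        for n, v in enumerate(row_v):
            vhat = sum(W[f][k] * H[k][n] for k in range(K))
            fit += d_is(v, vhat)
    return fit + lam * smoothness_penalty(H)


W = [[1.0], [2.0]]
smooth_H = [[1.0, 1.0, 1.0]]
jumpy_H = [[1.0, 3.0, 1.0]]
V = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]  # exactly W times smooth_H
print(penalized_cost(V, W, smooth_H, lam=0.5), penalized_cost(V, W, jumpy_H, lam=0.5))
```

The weight `lam` trades reconstruction fidelity against temporal smoothness: with `lam = 0` the cost reduces to plain IS-NMF, and larger values push the optimizer toward activations that vary slowly from frame to frame.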
Generative spectrogram factorization models for polyphonic piano transcription
 Transactions on Audio, Speech and Language Processing
, 2010
Cited by 5 (1 self)
Abstract—We introduce a framework for probabilistic generative models of time–frequency coefficients of audio signals, using a matrix factorization parametrization to jointly model spectral characteristics such as harmonicity and temporal activations and excitations. The models represent the observed data as the superposition of statistically independent sources, and we consider variance-based models used in source separation and intensity-based models for nonnegative matrix factorization. We derive a generalized expectation-maximization algorithm for inferring the parameters of the model and then adapt this algorithm for the task of polyphonic transcription of music using labeled training data. The performance of the system is compared to that of existing discriminative and model-based approaches on a dataset of solo piano music. Index Terms—Frequency estimation, matrix decomposition, music information retrieval (MIR), spectral analysis, time–frequency analysis.
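One well-known property of variance-based models can be checked numerically: if a time-frequency coefficient x is modeled as zero-mean circular complex Gaussian with variance v, its negative log-likelihood equals the Itakura-Saito divergence between the power |x|² and v, up to terms that do not depend on v. The sketch below is a generic illustration of that identity, not the paper's model or its generalized EM algorithm; the function names are hypothetical.

```python
import math


def complex_gaussian_nll(x, v):
    """-log N_c(x; 0, v) = log(pi * v) + |x|^2 / v for a complex coefficient x."""
    return math.log(math.pi * v) + abs(x) ** 2 / v


def d_is(p, v):
    """Itakura-Saito divergence between power p and variance v."""
    return p / v - math.log(p / v) - 1.0


x = 1.0 + 2.0j
p = abs(x) ** 2
# The difference does not depend on v: it equals log(pi) + log(p) + 1.
gaps = [complex_gaussian_nll(x, v) - d_is(p, v) for v in (0.5, 1.0, 5.0, 20.0)]
print(gaps)
```

Since the gap is constant in v, maximizing the Gaussian likelihood over the variances is equivalent to minimizing the IS divergence, which is why IS-NMF appears naturally when the variances are given a matrix factorization structure.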
USING TENSOR FACTORISATION MODELS TO SEPARATE DRUMS FROM POLYPHONIC MUSIC
Cited by 5 (1 self)
This paper describes the use of Nonnegative Tensor Factorisation models for the separation of drums from polyphonic audio. Improved separation of the drums is achieved through the incorporation of Gamma Chain priors into the Nonnegative Tensor Factorisation framework. In contrast to many previous approaches, the method used in this paper requires little or no pre-training or use of drum templates. The utility of the technique is shown on real-world audio examples.
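A Gamma chain prior encourages each activation to stay close to its predecessor. The sketch below samples a simplified chain in which h_t given h_{t-1} is Gamma-distributed with mean h_{t-1}, so the shape parameter controls smoothness. This simplified coupling and the function names are assumptions for illustration; the exact Gamma chain construction used in the paper may differ (for example, by coupling through auxiliary variables).

```python
import math
import random


def sample_gamma_chain(h0, shape, length, rng):
    """Sample h_1..h_T with h_t | h_{t-1} ~ Gamma(shape, scale=h_{t-1} / shape).

    E[h_t | h_{t-1}] = h_{t-1}; a larger `shape` gives smaller relative jumps.
    """
    chain = []
    prev = h0
    for _ in range(length):
        # random.gammavariate takes (shape, scale)
        prev = rng.gammavariate(shape, prev / shape)
        chain.append(prev)
    return chain


def mean_abs_log_step(chain):
    """Average |log(h_t / h_{t-1})|, a rough measure of how jumpy the chain is."""
    steps = [abs(math.log(b / a)) for a, b in zip(chain, chain[1:])]
    return sum(steps) / len(steps)


smooth = sample_gamma_chain(1.0, shape=100.0, length=100, rng=random.Random(1))
rough = sample_gamma_chain(1.0, shape=1.0, length=100, rng=random.Random(1))
print(mean_abs_log_step(smooth), mean_abs_log_step(rough))
```

Because the conditional mean equals the previous value while the relative spread shrinks as the shape grows, a single parameter moves the prior between nearly constant and highly erratic activation sequences.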
Poisson-uniform nonnegative matrix factorization
 in Proc. ICASSP
, 2012
Cited by 4 (0 self)
Probabilistic models of audio spectrograms used in audio source separation often rely on Poisson or multinomial noise models corresponding to the generalized Kullback-Leibler (GKL) divergence popular in methods using Nonnegative Matrix Factorization (NMF). This noise model works well in practice, but it is difficult to justify, since these distributions are technically only applicable to discrete count data. This issue is particularly problematic in hierarchical and nonparametric Bayesian models, where estimates of uncertainty depend strongly on the likelihood model. In this paper, we present a hierarchical Bayesian model that retains the flavor of the Poisson likelihood model but yields a coherent generative process for continuous spectrogram data. This model allows for more principled, accurate, and effective Bayesian inference in probabilistic NMF models based on GKL. Index Terms — NMF, audio, Bayesian models, variational inference, blind source separation.
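The correspondence mentioned in the abstract can be checked numerically: for an integer count x, the Poisson negative log-likelihood and the generalized Kullback-Leibler divergence d(x | y) differ only by terms that do not depend on the rate y, so minimizing either over y gives the same result. This sketch uses hypothetical function names and illustrates only that standard identity, not the Poisson-uniform model proposed in the paper.

```python
import math


def gkl(x, y):
    """Generalized KL divergence d(x | y) = x log(x / y) - x + y, with 0 log 0 = 0."""
    return (x * math.log(x / y) if x > 0 else 0.0) - x + y


def poisson_nll(x, y):
    """-log Poisson(x | rate y) for a non-negative integer count x."""
    return y - x * math.log(y) + math.lgamma(x + 1)


for x in (0, 3, 10):
    gaps = [poisson_nll(x, y) - gkl(x, y) for y in (0.5, 2.0, 7.0)]
    print(x, [round(g, 6) for g in gaps])  # constant across y for each x
```

The `lgamma(x + 1)` term is what ties the likelihood to integer counts: for non-integer spectrogram values the same expression can still be evaluated, but it no longer corresponds to a normalized distribution over x, which is the mismatch the abstract points out.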