Results 1-10 of 52
Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria
IEEE Trans. on Audio, Speech and Lang. Processing, 2007
Cited by 185 (30 self)
An unsupervised learning algorithm for the separation of sound sources in one-channel music signals is presented. The algorithm is based on factorizing the magnitude spectrogram of an input signal into a sum of components, each of which has a fixed magnitude spectrum and a time-varying gain. Each sound source, in turn, is modeled as a sum of one or more components. The parameters of the components are estimated by minimizing the reconstruction error between the input spectrogram and the model, while restricting the component spectrograms to be nonnegative and favoring components whose gains are slowly varying and sparse. Temporal continuity is favored by using a cost term which is the sum of squared differences between the gains in adjacent frames, and sparseness is favored by penalizing nonzero gains. The proposed iterative estimation algorithm is initialized with random values, and the gains and the spectra are then alternatively updated using multiplicative update rules until the values converge. Simulation experiments were carried out using generated mixtures of pitched musical instrument samples and drum sounds. The performance of the proposed method was compared with independent subspace analysis and basic nonnegative matrix factorization, which are based on the same linear model. According to these simulations, the proposed method enables a better separation quality than the previous algorithms. Especially, the temporal continuity criterion improved the detection of pitched musical sounds. The sparseness criterion did not produce significant improvements. Index Terms—Acoustic signal analysis, audio source separation, blind source separation, music, nonnegative matrix factorization, sparse coding, unsupervised learning.
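As a rough illustration of the linear model this abstract describes, the sketch below factorizes a nonnegative spectrogram into fixed spectra and time-varying gains using the basic multiplicative updates for the Euclidean cost (the "basic nonnegative matrix factorization" baseline, not the proposed algorithm). The two helper functions only evaluate the penalties as the abstract words them: a sum of squared differences between gains in adjacent frames, and a sum of gain magnitudes standing in for "penalizing nonzero gains". All function names, the choice of cost, the lack of any normalization, and the fixed iteration count are assumptions made for illustration.

```python
import numpy as np


def nmf_multiplicative(X, n_components, n_iter=200, eps=1e-9, seed=0):
    """Basic NMF, X ~= B @ G, with Euclidean-cost multiplicative updates.

    X: nonnegative magnitude spectrogram, shape (n_freq, n_frames).
    B: component spectra, shape (n_freq, n_components).
    G: time-varying gains, shape (n_components, n_frames).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = X.shape
    # Random nonnegative initialization, as the abstract describes.
    B = rng.random((n_freq, n_components)) + eps
    G = rng.random((n_components, n_frames)) + eps
    for _ in range(n_iter):
        # Alternate between updating the gains and the spectra.
        G *= (B.T @ X) / (B.T @ B @ G + eps)
        B *= (X @ G.T) / (B @ G @ G.T + eps)
    return B, G


def temporal_continuity_cost(G):
    """Sum of squared differences between gains in adjacent frames."""
    return float(np.sum(np.diff(G, axis=1) ** 2))


def sparseness_cost(G):
    """Sum of gain magnitudes (an assumed form of the sparseness penalty)."""
    return float(np.sum(np.abs(G)))
```

In this sketch the penalties are only measured after the fact. Minimizing them jointly with the reconstruction error, as the paper does, requires modified update rules that the abstract does not give.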
Convolutive speech bases and their application to supervised speech separation
IEEE Transactions on Audio, Speech and Language Processing, 2007
Cited by 92 (6 self)
In this paper we present a convolutive basis decomposition method and its application on simultaneous speakers separation from monophonic recordings. The model we propose is a convolutive version of the nonnegative matrix factorization algorithm. Due to the nonnegativity constraint this type of coding is very well suited for intuitively and efficiently representing magnitude spectra. We present results that reveal the nature of these basis functions and we introduce their utility in separating monophonic mixtures of known speakers.
Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation
IEEE Trans. Audio, Speech, Language Process., 2010
Cited by 78 (17 self)
We consider inference in a general data-driven object-based model of multichannel audio data, assumed generated as a possibly underdetermined convolutive mixture of source signals. Each source is given a model inspired from nonnegative matrix factorization (NMF) with the Itakura-Saito divergence, which underlies a statistical model of superimposed Gaussian components. We address estimation of the mixing and source parameters using two methods. The first one consists of maximizing the exact joint likelihood of the multichannel data using an expectation-maximization algorithm. The second method consists of maximizing the sum of individual likelihoods of all channels using a multiplicative update algorithm inspired from NMF methodology. Our decomposition algorithms were applied to stereo music and assessed in terms of blind source separation performance. Index Terms — Multichannel audio, nonnegative matrix factorization, nonnegative tensor factorization, underdetermined convolutive blind source separation.
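For readers unfamiliar with the divergence named in this abstract, its standard definition is reproduced below. The symbols are assumptions introduced here, not notation from the abstract: V is a nonnegative power spectrogram with entries v_{fn}, and W and H are the nonnegative factors.

```latex
d_{IS}(x \mid y) = \frac{x}{y} - \log\frac{x}{y} - 1,
\qquad
D_{IS}(V \mid WH) = \sum_{f,n} d_{IS}\!\left(v_{fn} \,\middle|\, [WH]_{fn}\right)
```

The divergence is scale-invariant, d_{IS}(\lambda x \mid \lambda y) = d_{IS}(x \mid y) for \lambda > 0, so low-energy and high-energy spectrogram entries are weighted by their relative rather than absolute error.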
Multiple fundamental frequency estimation by summing harmonic amplitudes
in ISMIR, 2006
Cited by 78 (10 self)
This paper proposes a conceptually simple and computationally efficient fundamental frequency (F0) estimator for polyphonic music signals. The studied class of estimators calculate the salience, or strength, of a F0 candidate as a weighted sum of the amplitudes of its harmonic partials. A mapping from the Fourier spectrum to a “F0 salience spectrum” is found by optimization using generated training material. Based on the resulting function, three different estimators are proposed: a “direct” method, an iterative estimation and cancellation method, and a method that estimates multiple F0s jointly. The latter two performed as well as a considerably more complex reference method. The number
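The salience idea in this abstract, a weighted sum of the amplitudes of a candidate's harmonic partials, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's estimator: the 1/h weights, the nearest-bin lookup, the number of harmonics, and the function name `harmonic_salience` are all chosen here for simplicity, whereas the paper obtains its mapping by optimization on training material.

```python
import numpy as np


def harmonic_salience(mag, sample_rate, n_fft, f0_candidates, n_harmonics=10):
    """Salience of each F0 candidate as a weighted sum of harmonic amplitudes.

    mag: magnitude spectrum of one frame (bins 0 .. n_fft // 2).
    The weight of harmonic h is 1 / h (an assumption made for this sketch).
    """
    mag = np.asarray(mag, dtype=float)
    salience = np.zeros(len(f0_candidates))
    for i, f0 in enumerate(f0_candidates):
        for h in range(1, n_harmonics + 1):
            # Nearest FFT bin to the h-th harmonic of the candidate.
            k = int(round(h * f0 * n_fft / sample_rate))
            if k >= len(mag):
                break  # harmonic lies above the highest available bin
            salience[i] += mag[k] / h
    return salience
```

A "direct" estimate in this sketch would simply pick the candidate with the largest salience. Uniform weights would let subharmonics of the true F0 score equally well, which is why decaying weights are used here.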
Separation of drums from polyphonic music using nonnegative matrix factorization and support vector machine
In: Proc. EUSIPCO 2005, 2005
Cited by 53 (4 self)
This paper presents a procedure for the separation of pitched musical instruments and drums from polyphonic music. The method is based on two-stage processing in which the input signal is first separated into elementary time-frequency components which are then organized into sound sources. Nonnegative matrix factorization (NMF) is used to separate the input spectrogram into components having a fixed spectrum with time-varying gain. Each component is classified either to pitched instruments or to drums using a support vector machine (SVM). The classifier is trained using example signals from both classes. Simulation experiments were carried out using mixtures generated from real-world polyphonic music signals. The results indicate that the proposed method enables better separation quality than existing methods based on sinusoidal modeling and onset detection. Demonstration signals are available at
A general flexible framework for the handling of prior information in audio source separation
IEEE Transactions on Audio, Speech, and Language Processing, 2012
Cited by 46 (16 self)
Most of audio source separation methods are developed for a particular scenario characterized by the number of sources and channels and the characteristics of the sources and the mixing process. In this paper we introduce a general audio source separation framework based on a library of structured source models that enable the incorporation of prior knowledge about each source via user-specifiable constraints. While this framework generalizes several existing audio source separation methods, it also allows to imagine and implement new efficient methods that were not yet reported in the literature. We first introduce the framework by describing the model structure and constraints, explaining its generality, and summarizing its algorithmic implementation using a generalized expectation-maximization algorithm. Finally, we illustrate the above-mentioned capabilities of the framework by applying it in several new and existing configurations to different source separation problems. We have released a software tool named Flexible Audio Source Separation Toolbox (FASST) implementing a baseline version of the framework in Matlab. Index Terms—Audio source separation, local Gaussian model, nonnegative matrix factorization, expectation-maximization.
Unsupervised analysis of polyphonic music by sparse coding
IEEE Transactions on Neural Networks, 2006
Cited by 43 (4 self)
We investigate a data-driven approach to the analysis and transcription of polyphonic music, using a probabilistic model which is able to find sparse linear decompositions of a sequence of short-term Fourier spectra. The resulting system represents each input spectrum as a weighted sum of a small number of “atomic” spectra chosen from a larger dictionary; this dictionary is, in turn, learned from the data in such a way as to represent the given training set in an (information theoretically) efficient way. When exposed to examples of polyphonic music, most of the dictionary elements take on the spectral characteristics of individual notes in the music, so that the sparse decomposition can be used to identify the notes in a polyphonic mixture. Our approach differs from other methods of polyphonic analysis based on spectral decomposition by combining all of the following: a) a formulation in terms of an explicitly given probabilistic model, in which the process estimating which notes are present corresponds naturally with the inference of latent variables in the model; b) a particularly simple generative model, motivated by very general considerations about efficient coding, that makes very few assumptions about the musical origins of the signals being processed; and c) the ability to learn a dictionary of atomic spectra (most of which converge to harmonic spectral profiles associated with specific notes) from polyphonic examples alone—no separate training on monophonic examples is required. Index Terms: Learning overcomplete dictionaries, polyphonic music, probabilistic modeling, redundancy reduction, sparse factorial coding, unsupervised learning.
Bayesian extensions to nonnegative matrix factorisation for audio signal modelling
in ICASSP, 2008
Cited by 40 (6 self)
We describe the underlying probabilistic generative signal model of nonnegative matrix factorisation (NMF) and propose realistic conjugate priors on the matrices to be estimated. A conjugate Gamma chain prior enables modelling the spectral smoothness of natural sounds in general, and other prior knowledge about the spectra of the sounds can be used without resorting to too restrictive techniques where some of the parameters are fixed. The resulting algorithm, while retaining the attractive features of standard NMF such as fast convergence and easy implementation, outperforms existing NMF strategies in a single channel audio source separation and detection task. Index Terms — acoustic signal processing, matrix decomposition, MAP estimation, source separation.
Sound Source Separation in Monaural Music Signals
2006
Cited by 35 (4 self)
Sound source separation refers to the task of estimating the signals produced by individual sound sources from a complex acoustic mixture. It has several applications, since monophonic signals can be processed more efficiently and flexibly than polyphonic mixtures. This thesis deals with the separation of monaural, or one-channel, music recordings. We concentrate on separation methods where the sources to be separated are not known beforehand. Instead, the separation is enabled by utilizing the common properties of real-world sound sources, which are their continuity, sparseness, and repetition in time and frequency, and their harmonic spectral structures. One of the separation approaches taken here uses unsupervised learning and the other uses model-based inference based on sinusoidal modeling. Most of the existing unsupervised separation algorithms are based on a linear instantaneous signal model, where each frame of the input mixture signal is
Realtime multiple pitch observation using sparse nonnegative constraints
in International Conference on Music Information Retrieval, 2006
Cited by 33 (3 self)
In this paper we introduce a new approach for realtime multiple pitch observation of musical instruments. The proposed algorithm is quite different from others in the literature both in its purpose and approach. It is destined not for continuous multiple f0 recognition but rather for projection of the ongoing spectrum to learned pitch templates. The decomposition algorithm, on the other hand, does not compromise signal processing models for pitches and consists of an algorithm for efficient decomposition of a spectrum using known pitch structures and based on sparse nonnegative constraints. After introducing the algorithm along with evaluations, a realtime implementation of the algorithm is provided for free download for the MaxMSP realtime programming environment. Keywords: Multiple-pitch observation, Nonnegative Matrix Factorization, Sparseness constraints, Machine Learning.