Results 1–10 of 21
Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria
 IEEE Trans. on Audio, Speech, and Language Processing
, 2007
"... Abstract—An unsupervised learning algorithm for the separation of sound sources in onechannel music signals is presented. The algorithm is based on factorizing the magnitude spectrogram of an input signal into a sum of components, each of which has a fixed magnitude spectrum and a timevarying gain ..."
Abstract

Cited by 185 (30 self)
Abstract—An unsupervised learning algorithm for the separation of sound sources in one-channel music signals is presented. The algorithm is based on factorizing the magnitude spectrogram of an input signal into a sum of components, each of which has a fixed magnitude spectrum and a time-varying gain. Each sound source, in turn, is modeled as a sum of one or more components. The parameters of the components are estimated by minimizing the reconstruction error between the input spectrogram and the model, while restricting the component spectrograms to be nonnegative and favoring components whose gains are slowly varying and sparse. Temporal continuity is favored by using a cost term which is the sum of squared differences between the gains in adjacent frames, and sparseness is favored by penalizing nonzero gains. The proposed iterative estimation algorithm is initialized with random values, and the gains and the spectra are then alternately updated using multiplicative update rules until the values converge. Simulation experiments were carried out using generated mixtures of pitched musical instrument samples and drum sounds. The performance of the proposed method was compared with independent subspace analysis and basic nonnegative matrix factorization, which are based on the same linear model. According to these simulations, the proposed method enables better separation quality than the previous algorithms. In particular, the temporal continuity criterion improved the detection of pitched musical sounds. The sparseness criterion did not produce significant improvements. Index Terms—Acoustic signal analysis, audio source separation, blind source separation, music, nonnegative matrix factorization, sparse coding, unsupervised learning.
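The factorization and penalty described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's exact update rules: it uses plain Euclidean-cost multiplicative updates and folds a simple temporal-continuity penalty (alpha times the sum of squared gain differences between adjacent frames) into the gain update by splitting its gradient into nonnegative numerator and denominator parts; boundary frames are handled approximately via zero padding.

```python
# Sketch of NMF with a temporal-continuity penalty on the gains,
# in the spirit of the abstract above (not the paper's exact algorithm).
# V is a magnitude spectrogram (freqs x frames); cost is Euclidean.
import numpy as np

def nmf_temporal(V, n_components=4, alpha=0.1, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, n_components)) + 1e-3   # fixed spectra
    H = rng.random((n_components, T)) + 1e-3   # time-varying gains
    eps = 1e-9
    for _ in range(n_iter):
        # standard Euclidean multiplicative update for the spectra
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # left/right neighbours of each gain value (zero-padded at edges)
        Hl = np.pad(H[:, :-1], ((0, 0), (1, 0)))
        Hr = np.pad(H[:, 1:], ((0, 0), (0, 1)))
        # continuity-penalty gradient split into a negative part (numerator)
        # and a positive part (denominator) so H stays nonnegative
        H *= (W.T @ V + 2 * alpha * (Hl + Hr)) / (W.T @ W @ H + 4 * alpha * H + eps)
    return W, H
```

Larger alpha pulls each component's gain track toward smoothness at the cost of reconstruction error; alpha = 0 recovers plain multiplicative-update NMF.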
Separation of drums from polyphonic music using nonnegative matrix factorization and support vector machine
 In Proc. EUSIPCO 2005
, 2005
"... This paper presents a procedure for the separation of pitched musical instruments and drums from polyphonic music. The method is based on twostage processing in which the input signal is first separated into elementary timefrequency components which are then organized into sound sources. Nonnegat ..."
Abstract

Cited by 53 (4 self)
This paper presents a procedure for the separation of pitched musical instruments and drums from polyphonic music. The method is based on two-stage processing in which the input signal is first separated into elementary time-frequency components, which are then organized into sound sources. Nonnegative matrix factorization (NMF) is used to separate the input spectrogram into components having a fixed spectrum with a time-varying gain. Each component is classified either to pitched instruments or to drums using a support vector machine (SVM). The classifier is trained using example signals from both classes. Simulation experiments were carried out using mixtures generated from real-world polyphonic music signals. The results indicate that the proposed method enables better separation quality than existing methods based on sinusoidal modeling and onset detection. Demonstration signals are available at
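The two-stage idea above (NMF components, then per-component classification) can be sketched as follows. The features are hypothetical choices, not the paper's feature set, and a nearest-centroid rule stands in for the SVM to keep the sketch dependency-free:

```python
# Sketch of the two-stage pitched/drums pipeline from the abstract above.
# Illustrative features; a nearest-centroid rule stands in for the SVM.
import numpy as np

def component_features(w, h):
    """Features for one NMF component: spectrum w (freqs,), gains h (frames,)."""
    bins = np.arange(len(w))
    # drums tend to have flatter, higher-centroid spectra
    centroid = (bins * w).sum() / (w.sum() + 1e-9)
    # drums tend to have spiky, rapidly changing gain tracks
    flux = np.abs(np.diff(h)).mean() / (h.mean() + 1e-9)
    return np.array([centroid, flux])

class NearestCentroid:
    """Stand-in classifier trained on labelled components (0=pitched, 1=drum)."""
    def fit(self, X, y):
        self.centroids = np.array([X[y == c].mean(axis=0) for c in (0, 1)])
        return self
    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids[None], axis=2)
        return d.argmin(axis=1)
```

In the paper's setup the classifier is trained on components extracted from labelled example signals of both classes; the same fit/predict flow applies.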
Unsupervised analysis of polyphonic music by sparse coding
 IEEE Transactions on Neural Networks
, 2006
"... We investigate a datadriven approach to the analysis and transcription of polyphonic music, using a probabilistic model which is able to find sparse linear decompositions of a sequence of shortterm Fourier spectra. The resulting system represents each input spectrum as a weighted sum of a small nu ..."
Abstract

Cited by 43 (4 self)
We investigate a data-driven approach to the analysis and transcription of polyphonic music, using a probabilistic model which is able to find sparse linear decompositions of a sequence of short-term Fourier spectra. The resulting system represents each input spectrum as a weighted sum of a small number of “atomic” spectra chosen from a larger dictionary; this dictionary is, in turn, learned from the data in such a way as to represent the given training set in an (information-theoretically) efficient way. When exposed to examples of polyphonic music, most of the dictionary elements take on the spectral characteristics of individual notes in the music, so that the sparse decomposition can be used to identify the notes in a polyphonic mixture. Our approach differs from other methods of polyphonic analysis based on spectral decomposition by combining all of the following: a) a formulation in terms of an explicitly given probabilistic model, in which the process of estimating which notes are present corresponds naturally with the inference of latent variables in the model; b) a particularly simple generative model, motivated by very general considerations about efficient coding, that makes very few assumptions about the musical origins of the signals being processed; and c) the ability to learn a dictionary of atomic spectra (most of which converge to harmonic spectral profiles associated with specific notes) from polyphonic examples alone—no separate training on monophonic examples is required. Index Terms—Learning overcomplete dictionaries, polyphonic music, probabilistic modeling, redundancy reduction, sparse factorial coding, unsupervised learning.
Sound Source Separation in Monaural Music Signals
, 2006
"... Sound source separation refers to the task of estimating the signals produced by individual sound sources from a complex acoustic mixture. It has several applications, since monophonic signals can be processed more efficiently and flexibly than polyphonic mixtures. This thesis deals with the separat ..."
Abstract

Cited by 35 (4 self)
Sound source separation refers to the task of estimating the signals produced by individual sound sources from a complex acoustic mixture. It has several applications, since monophonic signals can be processed more efficiently and flexibly than polyphonic mixtures. This thesis deals with the separation of monaural, or one-channel, music recordings. We concentrate on separation methods where the sources to be separated are not known beforehand. Instead, the separation is enabled by utilizing the common properties of real-world sound sources: their continuity, sparseness, and repetition in time and frequency, and their harmonic spectral structures. One of the separation approaches taken here uses unsupervised learning and the other uses model-based inference based on sinusoidal modeling. Most of the existing unsupervised separation algorithms are based on a linear instantaneous signal model, where each frame of the input mixture signal is
Drum Transcription with nonnegative spectrogram factorisation
 In Proc. EUSIPCO
, 2005
"... This paper describes a novel method for the automatic transcription of drum sequences. The method is based on separating the target drum sounds from the input signal using nonnegative matrix factorisation, and on detecting sound onsets from the separated signals. The separation algorithm factorises ..."
Abstract

Cited by 33 (6 self)
This paper describes a novel method for the automatic transcription of drum sequences. The method is based on separating the target drum sounds from the input signal using nonnegative matrix factorisation, and on detecting sound onsets from the separated signals. The separation algorithm factorises the spectrogram of the input signal into a sum of instrument spectrograms, each having a fixed spectrum and a time-varying gain. The spectra are calculated from a set of training signals, and the time-varying gains are estimated with an algorithm stemming from nonnegative matrix factorisation. Onset times of the instruments are detected from the estimated time-varying gains. The system gave better results than two state-of-the-art methods in simulations with acoustic signals containing polyphonic drum sequences, and an overall hit rate of 96% was achieved. Demonstration signals are available at
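The supervised variant described here, with the spectra fixed from training data and only the gains estimated, reduces to a particularly simple update. Below is a sketch under Euclidean cost; the cost function and the fixed onset threshold are assumptions for illustration, not taken from the paper:

```python
# Sketch of the gains-only estimation from the abstract above: the drum
# spectra W are fixed (learned beforehand from training signals), only
# the time-varying gains H are updated, and onsets are read from rises
# in each gain track.
import numpy as np

def estimate_gains(V, W, n_iter=200):
    """Euclidean multiplicative updates for H with W held fixed."""
    eps = 1e-9
    H = np.full((W.shape[1], V.shape[1]), 0.5)
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

def detect_onsets(h, threshold=0.2):
    """Frames where one gain track rises above the threshold from below."""
    above = h > threshold
    return np.flatnonzero(above[1:] & ~above[:-1]) + 1
```

With W fixed, the problem is convex in H, so this inner loop is much better behaved than joint estimation of spectra and gains.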
Nonnegative tensor factorisation for sound source separation
 In: Proceedings of the Irish Signals and Systems Conference
, 2005
"... ... is introduced which extends current matrix factorisation techniques to deal with tensors. The effectiveness of the algorithm is then demonstrated through tests on synthetic data. The algorithm is then employed as a means of performing sound source separation on two channel mixtures, and the sepa ..."
Abstract

Cited by 28 (2 self)
... is introduced which extends current matrix factorisation techniques to deal with tensors. The effectiveness of the algorithm is first demonstrated through tests on synthetic data. The algorithm is then employed as a means of performing sound source separation on two-channel mixtures, and its separation capabilities are demonstrated on a two-channel mixture containing saxophone, strings, and bass guitar.
Transcribing Multi-instrument Polyphonic Music with Hierarchical Eigeninstruments
 in Sig. Process
, 2011
"... Abstract—This paper presents a general probabilistic model for transcribing singlechannel music recordings containing multiple polyphonic instrument sources. The system requires no prior knowledge of the instruments present in the mixture (other than the number), although it can benefit from inform ..."
Abstract

Cited by 18 (2 self)
Abstract—This paper presents a general probabilistic model for transcribing single-channel music recordings containing multiple polyphonic instrument sources. The system requires no prior knowledge of the instruments present in the mixture (other than their number), although it can benefit from information about instrument type if available. In contrast to many existing polyphonic transcription systems, our approach explicitly models the individual instruments and is thereby able to assign detected notes to their respective sources. We use training instruments to learn a set of linear manifolds in model parameter space which are then used during transcription to constrain the properties of models fit to the target mixture. This leads to a hierarchical mixture-of-subspaces design which makes it possible to supply the system with prior knowledge at different levels of abstraction. The proposed technique is evaluated on both recorded and synthesized mixtures containing two, three, four, and five instruments each. We compare our approach, in terms of transcription with source-assignment (i.e., detected pitches must be associated with the correct instrument) and without, to another multi-instrument transcription system as well as a baseline NMF algorithm. For two-instrument mixtures evaluated with source-assignment, we obtain average frame-level F-measures of up to 0.52 in the completely blind transcription setting (i.e., no prior knowledge of the instruments in the mixture) and up to 0.67 if we assume knowledge of the basic instrument types. For transcription without source-assignment, these numbers rise to 0.76 and 0.83, respectively. Index Terms—Music, polyphonic transcription, NMF, subspace, eigeninstruments
Underdetermined source separation with structured source priors
 in Proc. of the Int. Conf. on Independent Component Analysis and Blind Source Separation (ICA)
"... Abstract. We consider the source extraction problem for stereo instantaneous musical mixtures with more than two sources. We prove that usual separation methods based only on spatial diversity have performance limitations when the sources overlap in the timefrequency plane. We propose a new separat ..."
Abstract

Cited by 15 (2 self)
Abstract. We consider the source extraction problem for stereo instantaneous musical mixtures with more than two sources. We prove that the usual separation methods based only on spatial diversity have performance limitations when the sources overlap in the time-frequency plane. We propose a new separation scheme combining spatial diversity and structured source priors. We present possible priors based on nonlinear Independent Subspace Analysis (ISA) and Hidden Markov Models (HMM), whose parameters are learnt on solo musical excerpts. We show with an example that they actually improve the separation performance.
Generalised prior subspace analysis for polyphonic pitch transcription
 in Proc. Int. Conf. on Digital Audio Effects (DAFx)
, 2005
"... A reformulation of Prior Subspace Analysis (PSA) is presented, which restates the problem as that of fitting an undercomplete signal dictionary to a spectrogram. Further, a generalization of PSA is derived which allows the transcription of polyphonic pitched instruments. This involves the translatio ..."
Abstract

Cited by 11 (3 self)
A reformulation of Prior Subspace Analysis (PSA) is presented, which restates the problem as that of fitting an undercomplete signal dictionary to a spectrogram. Further, a generalisation of PSA is derived which allows the transcription of polyphonic pitched instruments. This involves the translation of a single frequency prior subspace of a note to approximate other notes, overcoming the problem of needing a separate basis function for each note played by an instrument. Examples are then demonstrated which show the utility of the generalised PSA algorithm for the purposes of polyphonic pitch transcription.
Monaural Sound Source Separation by Perceptually Weighted Non-Negative Matrix Factorization
"... Abstract — A dataadaptive algorithm for the separation of sound sources from onechannel signals is presented. The algorithm applies weighted nonnegative matrix factorization on the power spectrogram of the input signal. Perceptually motivated weights for each critical band in each frame are used ..."
Abstract

Cited by 9 (0 self)
Abstract—A data-adaptive algorithm for the separation of sound sources from one-channel signals is presented. The algorithm applies weighted nonnegative matrix factorization to the power spectrogram of the input signal. Perceptually motivated weights for each critical band in each frame are used to model the loudness perception of the human auditory system. The method compresses high-energy components and enables the estimation of perceptually significant low-energy characteristics of sources. The power spectrogram is factorized into a sum of components which have a fixed magnitude spectrum with a time-varying gain. Each source consists of one or more components. The parameters of the components are estimated by minimizing the weighted divergence between the observed power spectrogram and the model, for which a weighted nonnegative matrix factorization algorithm is proposed. Simulation experiments were carried out using generated mixtures of pitched musical instrument samples and percussive sounds. The performance of the proposed method was compared with other separation algorithms which are based on the same signal model, including, for example, independent subspace analysis and sparse coding. According to the simulations, the proposed method enables perceptually better separation quality than the existing algorithms. Demonstration signals are available at
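The weighted-divergence factorization described in this last abstract can be sketched with the standard multiplicative updates for weighted KL-divergence NMF. The perceptual weighting itself is left abstract here: any nonnegative weight matrix theta of the spectrogram's shape stands in for the paper's critical-band loudness weights.

```python
# Sketch of weighted-divergence NMF: each time-frequency cell of the
# power spectrogram V gets a weight theta (perceptual weights in the
# paper; here just an arbitrary nonnegative matrix of V's shape).
import numpy as np

def weighted_kl_nmf(V, theta, n_components=4, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, n_components)) + 1e-3   # component spectra
    H = rng.random((n_components, T)) + 1e-3   # time-varying gains
    eps = 1e-9
    for _ in range(n_iter):
        R = theta * V / (W @ H + eps)          # weighted ratio V / model
        W *= (R @ H.T) / (theta @ H.T + eps)
        R = theta * V / (W @ H + eps)
        H *= (W.T @ R) / (W.T @ theta + eps)
    return W, H
```

With theta set to all ones, this reduces to ordinary KL-divergence NMF; nonuniform weights make the algorithm spend modeling effort where the weights, and hence the assumed perceptual importance, are largest.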