Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria (2007)

by T Virtanen
Venue: IEEE Transactions on Audio, Speech and Language Processing
Results 1 - 10 of 189

Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation

by Alexey Ozerov, Cédric Févotte - IEEE Trans. Audio, Speech, Language Process., 2010
"... We consider inference in a general data-driven object-based model of multichannel audio data, assumed generated as a possibly underdetermined convolutive mixture of source signals. Each source is given a model inspired from nonnegative matrix factorization (NMF) with the Itakura-Saito divergence, wh ..."
Abstract - Cited by 79 (17 self)
We consider inference in a general data-driven object-based model of multichannel audio data, assumed generated as a possibly underdetermined convolutive mixture of source signals. Each source is given a model inspired from nonnegative matrix factorization (NMF) with the Itakura-Saito divergence, which underlies a statistical model of superimposed Gaussian components. We address estimation of the mixing and source parameters using two methods. The first one consists of maximizing the exact joint likelihood of the multichannel data using an expectation-maximization algorithm. The second method consists of maximizing the sum of individual likelihoods of all channels using a multiplicative update algorithm inspired from NMF methodology. Our decomposition algorithms were applied to stereo music and assessed in terms of blind source separation performance. Index Terms — Multichannel audio, nonnegative matrix factorization, nonnegative tensor factorization, underdetermined convolutive blind source separation.
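
The single-channel building block of this model is NMF of the power spectrogram under the Itakura-Saito divergence, usually fitted with multiplicative updates. The sketch below shows only that building block (the paper's multichannel EM and multiplicative algorithms additionally estimate mixing parameters); the function name, initialization, and iteration count are illustrative assumptions.

```python
import numpy as np

def is_nmf(V, K, n_iter=200, eps=1e-12, seed=0):
    """Single-channel NMF with the Itakura-Saito divergence.

    V : (F, N) nonnegative power spectrogram, approximated as V ~= W @ H.
    Returns basis spectra W (F, K) and activations H (K, N).
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + eps
    H = rng.random((K, N)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        # Multiplicative updates for the IS divergence (beta-divergence with beta = 0).
        H *= (W.T @ (V / WH ** 2)) / (W.T @ (1.0 / WH))
        WH = W @ H + eps
        W *= ((V / WH ** 2) @ H.T) / ((1.0 / WH) @ H.T)
    return W, H
```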

Citation Context

... where the data matrix is taken as the magnitude or power spectrogram of a sound signal. NMF was for example applied with success to automatic music transcription [2], [3] and audio source separation [4], [5]. The factorization amounts to decomposing the spectrogram data into a sum of rank-1 spectrograms, each of which being the expression of an ...

Exemplar-based sparse representations for noise robust automatic speech recognition

by Jort F. Gemmeke, et al., 2010
"... ..."
Abstract - Cited by 55 (30 self) - Add to MetaCart
Abstract not found

Adaptive Harmonic Spectral Decomposition for Multiple Pitch Estimation

by Emmanuel Vincent, Nancy Bertin, Roland Badeau, 2009
"... Multiple pitch estimation consists of estimating the fundamental frequencies and saliences of pitched sounds over short time frames of an audio signal. This task forms the basis of several applications in the particular context of musical audio. One approach is to decompose the short-term magnitude ..."
Abstract - Cited by 50 (9 self)
Multiple pitch estimation consists of estimating the fundamental frequencies and saliences of pitched sounds over short time frames of an audio signal. This task forms the basis of several applications in the particular context of musical audio. One approach is to decompose the short-term magnitude spectrum of the signal into a sum of basis spectra representing individual pitches scaled by time-varying amplitudes, using algorithms such as nonnegative matrix factorization (NMF). Prior training of the basis spectra is often infeasible due to the wide range of possible musical instruments. Appropriate spectra must then be adaptively estimated from the data, which may result in limited performance due to overfitting issues. In this article, we model each basis spectrum as a weighted sum of narrowband spectra representing a few adjacent harmonic partials, thus enforcing harmonicity and spectral smoothness while adapting the spectral envelope to each instrument. We derive an NMF-like algorithm to estimate the model parameters and evaluate it on a database of piano recordings, considering several choices for the narrowband spectra. The proposed algorithm performs similarly to supervised NMF using pre-trained piano spectra but improves pitch estimation performance by 6% to 10% compared to alternative unsupervised NMF algorithms.
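
As a rough illustration of the harmonicity and smoothness constraint described above, the sketch below builds, for one candidate pitch, a small set of fixed narrowband spectra that each cover a few adjacent partials; the adaptive basis spectrum for that pitch would then be a nonnegative weighted sum of these columns, with the envelope weights and time-varying amplitudes estimated by NMF-like updates. The Gaussian lobe shape, widths, and function name are illustrative assumptions, not the authors' exact parameterization.

```python
import numpy as np

def narrowband_harmonic_spectra(f0, freqs, n_partials=30, partials_per_band=4):
    """Fixed narrowband spectra for one pitch: each column covers a few
    adjacent harmonic partials of f0, modelled here as Gaussian lobes.
    A basis spectrum for this pitch is a nonnegative weighted sum of the
    columns, which enforces harmonicity and a smooth spectral envelope."""
    partials = f0 * np.arange(1, n_partials + 1)
    partials = partials[partials < freqs[-1]]
    sigma = f0 / 4.0                       # lobe width (illustrative choice)
    bands = []
    for b in range(0, len(partials), partials_per_band):
        chunk = partials[b:b + partials_per_band]
        spec = np.exp(-0.5 * ((freqs[:, None] - chunk[None, :]) / sigma) ** 2).sum(axis=1)
        bands.append(spec)
    return np.stack(bands, axis=1)         # shape (n_freq_bins, n_bands)

# Example: narrowband dictionary for A3 (220 Hz) on a 1025-bin linear grid.
freqs = np.linspace(0.0, 11025.0, 1025)
E = narrowband_harmonic_spectra(220.0, freqs)
```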

A general flexible framework for the handling of prior information in audio source separation

by Alexey Ozerov, Emmanuel Vincent, Frédéric Bimbot - IEEE Transactions on Audio, Speech and Signal Processing, 2012
"... Abstract—Most of audio source separation methods are developed for a particular scenario characterized by the number of sources and channels and the characteristics of the sources and the mixing process. In this paper we introduce a general audio source separation framework based on a library of str ..."
Abstract - Cited by 45 (17 self)
Abstract—Most audio source separation methods are developed for a particular scenario characterized by the number of sources and channels and the characteristics of the sources and the mixing process. In this paper we introduce a general audio source separation framework based on a library of structured source models that enable the incorporation of prior knowledge about each source via user-specifiable constraints. While this framework generalizes several existing audio source separation methods, it also allows one to imagine and implement new efficient methods not yet reported in the literature. We first introduce the framework by describing the model structure and constraints, explaining its generality, and summarizing its algorithmic implementation using a generalized expectation-maximization algorithm. Finally, we illustrate the above-mentioned capabilities of the framework by applying it in several new and existing configurations to different source separation problems. We have released a software tool named Flexible Audio Source Separation Toolbox (FASST) implementing a baseline version of the framework in Matlab. Index Terms—Audio source separation, local Gaussian model, nonnegative matrix factorization, expectation-maximization

Citation Context

...or specific rhythm patterns may also be accounted for in this way. Note that temporal models of the activation coefficients have been proposed in the state-of-the-art, using probabilistic priors [9], [34], note-specific Gaussian-shaped time-localized patterns [42], or unstructured TF patterns [33]. Our proposition is complementary to [9], [34] in that it accounts for temporal behaviour in the model str...

Bayesian extensions to non-negative matrix factorisation for audio signal modelling

by Tuomas Virtanen, A. Taylan Cemgil, Simon Godsill - in ICASSP, 2008
"... We describe the underlying probabilistic generative signal model of non-negative matrix factorisation (NMF) and propose a realistic conjugate priors on the matrices to be estimated. A conjugate Gamma chain prior enables modelling the spectral smoothness of natural sounds in general, and other prior ..."
Abstract - Cited by 42 (6 self)
We describe the underlying probabilistic generative signal model of non-negative matrix factorisation (NMF) and propose realistic conjugate priors on the matrices to be estimated. A conjugate Gamma chain prior enables modelling the spectral smoothness of natural sounds in general, and other prior knowledge about the spectra of the sounds can be used without resorting to too restrictive techniques where some of the parameters are fixed. The resulting algorithm, while retaining the attractive features of standard NMF such as fast convergence and easy implementation, outperforms existing NMF strategies in a single channel audio source separation and detection task. Index Terms — acoustic signal processing, matrix decomposition, MAP estimation, source separation

Citation Context

...rs, one audio modelling approach has focused on non-negativity of the spectrogram matrix X = {xν,τ} and enforcing a factorisation as X = TV where both T and V are matrices with positive entries (see [2, 3, 4], and references therein). Here, T can be interpreted as a codebook of spectra, called basis vectors, and V is the matrix of their gains in each frame. The success of the model stems from the fact tha...
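
For readers unfamiliar with the factorisation X = TV referred to in the snippet above, the following sketch applies the standard Lee-Seung multiplicative updates for the generalised KL divergence, with T as the codebook of basis spectra and V as the matrix of their frame-by-frame gains. The function name and initialization are illustrative, and the cited papers differ in the exact cost functions and constraints they use.

```python
import numpy as np

def kl_nmf(X, K, n_iter=200, eps=1e-12, seed=0):
    """Plain KL-divergence NMF in the notation of the snippet above:
    the magnitude spectrogram X (frequencies x frames) is factorised as
    X ~= T @ V, with T a codebook of basis spectra and V their
    frame-by-frame gains (Lee-Seung multiplicative updates)."""
    rng = np.random.default_rng(seed)
    F, N = X.shape
    T = rng.random((F, K)) + eps
    V = rng.random((K, N)) + eps
    ones = np.ones_like(X)
    for _ in range(n_iter):
        TV = T @ V + eps
        V *= (T.T @ (X / TV)) / (T.T @ ones + eps)
        TV = T @ V + eps
        T *= ((X / TV) @ V.T) / (ones @ V.T + eps)
    return T, V
```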

Musical Instrument Recognition in Polyphonic Audio Using Source-Filter Model for Sound Separation

by Toni Heittola, Anssi Klapuri, Tuomas Virtanen
"... This paper proposes a novel approach to musical instrument recognition in polyphonic audio signals by using a source-filter model and an augmented non-negative matrix factorization algorithm for sound separation. The mixture signal is decomposed into a sum of spectral bases modeled as a product of e ..."
Abstract - Cited by 40 (9 self)
This paper proposes a novel approach to musical instrument recognition in polyphonic audio signals by using a source-filter model and an augmented non-negative matrix factorization algorithm for sound separation. The mixture signal is decomposed into a sum of spectral bases modeled as a product of excitations and filters. The excitations are restricted to harmonic spectra and their fundamental frequencies are estimated in advance using a multipitch estimator, whereas the filters are restricted to have smooth frequency responses by modeling them as a sum of elementary functions on the Mel-frequency scale. The pitch and timbre information are used in organizing individual notes into sound sources. In the recognition, Mel-frequency cepstral coefficients are used to represent the coarse shape of the power spectrum of sound sources and Gaussian mixture models are used to model instrument-conditional densities of the extracted features. The method is evaluated with polyphonic signals, randomly generated from 19 instrument classes. The recognition rate for signals having six-note polyphony reaches 59%.
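
A minimal numerical sketch of the excitation-filter structure described above: a harmonic excitation fixed by the estimated fundamental frequency is multiplied elementwise by a smooth filter built as a nonnegative sum of broad bumps with Mel-like spacing. The lobe shapes, band spacing, and function names are illustrative assumptions rather than the authors' exact elementary functions.

```python
import numpy as np

def harmonic_excitation(f0, freqs, n_partials=40):
    """Harmonic excitation: unit-height lobes at integer multiples of f0."""
    partials = f0 * np.arange(1, n_partials + 1)
    partials = partials[partials < freqs[-1]]
    return np.exp(-0.5 * ((freqs[:, None] - partials[None, :]) / (f0 / 10.0)) ** 2).sum(axis=1)

def smooth_filter(weights, freqs, n_bands=20):
    """Smooth filter response: nonnegative sum of broad bumps whose centres
    follow a Mel-like (roughly logarithmic) spacing."""
    centres = 700.0 * np.expm1(np.linspace(0.2, np.log1p(freqs[-1] / 700.0), n_bands))
    widths = np.maximum(np.gradient(centres), 50.0)
    bumps = np.exp(-0.5 * ((freqs[:, None] - centres[None, :]) / widths[None, :]) ** 2)
    return bumps @ np.asarray(weights)

# One spectral basis = excitation (pitch) x filter (timbre), elementwise.
freqs = np.linspace(0.0, 8000.0, 1025)
basis = harmonic_excitation(220.0, freqs) * smooth_filter(np.ones(20), freqs)
```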

Citation Context

...ime-varying gain. The decomposition can be done, e.g., using independent component analysis (ICA) or non-negative matrix factorization (NMF), the latter usually leading to a better separation quality [8]. The advantage of the methods is their ability to learn the spectral characteristics of each source from a mixture, enabling separation of sources which overlap in time and frequency. Instrument reco...

Source/filter model for unsupervised main melody extraction from polyphonic audio signals

by Jean-Louis Durrieu, Gaël Richard, Cédric Févotte - IEEE Trans. on Audio, Speech, and Language Processing, 2010
"... Abstract—Extracting the main melody from a polyphonic music recording seems natural even to untrained human listeners. To a certain extent it is related to the concept of source separation, with the human ability of focusing on a specific source in order to extract relevant information. In this pape ..."
Abstract - Cited by 37 (8 self)
Abstract—Extracting the main melody from a polyphonic music recording seems natural even to untrained human listeners. To a certain extent it is related to the concept of source separation, with the human ability of focusing on a specific source in order to extract relevant information. In this paper, we propose a new approach for the estimation and extraction of the main melody (and in particular the leading vocal part) from polyphonic audio signals. To that aim, we propose a new signal model where the leading vocal part is explicitly represented by a specific source/filter model. The proposed representation is investigated in the framework of two statistical models: a Gaussian Scaled Mixture Model (GSMM) and an extended Instantaneous Mixture Model (IMM). For both models, the estimation of the different parameters is done within a maximum-likelihood framework adapted from single-channel source separation techniques. The desired sequence of fundamental frequencies is then inferred from the estimated parameters. The results obtained in a recent evaluation campaign (MIREX08) show that the proposed approaches are very promising and reach state-of-the-art performances on all test sets. Index Terms—Blind audio source separation, Expectation–Maximization (EM) algorithm, Gaussian scaled mixture model (GSMM), main melody extraction, maximum likelihood, music, non-negative matrix factorization (NMF), source/filter model, spectral analysis.

Citation Context

...s. This equation allows us to find a convenient way of expressing the criterion (19) in terms of two positive quantities. An appropriate direction of maximization is then found as in [15]. For each parameter we derive the updating rules, which we report in Algorithm 1 (the EM algorithm for the GSMM). ...

Sound Source Separation in Monaural Music Signals

by Tuomas Virtanen, 2006
"... Sound source separation refers to the task of estimating the signals produced by individual sound sources from a complex acoustic mixture. It has several applications, since monophonic signals can be processed more efficiently and flexibly than polyphonic mixtures. This thesis deals with the separat ..."
Abstract - Cited by 36 (4 self)
Sound source separation refers to the task of estimating the signals produced by individual sound sources from a complex acoustic mixture. It has several applications, since monophonic signals can be processed more efficiently and flexibly than polyphonic mixtures. This thesis deals with the separation of monaural, or one-channel, music recordings. We concentrate on separation methods where the sources to be separated are not known beforehand. Instead, the separation is enabled by utilizing the common properties of real-world sound sources, which are their continuity, sparseness, and repetition in time and frequency, and their harmonic spectral structures. One of the separation approaches taken here uses unsupervised learning and the other uses model-based inference based on sinusoidal modeling. Most of the existing unsupervised separation algorithms are based on a linear instantaneous signal model, where each frame of the input mixture signal is ...

Non-negative matrix factorization based compensation of music for automatic speech recognition

by Bhiksha Raj, Tuomas Virtanen, Sourish Chaudhuri, Rita Singh - In: Proc. of Interspeech, Makuhari, 2010
"... This paper proposes to use non-negative matrix factorization based speech enhancement in robust automatic recognition of mixtures of speech and music. We represent magnitude spectra of noisy speech signals as the non-negative weighted linear combination of speech and noise spectral basis vectors, th ..."
Abstract - Cited by 34 (12 self)
This paper proposes to use non-negative matrix factorization based speech enhancement in robust automatic recognition of mixtures of speech and music. We represent magnitude spectra of noisy speech signals as the non-negative weighted linear combination of speech and noise spectral basis vectors, which are obtained from training corpora of speech and music. We use overcomplete dictionaries consisting of random exemplars of the training data. The method is tested on the Wall Street Journal large vocabulary speech corpus which is artificially corrupted with polyphonic music from the RWC music database. Various music styles and speech-to-music ratios are evaluated. The proposed methods are shown to produce a consistent, significant improvement in recognition performance in comparison with the baseline method. Audio demonstrations of the enhanced signals are available at
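
As a rough sketch of the enhancement step described above (under assumed function names and a KL-divergence cost, which the paper may parameterize differently): fixed speech and music basis vectors, taken as random exemplars of the training corpora, are stacked into one dictionary, only the activations of the noisy magnitude spectrogram are estimated with multiplicative updates, and a Wiener-like soft mask recovers the speech part.

```python
import numpy as np

def nmf_enhance(Y, W_speech, W_music, n_iter=100, eps=1e-12):
    """Supervised NMF enhancement sketch: the noisy magnitude spectrogram Y
    (freq x frames) is modelled as a nonnegative combination of fixed speech
    and music basis vectors; only the activations H are estimated
    (KL-divergence multiplicative updates), and a Wiener-like soft mask
    recovers the speech component."""
    W = np.hstack([W_speech, W_music])            # fixed dictionary (F, Ks + Km)
    H = np.full((W.shape[1], Y.shape[1]), 0.1)    # activations to estimate
    ones = np.ones_like(Y)
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (Y / WH)) / (W.T @ ones + eps)
    Ks = W_speech.shape[1]
    speech_hat = W_speech @ H[:Ks]
    mask = speech_hat / (W @ H + eps)             # soft mask in [0, 1]
    return mask * Y                               # enhanced speech magnitude
```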

Citation Context

...w a different approach. In previous work we (and other researchers) have demonstrated that nonnegative spectral factorization methods, including those based on non-negative matrix factorization (NMF) [9] and latent-variable analysis (LVA) [10], can be effectively used for signal separation. These methods represent signals by a compositional model that characterizes their spectra as a weighted linear ...

Soft mask methods for single-channel speaker separation

by Aarthi M. Reddy, Bhiksha Raj - IEEE Trans. Audio, Speech, Lang. Process., 2007
"... Abstract—The problem of single-channel speaker separation at-tempts to extract a speech signal uttered by the speaker of interest from a signal containing a mixture of acoustic signals. Most algo-rithms that deal with this problem are based on masking, wherein unreliable frequency components from th ..."
Abstract - Cited by 31 (2 self)
Abstract—The problem of single-channel speaker separation attempts to extract a speech signal uttered by the speaker of interest from a signal containing a mixture of acoustic signals. Most algorithms that deal with this problem are based on masking, wherein unreliable frequency components from the mixed signal spectrogram are suppressed, and the reliable components are inverted to obtain the speech signal from speaker of interest. Most current techniques estimate this mask in a binary fashion, resulting in a hard mask. In this paper, we present two techniques to separate out the speech signal of the speaker of interest from a mixture of speech signals. One technique estimates all the spectral components of the desired speaker. The second technique estimates a soft mask that weights the frequency subbands of the mixed signal. In both cases, the speech signal of the speaker of interest is reconstructed from the complete spectral descriptions obtained. In their native form, these algorithms are computationally expensive. We also present fast factored approximations to the algorithms. Experiments reveal that the proposed algorithms can result in significant enhancement of individual speakers in mixed recordings, consistently achieving better performance than that obtained with hard binary masks. Index Terms—Signal separation, soft masks, speaker separation.
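
A generic ratio-mask sketch of the soft-versus-hard distinction drawn above, assuming per-speaker magnitude estimates are already available (for example from speaker-specific basis functions); the estimators in the paper are more elaborate, so this is only an illustration of how the two mask types differ.

```python
import numpy as np

def masks(S1_hat, S2_hat, eps=1e-12):
    """Soft vs. hard masking, given per-speaker magnitude estimates
    S1_hat and S2_hat for the same mixture spectrogram.

    The soft mask weights each time-frequency bin by the estimated share
    of speaker 1's energy; the hard (binary) mask keeps only the bins
    where speaker 1 dominates."""
    soft = S1_hat / (S1_hat + S2_hat + eps)     # values in [0, 1]
    hard = (S1_hat > S2_hat).astype(float)      # values in {0, 1}
    return soft, hard

# Applying either mask to the mixture spectrogram gives an estimate of
# speaker 1; the soft mask avoids the abrupt suppression of the hard one.
```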

Citation Context

...ach where basis functions extracted from training instances of the signals from the individual sources are used to identify and separate the component signals in mixtures. Smaragdis [24] and Virtanen [25] present algorithms based on nonnegative matrix factorization that attempt to decompose the power spectral vectors of mixed speech signals into linear combinations of nonnegative bases that have been ...
