Results 1 - 10 of 122
Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria
- IEEE Transactions on Audio, Speech, and Language Processing, 2007
"... Abstract—An unsupervised learning algorithm for the separation of sound sources in one-channel music signals is presented. The algorithm is based on factorizing the magnitude spectrogram of an input signal into a sum of components, each of which has a fixed magnitude spectrum and a time-varying gain ..."
Cited by 185 (30 self)
An unsupervised learning algorithm for the separation of sound sources in one-channel music signals is presented. The algorithm is based on factorizing the magnitude spectrogram of an input signal into a sum of components, each of which has a fixed magnitude spectrum and a time-varying gain. Each sound source, in turn, is modeled as a sum of one or more components. The parameters of the components are estimated by minimizing the reconstruction error between the input spectrogram and the model, while restricting the component spectrograms to be nonnegative and favoring components whose gains are slowly varying and sparse. Temporal continuity is favored by using a cost term which is the sum of squared differences between the gains in adjacent frames, and sparseness is favored by penalizing nonzero gains. The proposed iterative estimation algorithm is initialized with random values, and the gains and the spectra are then alternately updated using multiplicative update rules until the values converge. Simulation experiments were carried out using generated mixtures of pitched musical instrument samples and drum sounds. The performance of the proposed method was compared with independent subspace analysis and basic nonnegative matrix factorization, which are based on the same linear model. According to these simulations, the proposed method enables better separation quality than the previous algorithms. In particular, the temporal continuity criterion improved the detection of pitched musical sounds; the sparseness criterion did not produce significant improvements. Index Terms: Acoustic signal analysis, audio source separation, blind source separation, music, nonnegative matrix factorization, sparse coding, unsupervised learning.
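As a concrete illustration of this model, the following is a minimal NumPy sketch of magnitude-spectrogram NMF with added temporal-continuity and sparseness penalties on the gains. It uses Euclidean-distance multiplicative updates with a heuristic split of the penalty gradients into positive and negative parts, not the paper's exact cost weighting or update rules; alpha and beta are illustrative penalty weights.

import numpy as np

def nmf_separate(V, K, n_iter=200, alpha=0.1, beta=0.01, eps=1e-12, seed=0):
    # V: nonnegative magnitude spectrogram, shape (freq bins, frames).
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, K)) + eps      # fixed component spectra
    H = rng.random((K, T)) + eps      # time-varying gains

    for _ in range(n_iter):
        # Standard multiplicative update for the spectra.
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)

        # Temporal continuity: sum of squared differences between gains in
        # adjacent frames, with its gradient split into positive and negative
        # parts so the gain update stays multiplicative and nonnegative.
        left = np.concatenate([H[:, :1], H[:, :-1]], axis=1)    # gain at t-1
        right = np.concatenate([H[:, 1:], H[:, -1:]], axis=1)   # gain at t+1
        cont_pos = 4.0 * alpha * H
        cont_neg = 2.0 * alpha * (left + right)

        # Sparseness: an L1 penalty on the gains adds a constant beta to the
        # denominator, shrinking small activations toward zero.
        H *= (W.T @ V + cont_neg) / (W.T @ W @ H + cont_pos + beta + eps)

    return W, H

Each component k can then be reconstructed as the outer product of W[:, k] and H[k] and resynthesized with the phase of the mixture spectrogram.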
Convolutive speech bases and their application to supervised speech separation
- IEEE Transactions on Audio, Speech, and Language Processing, 2007
"... In this paper we present a convolutive basis decomposition method and its application on simultaneous speakers separation from monophonic recordings. The model we propose is a convolutive version of the non-negative matrix factorization algorithm. Due to the non-negativity constraint this type of co ..."
Cited by 92 (6 self)
In this paper we present a convolutive basis decomposition method and its application to the separation of simultaneous speakers from monophonic recordings. The model we propose is a convolutive version of the non-negative matrix factorization algorithm. Due to the non-negativity constraint, this type of coding is well suited to representing magnitude spectra intuitively and efficiently. We present results that reveal the nature of these basis functions and demonstrate their utility in separating monophonic mixtures of known speakers.
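A minimal sketch of the convolutive model, assuming KL-divergence style multiplicative updates in the spirit of standard convolutive NMF (NMFD) rather than the authors' exact implementation: each basis is a short sequence of Tconv spectra, and the reconstruction sums time-shifted contributions of the activations.

import numpy as np

def shift_right(H, tau):
    # Shift activations right by tau frames, zero-padding at the start.
    if tau == 0:
        return H
    out = np.zeros_like(H)
    out[:, tau:] = H[:, :-tau]
    return out

def shift_left(X, tau):
    if tau == 0:
        return X
    out = np.zeros_like(X)
    out[:, :-tau] = X[:, tau:]
    return out

def nmfd(V, K, Tconv=8, n_iter=100, eps=1e-12, seed=0):
    # Convolutive NMF: V is approximated by sum_tau W[tau] @ shift_right(H, tau).
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((Tconv, F, K)) + eps
    H = rng.random((K, T)) + eps

    for _ in range(n_iter):
        Lam = sum(W[t] @ shift_right(H, t) for t in range(Tconv)) + eps
        R = V / Lam                                   # ratio used by the KL updates
        # Update each time slice of the convolutive bases.
        for t in range(Tconv):
            Ht = shift_right(H, t)
            W[t] *= (R @ Ht.T) / (np.ones_like(V) @ Ht.T + eps)
        # Update the activations by pooling the contributions of all slices.
        Lam = sum(W[t] @ shift_right(H, t) for t in range(Tconv)) + eps
        R = V / Lam
        num = sum(W[t].T @ shift_left(R, t) for t in range(Tconv))
        den = sum(W[t].T @ np.ones_like(V) for t in range(Tconv)) + eps
        H *= num / den

    return W, H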
Separation of drums from polyphonic music using non-negative matrix factorization and support vector machine
- In Proc. EUSIPCO, 2005
"... This paper presents a procedure for the separation of pitched musical instruments and drums from polyphonic music. The method is based on two-stage processing in which the input signal is first separated into elementary time-frequency components which are then organized into sound sources. Non-negat ..."
Cited by 53 (4 self)
This paper presents a procedure for the separation of pitched musical instruments and drums from polyphonic music. The method is based on two-stage processing in which the input signal is first separated into elementary time-frequency components, which are then organized into sound sources. Non-negative matrix factorization (NMF) is used to separate the input spectrogram into components having a fixed spectrum with a time-varying gain. Each component is classified as either a pitched instrument or a drum using a support vector machine (SVM). The classifier is trained using example signals from both classes. Simulation experiments were carried out using mixtures generated from real-world polyphonic music signals. The results indicate that the proposed method achieves better separation quality than existing methods based on sinusoidal modeling and onset detection. Demonstration signals are available at
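A hedged sketch of the two-stage idea, with illustrative per-component features (spectral centroid and spread of the basis spectrum, plus a crest factor of the gain) standing in for the feature set actually used in the paper; scikit-learn's SVC provides the classifier.

import numpy as np
from sklearn.svm import SVC

def component_features(w, h, freqs):
    # Illustrative features for one NMF component (not the paper's exact set).
    w = w / (w.sum() + 1e-12)
    centroid = float(freqs @ w)                         # spectral centroid
    spread = float(np.sqrt(((freqs - centroid) ** 2) @ w))
    crest = float(h.max() / (h.mean() + 1e-12))         # impulsiveness of the gain
    return [centroid, spread, crest]

def train_drum_classifier(train_components, labels, freqs):
    # train_components: list of (w, h) pairs; labels: 1 = drum, 0 = pitched.
    X = np.array([component_features(w, h, freqs) for w, h in train_components])
    clf = SVC(kernel="rbf", C=1.0, gamma="scale")
    clf.fit(X, labels)
    return clf

def separate_drums(V, W, H, clf, freqs):
    # Reassemble the drum spectrogram from components the SVM labels as drums.
    drum_spec = np.zeros_like(V)
    for k in range(W.shape[1]):
        if clf.predict([component_features(W[:, k], H[k], freqs)])[0] == 1:
            drum_spec += np.outer(W[:, k], H[k])
    return drum_spec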
Extraction of drum tracks from polyphonic music using Independent Subspace Analysis
- In Proceedings of the 4th International Symposium on Independent Component Analysis and Blind Signal Separation (ICA2003), 2003
"... The analysis and separation of audio signals into their original components is an important prerequisite to automatic transcription of music, extraction of metadata from audio data, and speaker separation in video conferencing. In this paper, a method for the separation of drum tracks from polyphoni ..."
Cited by 48 (4 self)
The analysis and separation of audio signals into their original components is an important prerequisite for automatic transcription of music, extraction of metadata from audio data, and speaker separation in video conferencing. In this paper, a method for the separation of drum tracks from polyphonic music is proposed. It consists of an Independent Component Analysis and a subsequent partitioning of the derived components into subspaces containing the percussive and the harmonic sustained instruments. With the proposed method, different samples of popular music have been analyzed. The results show sufficient separation of drum tracks and non-drum tracks for subsequent metadata extraction. Informal listening tests indicate moderate audio quality of the resulting audio signals.
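A minimal sketch of independent subspace analysis on a magnitude spectrogram using scikit-learn's FastICA, with an ad hoc crest-factor threshold standing in for the paper's partitioning criterion; the threshold value here is arbitrary.

import numpy as np
from sklearn.decomposition import FastICA

def isa_drum_partition(V, n_components=10, seed=0):
    # ICA on the (frames x freq-bins) spectrogram, then a crude heuristic split
    # of the components into drum / non-drum subspaces.
    ica = FastICA(n_components=n_components, random_state=seed, max_iter=1000)
    # Treat each time frame as an observation: S holds temporal activations,
    # the mixing matrix A holds the corresponding spectral profiles.
    S = ica.fit_transform(V.T)            # (frames, components)
    A = ica.mixing_                       # (freq bins, components)

    drums, others = [], []
    for k in range(n_components):
        act = np.abs(S[:, k])
        crest = act.max() / (act.mean() + 1e-12)   # impulsive => likely percussive
        (drums if crest > 8.0 else others).append(k)

    # Reconstruct each subspace back in the spectrogram domain.
    V_drums = np.abs(A[:, drums] @ S[:, drums].T) if drums else np.zeros_like(V)
    V_other = np.abs(A[:, others] @ S[:, others].T) if others else np.zeros_like(V)
    return V_drums, V_other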
Supervised and Semi-Supervised Separation of Sounds from Single-Channel Mixtures
"... Abstract. In this paper we describe a methodology for model-based single channel separation of sounds. We present a sparse latent variable model that can learn sounds based on their distribution of time/frequency energy. This model can then be used to extract known types of sounds from mixtures in t ..."
Cited by 46 (10 self)
In this paper we describe a methodology for model-based single-channel separation of sounds. We present a sparse latent variable model that can learn sounds based on their distribution of time/frequency energy. This model can then be used to extract known types of sounds from mixtures in two scenarios: one in which all sound types in the mixture are known, and one in which only the target or the interference models are known. The model we propose has close ties to non-negative decompositions and to latent variable models commonly used for semantic analysis.
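A sketch of the fully supervised scenario, using plain KL-divergence NMF bases as a stand-in for the paper's sparse latent variable model: bases are learned from isolated training material for each source, then held fixed while only the activations are estimated on the mixture.

import numpy as np

def learn_bases(V_train, K, n_iter=200, eps=1e-12, seed=0):
    # Learn K spectral bases for one source with KL-divergence NMF.
    rng = np.random.default_rng(seed)
    F, T = V_train.shape
    W = rng.random((F, K)) + eps
    H = rng.random((K, T)) + eps
    for _ in range(n_iter):
        Lam = W @ H + eps
        W *= ((V_train / Lam) @ H.T) / (np.ones_like(V_train) @ H.T + eps)
        Lam = W @ H + eps
        H *= (W.T @ (V_train / Lam)) / (W.T @ np.ones_like(V_train) + eps)
    return W / (W.sum(axis=0, keepdims=True) + eps)

def supervised_separate(V_mix, W_a, W_b, n_iter=200, eps=1e-12, seed=0):
    # Fix the concatenated per-source bases, estimate only the activations,
    # then split the mixture with a soft (Wiener-like) mask.
    rng = np.random.default_rng(seed)
    W = np.concatenate([W_a, W_b], axis=1)
    H = rng.random((W.shape[1], V_mix.shape[1])) + eps
    for _ in range(n_iter):
        Lam = W @ H + eps
        H *= (W.T @ (V_mix / Lam)) / (W.T @ np.ones_like(V_mix) + eps)
    Ka = W_a.shape[1]
    Sa = W[:, :Ka] @ H[:Ka]
    Sb = W[:, Ka:] @ H[Ka:]
    total = Sa + Sb + eps
    return V_mix * Sa / total, V_mix * Sb / total

In the semi-supervised case described in the abstract, only one set of bases would be learned in advance and the other would be estimated from the mixture alongside the activations.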
Separation of sound sources by convolutive sparse coding
- In Proc. ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing, 2004. [Online]. Available: http://journal.speech.cs.cmu.edu/SAPA2004
"... An algorithm for the separation of sound sources is presented. Each source is parametrized as a convolution between a time-frequency magnitude spectrogam and an onset vector. The source model is able to represent several types of sounds, for example repetitive drum sounds and harmonic sounds with mo ..."
Cited by 46 (6 self)
An algorithm for the separation of sound sources is presented. Each source is parametrized as a convolution between a time-frequency magnitude spectrogram and an onset vector. The source model is able to represent several types of sounds, for example repetitive drum sounds and harmonic sounds with modulations. An iterative algorithm is proposed for the estimation of the parameters. The algorithm is based on minimizing the reconstruction error and the number of onsets. The number of onsets is minimized by applying a sparse coding scheme to the onset vectors. A way of modeling the loudness perception of the human auditory system is also proposed: the method compresses high-energy sources, which enables the separation of low-energy sources that are perceptually significant. The algorithm is able to separate meaningful sources from real-world signals. Simulation experiments were carried out using mixtures of harmonic instruments. Demonstration signals are available at
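A sketch of the source model and the objective only, not the estimation algorithm: each source is the convolution along time of an event spectrogram with an onset vector, and the cost an iterative estimation procedure would minimize combines reconstruction error with an L1-style sparseness term on the onsets. The function names and the lam weight are illustrative.

import numpy as np

def model_spectrogram(templates, onsets):
    # templates: list of (F, L) event spectrograms, one per source;
    # onsets:    list of length-T onset vectors.  Each source is the 1-D
    # convolution (along time) of its template with its onset vector.
    F, L = templates[0].shape
    T = len(onsets[0])
    sources = []
    for D, a in zip(templates, onsets):
        S = np.zeros((F, T + L - 1))
        for t in np.nonzero(a)[0]:
            S[:, t:t + L] += a[t] * D     # place one event per active onset
        sources.append(S[:, :T])
    return sources, sum(sources)

def objective(V, templates, onsets, lam=0.1):
    # Reconstruction error plus an L1 sparseness cost on the onset vectors.
    _, V_hat = model_spectrogram(templates, onsets)
    return np.sum((V - V_hat) ** 2) + lam * sum(np.abs(a).sum() for a in onsets)

The loudness-motivated compression of high-energy sources mentioned in the abstract is not reflected in this sketch.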
Unsupervised analysis of polyphonic music by sparse coding
- IEEE Transactions on Neural Networks, 2006
"... We investigate a data-driven approach to the analysis and transcription of polyphonic music, using a probabilistic model which is able to find sparse linear decompositions of a sequence of short-term Fourier spectra. The resulting system represents each input spectrum as a weighted sum of a small nu ..."
Cited by 43 (4 self)
We investigate a data-driven approach to the analysis and transcription of polyphonic music, using a probabilistic model which is able to find sparse linear decompositions of a sequence of short-term Fourier spectra. The resulting system represents each input spectrum as a weighted sum of a small number of “atomic” spectra chosen from a larger dictionary; this dictionary is, in turn, learned from the data in such a way as to represent the given training set in an (information-theoretically) efficient way. When exposed to examples of polyphonic music, most of the dictionary elements take on the spectral characteristics of individual notes in the music, so that the sparse decomposition can be used to identify the notes in a polyphonic mixture. Our approach differs from other methods of polyphonic analysis based on spectral decomposition by combining all of the following: a) a formulation in terms of an explicitly given probabilistic model, in which the process of estimating which notes are present corresponds naturally to the inference of latent variables in the model; b) a particularly simple generative model, motivated by very general considerations about efficient coding, that makes very few assumptions about the musical origins of the signals being processed; and c) the ability to learn a dictionary of atomic spectra (most of which converge to harmonic spectral profiles associated with specific notes) from polyphonic examples alone; no separate training on monophonic examples is required. Index Terms: Learning overcomplete dictionaries, polyphonic music, probabilistic modeling, redundancy reduction, sparse factorial coding, unsupervised learning.
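As a rough, non-probabilistic stand-in for the model described above, the following sketch learns a dictionary of nonnegative atomic spectra with an L1 sparseness penalty on the activations (sparse KL-NMF) and thresholds the activations into a crude note-activity estimate; the penalty weight and threshold are arbitrary.

import numpy as np

def sparse_dictionary_learning(V, K=40, n_iter=300, lam=0.1, eps=1e-12, seed=0):
    # Columns of W become "atomic" spectra; rows of H are their activations.
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, K)) + eps
    H = rng.random((K, T)) + eps
    for _ in range(n_iter):
        Lam = W @ H + eps
        # L1 sparseness on the activations enters as a constant lam in the denominator.
        H *= (W.T @ (V / Lam)) / (W.T @ np.ones_like(V) + lam + eps)
        Lam = W @ H + eps
        W *= ((V / Lam) @ H.T) / (np.ones_like(V) @ H.T + eps)
        W /= W.sum(axis=0, keepdims=True) + eps   # keep atoms normalized
    return W, H

def piano_roll(H, threshold=0.2):
    # An atom is considered "on" when its activation exceeds a fraction of its maximum.
    return H > threshold * H.max(axis=1, keepdims=True)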
Transcription and separation of drum signals from polyphonic music
- IEEE Transactions on Audio, Speech, and Language Processing, 2008
"... Abstract—The purpose of this article is to present new advances in music transcription and source separation with a focus on drum signals. A complete drum transcription system is described, which combines information from the original music signal and a drum track enhanced version obtained by source ..."
Cited by 38 (6 self)
The purpose of this article is to present new advances in music transcription and source separation, with a focus on drum signals. A complete drum transcription system is described which combines information from the original music signal and a drum-track-enhanced version obtained by source separation. In addition to efficient fusion strategies that take these two complementary sources of information into account, the transcription system integrates a large set of features, optimally selected by feature selection. Concurrently, the problem of drum track extraction from polyphonic music is tackled both by proposing a novel approach based on harmonic/noise decomposition and time/frequency masking and by improving an existing Wiener-filtering-based separation method. The separation and transcription techniques presented are thoroughly evaluated on a large public database of music signals. A transcription accuracy between 64.5% and 80.3% is obtained, depending on the drum instrument, for well-balanced mixes, and the efficiency of our drum separation algorithms is illustrated in a comprehensive benchmark. Index Terms: Drum signals, feature selection, harmonic/noise decomposition, music transcription, source separation, support vector machine (SVM), Wiener filtering.
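A minimal sketch of the Wiener-filtering side of such a system, assuming estimates of the drum and non-drum power spectrograms are already available (for example from a harmonic/noise decomposition); it only builds the soft time/frequency mask and applies it to the mixture STFT.

import numpy as np

def wiener_separate(X_mix, P_drums, P_rest, eps=1e-12):
    # X_mix: complex STFT of the mixture; P_drums / P_rest: estimated power
    # spectrograms of the drum and non-drum parts.  Returns the masked complex
    # STFTs of both parts.
    mask = P_drums / (P_drums + P_rest + eps)
    return mask * X_mix, (1.0 - mask) * X_mix

Inverting the two masked STFTs with the matching inverse transform yields the separated drum and accompaniment signals.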
Sub-band independent subspace analysis for drum transcription
- In Proceedings of the International Conference on Digital Audio Effects (DAFx), 2002
"... ..."