Results 1-10 of 47
Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria
IEEE Trans. on Audio, Speech, and Lang. Processing, 2007
Cited by 185 (30 self)
An unsupervised learning algorithm for the separation of sound sources in one-channel music signals is presented. The algorithm is based on factorizing the magnitude spectrogram of an input signal into a sum of components, each of which has a fixed magnitude spectrum and a time-varying gain. Each sound source, in turn, is modeled as a sum of one or more components. The parameters of the components are estimated by minimizing the reconstruction error between the input spectrogram and the model, while restricting the component spectrograms to be nonnegative and favoring components whose gains are slowly varying and sparse. Temporal continuity is favored by using a cost term which is the sum of squared differences between the gains in adjacent frames, and sparseness is favored by penalizing nonzero gains. The proposed iterative estimation algorithm is initialized with random values, and the gains and the spectra are then alternately updated using multiplicative update rules until the values converge. Simulation experiments were carried out using generated mixtures of pitched musical instrument samples and drum sounds. The performance of the proposed method was compared with independent subspace analysis and basic nonnegative matrix factorization, which are based on the same linear model. According to these simulations, the proposed method achieves better separation quality than the previous algorithms. In particular, the temporal continuity criterion improved the detection of pitched musical sounds. The sparseness criterion did not produce significant improvements. Index Terms: acoustic signal analysis, audio source separation, blind source separation, music, nonnegative matrix factorization, sparse coding, unsupervised learning.
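The estimation loop described in this abstract can be illustrated with a rough sketch. It is not the paper's exact algorithm: it assumes a squared-error reconstruction cost, a plain (unnormalized) sum of squared gain differences weighted by a hypothetical parameter alpha, and a linear sparseness penalty weighted by a hypothetical parameter beta. The multiplicative rules are obtained by the usual heuristic of dividing the negative part of the gradient by its positive part. The function name and all parameter names are illustrative.

```python
import numpy as np

def nmf_smooth_sparse(X, n_components, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Factorize a nonnegative spectrogram X (freq x frames) as B @ G.

    Assumed cost (a simplification, not the paper's exact one):
    ||X - B G||_F^2 + alpha * sum_t (g[t] - g[t-1])^2 + beta * sum(G)
    """
    rng = np.random.default_rng(seed)
    eps = 1e-12
    n_freq, n_frames = X.shape
    B = rng.random((n_freq, n_components)) + eps   # component spectra
    G = rng.random((n_components, n_frames)) + eps  # time-varying gains

    # Each frame has two temporal neighbours, except the first and last.
    n_neighbours = np.full(n_frames, 2.0)
    n_neighbours[0] = n_neighbours[-1] = 1.0

    for _ in range(n_iter):
        # Sum of the gains in the adjacent frames (previous + next).
        neighbour_sum = np.zeros_like(G)
        neighbour_sum[:, 1:] += G[:, :-1]
        neighbour_sum[:, :-1] += G[:, 1:]

        # Gain update: negative gradient part over positive gradient part.
        neg = 2.0 * (B.T @ X) + 2.0 * alpha * neighbour_sum
        pos = 2.0 * (B.T @ B @ G) + 2.0 * alpha * n_neighbours * G + beta
        G *= neg / (pos + eps)

        # Spectrum update: standard squared-error multiplicative rule.
        B *= (X @ G.T) / (B @ G @ G.T + eps)

        # Fix the scale ambiguity: unit-norm spectra, gains rescaled so
        # that the product B @ G is unchanged.
        norms = np.linalg.norm(B, axis=0) + eps
        B /= norms
        G *= norms[:, None]
    return B, G

# Toy usage on a synthetic rank-2 "spectrogram".
rng = np.random.default_rng(1)
X = rng.random((6, 2)) @ rng.random((2, 30))
B, G = nmf_smooth_sparse(X, n_components=2)
```

Because every factor in the updates is nonnegative, B and G stay nonnegative without any explicit projection. Setting alpha and beta to zero reduces the sketch to basic squared-error nonnegative matrix factorization.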
Dictionaries for Sparse Representation Modeling
Cited by 108 (3 self)
Sparse and redundant representation modeling of data assumes an ability to describe signals as linear combinations of a few atoms from a pre-specified dictionary. As such, the choice of the dictionary that sparsifies the signals is crucial for the success of this model. In general, a proper dictionary can be chosen in one of two ways: (i) building a sparsifying dictionary based on a mathematical model of the data, or (ii) learning a dictionary to perform best on a training set. In this paper we describe the evolution of these two paradigms. As manifestations of the first approach, we cover topics such as wavelets, wavelet packets, contourlets, and curvelets, all aiming to exploit 1-D and 2-D mathematical models for constructing effective dictionaries for signals and images. Dictionary learning takes a different route, attaching the dictionary to a set of examples it is supposed to serve. From the seminal work of Field and Olshausen, through the MOD, the K-SVD, the Generalized PCA and others, this paper surveys the various options such training has to offer, up to the most recent contributions and structures.
Normalised iterative hard thresholding; guaranteed stability and performance
2009
Sound Source Separation in Monaural Music Signals
2006
Cited by 35 (4 self)
Sound source separation refers to the task of estimating the signals produced by individual sound sources from a complex acoustic mixture. It has several applications, since monophonic signals can be processed more efficiently and flexibly than polyphonic mixtures. This thesis deals with the separation of monaural, or one-channel, music recordings. We concentrate on separation methods where the sources to be separated are not known beforehand. Instead, the separation is enabled by utilizing the common properties of real-world sound sources, which are their continuity, sparseness, and repetition in time and frequency, and their harmonic spectral structures. One of the separation approaches taken here uses unsupervised learning and the other uses model-based inference based on sinusoidal modeling. Most of the existing unsupervised separation algorithms are based on a linear instantaneous signal model, where each frame of the input mixture signal is
Shift-invariant dictionary learning for sparse representations: extending K-SVD
in Proc. EUSIPCO, 2008
Cited by 27 (2 self)
Shift-invariant dictionaries are generated by taking all the possible shifts of a few short patterns. They are helpful for representing long signals in which the same pattern can appear several times at different positions. We present an algorithm that learns shift-invariant dictionaries from long training signals. This algorithm is an extension of K-SVD. It alternates a sparse decomposition step and a dictionary update step. The update is more difficult in the shift-invariant case because occurrences of the same pattern may overlap. We propose and evaluate an unbiased extension of the method used in K-SVD, i.e. a method able to exactly retrieve the original dictionary in a noiseless case.
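As a small illustration of the first sentence of this abstract, the sketch below builds an explicit dictionary from all shifts of a few short patterns and synthesizes a signal in which one pattern occurs twice. It assumes non-circular shifts that keep each pattern fully inside the signal; the abstract does not say how boundaries are handled, and the function name is hypothetical. The learning algorithm itself is not shown.

```python
import numpy as np

def shift_invariant_dictionary(patterns, signal_length):
    """Return a matrix whose columns are every shift of every pattern.

    Assumption: shifts are non-circular and each pattern must fit
    entirely inside the signal.
    """
    atoms = []
    for pattern in patterns:
        pattern = np.asarray(pattern, dtype=float)
        for shift in range(signal_length - len(pattern) + 1):
            atom = np.zeros(signal_length)
            atom[shift:shift + len(pattern)] = pattern
            atoms.append(atom)
    return np.stack(atoms, axis=1)

patterns = [[1.0, -1.0], [1.0, 2.0, 1.0]]
D = shift_invariant_dictionary(patterns, signal_length=8)

# Columns 0..6 are shifts of the first pattern, columns 7..12 of the second.
coeffs = np.zeros(D.shape[1])
coeffs[0] = 1.0  # first pattern at position 0
coeffs[5] = 0.5  # the same pattern again at position 5, scaled
signal = D @ coeffs
```

Two short patterns yield 13 atoms here, which is why practical implementations usually store only the patterns and apply the shifts implicitly.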
Sparse and redundant modeling of image content using an image-signature-dictionary
2007
Cited by 26 (2 self)
Modeling signals by sparse and redundant representations has drawn considerable attention in recent years. Coupled with the ability to train the dictionary using signal examples, these techniques have been shown to lead to state-of-the-art results in a series of recent applications. In this paper we propose a novel structure of such a model for representing image content. The new dictionary is itself a small image, such that every patch in it (in varying location and size) is a possible atom in the representation. We refer to this as the Image-Signature-Dictionary (ISD), and show how it can be trained from image examples. This novel structure enjoys several important features, such as shift and scale flexibilities, and smaller memory and computational requirements, compared to the classical dictionary approach. As a demonstration of these benefits, we present high-quality image denoising results based on this new model.
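The idea that a small image can act as a dictionary can be sketched as follows. This is only an illustration under stated assumptions: a single fixed patch size is used (the abstract also allows varying sizes), patches wrap around the image borders cyclically (an assumption made here so that every location yields an atom), and the function name is hypothetical. Training of the signature image is not shown.

```python
import numpy as np

def isd_atoms(signature, patch_size):
    """List every patch_size x patch_size patch of a signature image.

    Assumption: patches wrap around the borders cyclically, so an
    image with rows * cols pixels gives rows * cols atoms.
    """
    rows, cols = signature.shape
    atoms = []
    for r in range(rows):
        for c in range(cols):
            row_idx = np.arange(r, r + patch_size) % rows
            col_idx = np.arange(c, c + patch_size) % cols
            atoms.append(signature[np.ix_(row_idx, col_idx)].ravel())
    return np.stack(atoms, axis=1)

signature = np.arange(16, dtype=float).reshape(4, 4)
atoms = isd_atoms(signature, patch_size=2)
```

In this toy case 16 stored pixels define 16 atoms of 4 entries each, whereas an unstructured dictionary of the same size would need 64 stored values. This is one way to read the memory saving mentioned in the abstract.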
Probabilistic modeling paradigms for audio source separation
In Machine Audition: Principles, Algorithms and Systems. IGI Global, 2010
Cited by 25 (14 self)
Most sound scenes result from the superposition of several sources, which can be separately perceived and analyzed by human listeners. Source separation aims to provide machine listeners with similar skills by extracting the sounds of individual sources from a given scene. Existing separation systems operate either by emulating the human auditory system or by inferring the parameters of probabilistic sound models. In this chapter, we focus on the latter approach and provide a joint overview of established and recent models, including independent component analysis, local time-frequency models and spectral template-based models. We show that most models are instances of one of the following two general paradigms: linear modeling or variance modeling. We compare the merits of either paradigm and report objective performance figures. We conclude by discussing promising combinations of probabilistic priors and inference algorithms that could form the basis of future state-of-the-art systems.
Specmurt analysis of polyphonic music signals
IEEE Transactions on Audio, Speech, and Language Processing
Cited by 24 (0 self)
This paper introduces a new music signal processing method to extract multiple fundamental frequencies, which we call specmurt analysis. In contrast with the cepstrum, which is the inverse Fourier transform of the log-scaled power spectrum with linear frequency, specmurt is defined as the inverse Fourier transform of the linear power spectrum with log-scaled frequency. Assuming that all tones in a polyphonic sound have a common harmonic pattern, the sound spectrum can be regarded as a sum of linearly stretched common harmonic structures along frequency. In the log-frequency domain, it is formulated as the convolution of a common harmonic structure and the distribution density of the fundamental frequencies of multiple tones. The fundamental frequency distribution can be found by deconvolving the observed spectrum with the assumed common harmonic structure, where the common harmonic structure is given heuristically or quasi-optimized with an iterative algorithm. The efficiency of specmurt analysis is experimentally demonstrated through generation of a piano-roll-like display from a polyphonic music signal and automatic sound-to-MIDI conversion. Multipitch estimation accuracy is evaluated over several polyphonic music signals and compared with manually annotated MIDI data. Index Terms: inverse filtering, iteration algorithm, multipitch analysis, pitch visualization, polyphonic music signals.
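The deconvolution step described in this abstract can be sketched on synthetic data. The example assumes a log-frequency axis with 12 bins per octave, a hand-picked harmonic structure with amplitudes 1.0, 0.4, 0.2 and 0.1, and circular convolution so that the Fourier-domain division is exact. All of these values and the function names are illustrative assumptions, and the iterative optimization of the harmonic structure mentioned in the abstract is not shown.

```python
import numpy as np

def harmonic_structure(n_bins, bins_per_octave, amplitudes):
    """Common harmonic pattern on a log-frequency axis.

    The n-th harmonic lies log2(n) octaves above the fundamental,
    independent of the fundamental frequency itself.
    """
    h = np.zeros(n_bins)
    for n, amp in enumerate(amplitudes, start=1):
        offset = int(round(bins_per_octave * np.log2(n)))
        h[offset] += amp
    return h

def specmurt_deconvolve(v, h):
    """Recover the fundamental-frequency distribution u from v = h * u.

    Convolution on the log-frequency axis becomes a product after a
    Fourier transform, so u follows from a division in that domain.
    """
    return np.real(np.fft.ifft(np.fft.fft(v) / np.fft.fft(h)))

n_bins, bins_per_octave = 64, 12
h = harmonic_structure(n_bins, bins_per_octave, [1.0, 0.4, 0.2, 0.1])

# Two simultaneous tones with fundamentals at log-frequency bins 5 and 12.
u = np.zeros(n_bins)
u[5], u[12] = 1.0, 0.7

# Observed spectrum: (circular) convolution of h and u.
v = np.real(np.fft.ifft(np.fft.fft(h) * np.fft.fft(u)))
u_hat = specmurt_deconvolve(v, h)
```

The amplitudes are chosen so that the fundamental dominates the sum of the overtones, which keeps the transform of h away from zero and the division well behaved. With measured spectra some form of regularization would likely be needed.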
Learning shift-invariant sparse representation of actions
IEEE Conference on Computer Vision and Pattern Recognition (2010) 2630
Cited by 20 (4 self)
A central problem in the analysis of motion capture (MoCap) data is how to decompose motion sequences into primitives. Ideally, a description in terms of primitives should facilitate the recognition, synthesis, and characterization of actions. We propose an unsupervised learning algorithm for automatically decomposing joint movements in human MoCap sequences into shift-invariant basis functions. Our formulation models the time series data of joint movements in actions as a sparse linear combination of short basis functions (snippets), which are executed (or “activated”) at different positions in time. Given a set of MoCap sequences of different actions, our algorithm finds the decomposition of MoCap sequences in terms of basis functions and their activations in time. Using the tools of L1 minimization, the procedure alternately solves two large convex minimizations: given the basis functions, a variant of Orthogonal Matching Pursuit solves for the activations, and given the activations, the Split Bregman Algorithm solves for the basis functions. Experiments demonstrate the power of the decomposition in a number of applications, including action recognition, retrieval, MoCap data compression, and as a tool for classification in the diagnosis of Parkinson's disease (a motion disorder).
Simultaneous codeword optimization (SimCO) for dictionary update and learning
IEEE Trans. Signal Process., 2012
Cited by 13 (7 self)
We consider the data-driven dictionary learning problem. The goal is to seek an overcomplete dictionary from which every training signal can be best approximated by a linear combination of only a few codewords. This task is often achieved by iteratively executing two operations: sparse coding and dictionary update. In the literature, there are two benchmark mechanisms to update a dictionary. The first approach, for example the MOD algorithm, is characterized by searching for the optimal codewords while fixing the sparse coefficients. In the second approach, represented by the K-SVD method, one codeword and the related sparse coefficients are simultaneously updated while all other codewords and coefficients remain unchanged. We propose a novel framework that generalizes the aforementioned two methods. The unique feature of our approach is that one can update an arbitrary set of codewords and the corresponding sparse coefficients simultaneously: when sparse coefficients are fixed, the underlying optimization problem is the same as that in the MOD algorithm; when only one codeword is selected for update, it can be proved that the proposed algorithm is equivalent to the K-SVD method; and more importantly, our method allows all codewords and all sparse coefficients to be updated simultaneously, hence the term simultaneous codeword optimization (SimCO). Under the proposed framework, we design two algorithms, namely the primitive and regularized SimCO. Simulations demonstrate that our approach outperforms the benchmark K-SVD in terms of both learning performance and running speed.
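The MOD-style update mentioned in this abstract (optimal codewords for fixed sparse coefficients) has a closed-form least-squares solution, which the sketch below illustrates on synthetic data. It does not implement SimCO or K-SVD. The function name, the problem sizes and the way the sparse supports are generated are assumptions made for the example.

```python
import numpy as np

def mod_dictionary_update(Y, X):
    """Least-squares dictionary for fixed coefficients.

    Solves min_D ||Y - D X||_F^2, whose minimizer is D = Y X^+,
    with X^+ the Moore-Penrose pseudo-inverse of X.
    """
    return Y @ np.linalg.pinv(X)

rng = np.random.default_rng(0)
n_atoms, n_signals = 8, 40
D_true = rng.standard_normal((5, n_atoms))  # overcomplete: 8 atoms in 5-D

# Sparse coefficients: exactly two active codewords per training signal.
X = np.zeros((n_atoms, n_signals))
for j in range(n_signals):
    support = [j % n_atoms, (j + 1 + j // n_atoms) % n_atoms]
    X[support, j] = rng.standard_normal(2)

Y = D_true @ X  # noiseless training signals
D_new = mod_dictionary_update(Y, X)
```

In this noiseless toy case, where every codeword is used often enough for X to have full row rank, the update returns the generating dictionary. Implementations often renormalize the codewords after such an update; that step is left out here so the result can be compared directly with D_true.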