Results 1–10 of 36
Analysis of polyphonic audio using source-filter model and non-negative matrix factorization
in Advances in Models for Acoustic Processing, Neural Information Processing Systems Workshop, 2006
Cited by 20 (3 self)
Framework for (polyphonic) audio: a linear signal model for the magnitude spectrum x_t(k), x̂_t(k) = ∑_{n=1}^{N} …
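The truncated model above is the standard NMF-style linear decomposition of a magnitude spectrogram. A minimal numpy sketch of that kind of model, assuming the usual basis-times-gains form (all sizes and variable names here are illustrative, not taken from the paper):

```python
import numpy as np

# Illustrative NMF-style signal model: the magnitude spectrogram X
# (frequency bins x frames) is approximated by a product of a nonnegative
# basis matrix B and a nonnegative gain matrix G, i.e.
# x_hat_t(k) = sum_n b_n(k) * g_{n,t}.
rng = np.random.default_rng(0)
K, T, N = 64, 100, 8            # frequency bins, frames, components (arbitrary)
B = rng.random((K, N))          # spectral basis vectors, one per component
G = rng.random((N, T))          # time-varying gains for each component
X_hat = B @ G                   # linear model for the magnitude spectrum

assert X_hat.shape == (K, T)
assert float(X_hat.min()) >= 0.0   # nonnegativity is preserved by the product
```

In a real system B and G would be estimated by NMF updates against an observed spectrogram rather than drawn at random.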
Monaural Musical Sound Separation Based on Pitch and Common Amplitude Modulation
Cited by 20 (1 self)
Abstract—Monaural musical sound separation has been extensively studied recently. An important problem in separation of pitched musical sounds is the estimation of time–frequency regions where harmonics overlap. In this paper, we propose a sinusoidal modeling-based separation system that can effectively resolve overlapping harmonics. Our strategy is based on the observations that harmonics of the same source have correlated amplitude envelopes and that the change in phase of a harmonic is related to the instrument’s pitch. We use these two observations in a least squares estimation framework for separation of overlapping harmonics. The system directly distributes mixture energy for harmonics that are unobstructed by other sources. Quantitative evaluation of the proposed system is shown when ground truth pitch information is available, when rough pitch estimates are provided in the form of a MIDI score, and finally, when a multipitch tracking algorithm is used. We also introduce a technique to improve the accuracy of rough pitch estimates. Results show that the proposed system significantly outperforms related monaural musical sound separation systems. Index Terms—Common amplitude modulation (CAM), musical sound separation, sinusoidal modeling, time–frequency masking, underdetermined sound separation.
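The common-amplitude-modulation idea behind the least squares step can be illustrated with a toy example: model the envelope of an overlapped harmonic as a weighted sum of envelopes measured from each source's unobstructed harmonics, then solve for the weights. This is a hedged sketch, not the paper's system; the envelope shapes and mixing weights below are synthetic:

```python
import numpy as np

# Toy CAM illustration: an overlapped harmonic's amplitude envelope is a
# weighted sum of per-source envelope shapes; least squares recovers the
# weights used to distribute the mixture energy between the sources.
t = np.linspace(0.0, 1.0, 200)
env_src1 = np.exp(-3.0 * t)                    # envelope of source 1 (assumed known)
env_src2 = np.abs(np.sin(2.0 * np.pi * t))     # envelope of source 2 (assumed known)
mixture = 0.7 * env_src1 + 0.3 * env_src2      # observed overlapped envelope

E = np.column_stack([env_src1, env_src2])      # design matrix of envelope shapes
weights, *_ = np.linalg.lstsq(E, mixture, rcond=None)
# weights recover the 0.7 / 0.3 split used above
```

The real system estimates the per-source envelopes from harmonics that do not overlap, and also exploits the phase-change observation, which this sketch omits.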
Nonnegative matrix deconvolution in noise robust speech recognition
 in ICASSP 2011
Cited by 9 (7 self)
High noise robustness has been achieved in speech recognition by using sparse exemplar-based methods with spectrogram windows spanning up to 300 ms. A downside is that a large exemplar dictionary is required to cover sufficiently many spectral patterns and their temporal alignments within windows. We propose a recognition system based on a shift-invariant convolutive model, where exemplar activations at all the possible temporal positions jointly reconstruct an utterance. Recognition rates are evaluated using the AURORA2 database, containing spoken digits with noise ranging from clean speech to 5 dB SNR. We obtain results superior to those where the activations were found independently for each overlapping window. Index Terms — Automatic speech recognition, noise robustness, deconvolution, sparsity, exemplar-based
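The shift-invariant convolutive reconstruction can be sketched as follows: each exemplar is a multi-frame spectrogram patch, and its activation at every temporal position contributes additively to the reconstructed utterance. A minimal numpy sketch under assumed shapes (not the paper's implementation):

```python
import numpy as np

# Sketch of a shift-invariant convolutive model: the utterance estimate is
# X_hat[:, t:t+L] += A[n, t] * E[n] summed over exemplars n and positions t,
# so activations at all temporal positions jointly reconstruct the signal.
rng = np.random.default_rng(1)
K, T, N, L = 20, 50, 3, 5          # bins, frames, exemplars, exemplar length
E = rng.random((N, K, L))          # exemplar spectrogram patches
A = rng.random((N, T - L + 1))     # activations at every temporal position

X_hat = np.zeros((K, T))
for n in range(N):                 # accumulate each shifted, scaled exemplar
    for t in range(T - L + 1):
        X_hat[:, t:t + L] += A[n, t] * E[n]
```

In the actual method the activations A are estimated sparsely against an observed noisy utterance; here they are random to keep the sketch self-contained.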
MUSICAL SOUND SEPARATION USING PITCH-BASED LABELING AND BINARY TIME-FREQUENCY MASKING
Cited by 5 (2 self)
Monaural musical sound separation attempts to segregate different instrument lines from single-channel polyphonic music. We propose a system that decomposes an input into time-frequency units using an auditory filterbank and utilizes pitch to label which instrument line each time-frequency unit is assigned to. The system is conceptually simple and computationally efficient. Systematic evaluation shows that, despite its simplicity, the proposed system achieves a competitive level of performance. Index Terms — musical sound separation, computational auditory scene analysis, pitch-based labeling
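Binary time-frequency masking of the kind described reduces to a winner-take-all labeling of each unit, followed by masking the mixture. A minimal sketch, assuming each line's pitch track yields a predicted energy per unit (the prediction step is faked with random values here):

```python
import numpy as np

# Sketch of pitch-based labeling with binary masking: each time-frequency
# unit is assigned to the instrument line with the larger predicted energy,
# and the two binary masks partition the mixture exactly.
rng = np.random.default_rng(2)
K, T = 32, 40
mixture = rng.random((K, T))
pred_line1 = rng.random((K, T))    # energy predicted from line 1's pitch (placeholder)
pred_line2 = rng.random((K, T))    # energy predicted from line 2's pitch (placeholder)

mask_line1 = pred_line1 >= pred_line2   # binary label for every T-F unit
sep_line1 = mixture * mask_line1
sep_line2 = mixture * ~mask_line1
# sep_line1 + sep_line2 reconstructs the mixture, since the masks partition it
```

A real system would derive the predicted energies from harmonic templates at each line's pitch rather than random values.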
SPECTRAL COVARIANCE IN PRIOR DISTRIBUTIONS OF NONNEGATIVE MATRIX FACTORIZATION BASED SPEECH SEPARATION
Cited by 5 (1 self)
This paper proposes an algorithm for modeling the covariance of the spectrum in the prior distributions of nonnegative matrix factorization (NMF) based sound source separation. Supervised NMF estimates a set of spectrum basis vectors for each source, and then represents a mixture signal using them. When the exact characteristics of the sources are not known in advance, it is advantageous to train prior distributions of spectra instead of fixed spectra. Since the frequency bands in natural sound sources are strongly correlated, we model the distributions with full-covariance Gaussian distributions. Algorithms for training and applying the distributions are presented. The proposed methods produce better separation quality than the reference methods. Demonstration signals are available at www.cs.tut.fi/~tuomasv.
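The role of a full-covariance Gaussian prior can be illustrated by evaluating the log-density of a candidate basis spectrum under such a prior; correlated frequency bands are exactly what the off-diagonal covariance terms capture. A hedged sketch with synthetic parameters (the paper's actual training and update rules are not reproduced here):

```python
import numpy as np

# Sketch: log-density of a candidate basis vector b under a trained
# full-covariance Gaussian prior N(mean, cov). Off-diagonal terms of cov
# encode the correlation between frequency bands.
rng = np.random.default_rng(3)
K = 6
mean = rng.random(K)
A = rng.random((K, K))
cov = A @ A.T + K * np.eye(K)     # symmetric positive-definite covariance

b = rng.random(K)                 # candidate basis spectrum (synthetic)
d = b - mean
sign, logdet = np.linalg.slogdet(cov)
log_prior = -0.5 * (d @ np.linalg.solve(cov, d)
                    + logdet
                    + K * np.log(2.0 * np.pi))
```

In the separation algorithm this prior term would be combined with the NMF reconstruction objective when updating the basis vectors.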
PROBABILISTIC LATENT TENSOR FACTORIZATION FRAMEWORK FOR AUDIO MODELING
Cited by 4 (0 self)
This paper introduces probabilistic latent tensor factorization (PLTF) as a general framework for hierarchical modeling of audio. This framework combines practical aspects of graphical modeling of machine learning with tensor factorization models. Once a model is constructed in the PLTF framework, the estimation algorithm is immediately available. We illustrate our approach using several popular models such as NMF or NMF2D and provide extensions with simulation results on real data for key audio processing tasks such as restoration and source separation.
On the use of masking filters in sound source separation
in Proc. of the 15th International Conference on Digital Audio Effects, 2012
Cited by 4 (2 self)
Automatic Transcription of Pitch Content in Music and Selected Applications
Cited by 2 (0 self)
Transcription of music refers to the analysis of a music signal in order to produce a parametric representation of the sounding notes in the signal. This is conventionally carried out by listening to a piece of music and writing down the symbols of common musical notation to represent the occurring notes in the piece. Automatic transcription of music refers to the extraction of such representations using signal-processing methods. This thesis concerns the automatic transcription of pitched notes in musical audio and its applications. Emphasis is laid on the transcription of realistic polyphonic music, where multiple pitched and percussive instruments are sounding simultaneously. The methods included in this thesis are based on a framework which combines both low-level acoustic modeling and high-level musicological modeling. The emphasis in the acoustic modeling has been set to note events so that the methods produce discrete-pitch notes with onset times and durations.
Bayesian Statistical Methods for Audio and Music Processing
2008
Cited by 1 (1 self)
Bayesian statistical methods provide a formalism for arriving at solutions to various problems faced in audio processing. In real environments, acoustical conditions and sound sources are highly variable, yet audio signals often possess significant statistical structure. There is a great deal of prior knowledge available about why this statistical structure is present. This includes knowledge of the physical mechanisms by which sounds are generated, the cognitive processes by which sounds are perceived and, in the context of music, the abstract mechanisms by which high-level sound structure is compiled. Bayesian hierarchical techniques provide a natural means for unification of these bodies of prior knowledge, allowing the formulation of highly structured models for observed audio data and latent processes at various levels of abstraction. They also permit the inclusion of desirable modelling components such as change-point structures and model-order specifications. The resulting models exhibit complex statistical structure and, in practice, highly adaptive and powerful computational techniques are needed to perform inference. In this chapter, we review some of the statistical models and associated inference methods developed recently for …
Nonnegative source-filter dynamical system for speech enhancement
2014
Cited by 1 (1 self)
Model-based speech enhancement methods, which rely on separately modeling the speech and the noise, have been shown to be powerful in many different problem settings. When the structure of the noise can be arbitrary, which is often the case in practice, model-based methods have to focus on developing good speech models, whose quality will be key to their performance. In this study, we propose a novel probabilistic model for speech enhancement which precisely models the speech by taking into account the underlying speech production process as well as its dynamics. The proposed model follows a source-filter approach where the excitation and filter parts are modeled as nonnegative dynamical systems. We present convergence-guaranteed update rules for each latent factor. In order to assess performance, we evaluate our model on a challenging speech enhancement task where the speech is observed under non-stationary noises recorded in a car. We show that our model outperforms state-of-the-art methods in terms of objective measures.
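The source-filter structure with nonnegative dynamics can be sketched in a simplified form: model each speech frame's magnitude spectrum as an elementwise product of an excitation spectrum and a filter spectrum, each built from nonnegative activations that evolve smoothly over time. This is a loose illustration under assumed shapes and dynamics, not the paper's probabilistic model or its update rules:

```python
import numpy as np

# Simplified source-filter sketch in the magnitude domain:
# S_hat = (W_exc @ h_exc) * (W_filt @ h_filt), with nonnegative activations
# following simple first-order smoothing dynamics over frames.
rng = np.random.default_rng(4)
K, T = 16, 30
W_exc = rng.random((K, 4))        # excitation dictionary (e.g. pitched combs)
W_filt = rng.random((K, 3))       # filter dictionary (spectral envelopes)

h_exc = np.zeros((4, T))
h_filt = np.zeros((3, T))
h_exc[:, 0] = rng.random(4)
h_filt[:, 0] = rng.random(3)
for t in range(1, T):             # nonnegative first-order dynamics
    h_exc[:, t] = 0.9 * h_exc[:, t - 1] + 0.1 * rng.random(4)
    h_filt[:, t] = 0.9 * h_filt[:, t - 1] + 0.1 * rng.random(3)

S_hat = (W_exc @ h_exc) * (W_filt @ h_filt)   # source-filter speech estimate
```

The paper's model places the dynamics inside a probabilistic framework with convergence-guaranteed updates; this sketch only shows the multiplicative source-filter structure itself.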