Extended Non-negative Tensor Factorisation models for Musical Sound Source Separation
"... Recently, shift invariant tensor factorisation algorithms have been proposed for the purposes of sound source separation of pitched musical instruments. However, existing algorithms require the use of log-frequency spectrograms to allow shift invariance in frequency which causes problems when attemp ..."
Abstract
- Add to MetaCart
(Show Context)
Recently, shift invariant tensor factorisation algorithms have been proposed for the purposes of sound source separation of pitched musical instruments. However, existing algorithms require the use of log-frequency spectrograms to allow shift invariance in frequency, which causes problems when attempting to resynthesise the separated sources. Further, it is difficult to impose harmonicity constraints on the recovered basis functions. This paper proposes a new additive synthesis-based approach which allows the use of linear-frequency spectrograms as well as imposing strict harmonic constraints, resulting in an improved model. Further, these additional constraints allow the addition of a source-filter model to the factorisation framework, and an extended model which is capable of separating mixtures of pitched and percussive instruments simultaneously.
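A minimal sketch of the kind of harmonically constrained factorisation described above, on a linear-frequency magnitude spectrogram: fixed harmonic templates are built for a set of candidate pitches and only their time activations are learned. This is an illustration only, not the authors' shift-invariant tensor model; the names harmonic_dictionary and nmf_activations and the parameter width_hz are assumptions introduced here.

import numpy as np

def harmonic_dictionary(freqs, f0s, n_harmonics=20, width_hz=20.0):
    # freqs: 1-D NumPy array of linear-frequency bin centres in Hz.
    # Each column is a comb of Gaussian-shaped partials at integer multiples
    # of a candidate fundamental f0, standing in for a strict harmonic constraint.
    W = np.zeros((len(freqs), len(f0s)))
    for j, f0 in enumerate(f0s):
        for h in range(1, n_harmonics + 1):
            W[:, j] += np.exp(-0.5 * ((freqs - h * f0) / width_hz) ** 2)
        W[:, j] /= W[:, j].sum() + 1e-12
    return W

def nmf_activations(V, W, n_iter=200, eps=1e-12):
    # Multiplicative updates for the activations H with the dictionary W held
    # fixed, minimising the generalised KL divergence between V and W @ H.
    H = np.random.rand(W.shape[1], V.shape[1])
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + eps)
    return H

Because the templates live on a linear frequency axis, separated sources can be resynthesised directly from masked spectrograms, which is the practical benefit the abstract points to.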
Harmonic Adaptive Latent Component Analysis of Audio and Application to Music Transcription (IEEE Transactions on Audio, Speech, and Language Processing)
"... Abstract—Recently, new methods for smart decomposition of time-frequency representations of audio have been proposed in order to address the problem of automatic music transcription. However those techniques are not necessarily suitable for notes having variations of both pitch and spectral envelope ..."
Abstract
- Add to MetaCart
(Show Context)
Recently, new methods for smart decomposition of time-frequency representations of audio have been proposed in order to address the problem of automatic music transcription. However, those techniques are not necessarily suitable for notes having variations of both pitch and spectral envelope over time. The HALCA (Harmonic Adaptive Latent Component Analysis) model presented in this article allows both kinds of variation to be considered simultaneously. Each note in a constant-Q transform is locally modeled as a weighted sum of fixed narrowband harmonic spectra, spectrally convolved with some impulse that defines the pitch. All parameters are estimated by means of the expectation-maximization (EM) algorithm, in the framework of Probabilistic Latent Component Analysis. Priors over the parameters are also introduced in order to help the EM algorithm converge towards a meaningful solution. We applied this model to automatic music transcription: the onset time, duration and pitch of each note in an audio file are inferred from the estimated parameters. The system has been evaluated on two different databases and obtains very promising results. Index Terms: automatic transcription, multipitch estimation, probabilistic latent component analysis, nonnegative matrix factorization.
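A minimal sketch of the Probabilistic Latent Component Analysis / EM machinery the abstract builds on. This is generic 2-D PLCA only; the HALCA model additionally uses shift-invariance along a constant-Q axis and priors on the parameters, which are omitted here, and the function name plca and its arguments are illustrative assumptions.

import numpy as np

def plca(V, n_components=16, n_iter=100, eps=1e-12):
    # V: non-negative magnitude spectrogram (freq x time), normalised to a
    # joint distribution P(f, t) and factored as
    #   P(f, t) = sum_z P(z) P(f|z) P(t|z)
    P = V / (V.sum() + eps)
    F, T = P.shape
    Pz = np.full(n_components, 1.0 / n_components)
    Pf_z = np.random.dirichlet(np.ones(F), n_components).T   # shape (F, Z)
    Pt_z = np.random.dirichlet(np.ones(T), n_components).T   # shape (T, Z)
    for _ in range(n_iter):
        # E-step: posterior P(z | f, t) for every time-frequency bin
        joint = Pf_z[:, None, :] * Pt_z[None, :, :] * Pz[None, None, :]
        post = joint / (joint.sum(axis=2, keepdims=True) + eps)
        # M-step: reweight the posterior by the observed distribution
        weighted = P[:, :, None] * post
        Pz = weighted.sum(axis=(0, 1))
        Pf_z = weighted.sum(axis=1) / (Pz[None, :] + eps)
        Pt_z = weighted.sum(axis=0) / (Pz[None, :] + eps)
        Pz /= Pz.sum() + eps
    return Pz, Pf_z, Pt_z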
Semi-Supervised Non-Negative Tensor Factorisation of Modulation Spectrograms for Monaural Speech Separation
"... Abstract—This paper details the use of a semi-supervised approach to audio source separation. Where only a single source model is available, the model for an unknown source must be estimated. A mixture signal is separated through factorisation of a feature-tensor representation, based on the modulat ..."
Abstract
- Add to MetaCart
(Show Context)
This paper details the use of a semi-supervised approach to audio source separation. Where only a single source model is available, the model for an unknown source must be estimated. A mixture signal is separated through factorisation of a feature-tensor representation based on the modulation spectrogram. Harmonically related components tend to modulate in a similar fashion, and this redundancy of patterns can be isolated. This feature representation requires fewer parameters than spectrally based methods and so minimises overfitting. Following the tensor factorisation, the separated signals are reconstructed by learning appropriate Wiener-filter spectral parameters which have been constrained by activation parameters learned in the first stage. Strong results were obtained for two-speaker mixtures, where source separation performance exceeded that of the benchmark methods. Specifically, the proposed semi-supervised method outperformed both semi-supervised non-negative matrix factorisation and blind non-negative modulation spectrum tensor factorisation.
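A minimal sketch of the final Wiener-filter reconstruction step mentioned above, assuming per-source power-spectrogram estimates (e.g. the product of basis and activation factors for each source) are already available. The semi-supervised tensor factorisation itself is not shown, and the name wiener_reconstruct is an illustrative assumption.

import numpy as np

def wiener_reconstruct(X, source_models, eps=1e-12):
    # X: complex mixture STFT (freq x time).
    # source_models: list of non-negative power-spectrogram estimates of the
    # same shape, one per source. Each source is recovered by applying its
    # soft Wiener-style mask to the mixture STFT.
    total = sum(source_models) + eps
    return [X * (m / total) for m in source_models]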
Blind Source Separation and Automatic Transcription of Music Using Tensor Decompositions, 2007
"... Part of the Signal Processing Commons This Conference Paper is brought to you for free and open access by the Audio Research Group at ..."
Abstract
- Add to MetaCart
(Show Context)
Part of the Signal Processing Commons This Conference Paper is brought to you for free and open access by the Audio Research Group at