Multichannel extensions of non-negative matrix factorization with complex-valued data
- IEEE Transactions on Audio, Speech and Language Processing
, 2013
Cited by 8 (1 self)
This paper presents new formulations and algorithms for multichannel extensions of non-negative matrix factorization (NMF). The formulations employ Hermitian positive semidefinite matrices to represent a multichannel version of non-negative elements. Multichannel Euclidean distance and multichannel Itakura-Saito (IS) divergence are defined based on appropriate statistical models utilizing multivariate complex Gaussian distributions. To minimize this distance/divergence, efficient optimization algorithms in the form of multiplicative updates are derived by using properly designed auxiliary functions. Two methods are proposed for clustering NMF bases according to the estimated spatial property. Convolutive blind source separation (BSS) is performed by the multichannel extensions of NMF with the clustering mechanism. Experimental results show that 1) the derived multiplicative update rules exhibited good convergence behavior, and 2) BSS tasks for several music sources with two microphones and three instrumental parts were evaluated successfully. Index Terms: Blind source separation, clustering, convolutive mixture, multichannel, non-negative matrix factorization.
Explicit beat structure modeling for non-negative matrix factorization-based multipitch analysis
- in Proc. of ICASSP, Kyoto
, 2012
Cited by 6 (1 self)
This paper proposes model-based non-negative matrix factorization (NMF) for simultaneously estimating basis spectra and activations, detecting note onsets and offsets, and determining beat locations. Multipitch analysis is the process of detecting the pitch and onset of each note from a musical signal. Conventional NMF-based approaches often lead to unsatisfactory results, quite possibly because they lack musically meaningful constraints. As music is highly structured in terms of the temporal regularity underlying the onset occurrences of notes, we use this rhythmic structure to constrain NMF by parametrically modeling each note activation with a Gaussian mixture, and we derive an algorithm for iteratively updating the model parameters. It is experimentally shown that the proposed model outperforms the standard NMF algorithms in terms of onset detection rate. Index Terms: Polyphonic pitch transcription, non-negative matrix factorization, rhythmic/beat structure, onset detection
Automatic Transcription of Polyphonic Music Exploiting Temporal Evolution
, 2012
Cited by 4 (1 self)
Automatic music transcription is the process of converting an audio recording into a symbolic representation using musical notation. It has numerous applications in music information retrieval, computational musicology, and the creation of interactive systems. Even for expert musicians, transcribing polyphonic pieces of music is not a trivial task, and while the problem of automatic pitch estimation for monophonic signals is considered to be solved, the creation of an automated system able to transcribe polyphonic music without setting restrictions on the degree of polyphony and the instrument type still remains
Non-negative multiple matrix factorization
- in IJCAI
, 2013
Cited by 3 (1 self)
Non-negative Matrix Factorization (NMF) is a traditional unsupervised machine learning technique for decomposing a matrix into a set of bases and coefficients under the non-negative constraint. NMF with sparse constraints is also known for extracting reasonable components from noisy data. However, NMF tends to give undesired results in the case of highly sparse data, because the information included in the data is insufficient to decompose. Our key idea is that we can ease this problem if complementary data are available that we could integrate into the estimation of the bases and coefficients. In this paper, we propose a novel matrix factorization method called Non-negative Multiple Matrix Factorization (NM2F), which utilizes complementary data as auxiliary matrices that share the row or column indices of the target matrix. The data sparseness is improved by decomposing the target and auxiliary matrices simultaneously, since auxiliary matrices provide information about the bases and coefficients. We formulate NM2F as a generalization of NMF, and then present a parameter estimation procedure derived from the multiplicative update rule. We examined NM2F in both synthetic and real data experiments. The effect of the auxiliary matrices appeared in the improved NM2F performance. We also confirmed that the bases that NM2F obtained from the real data were intuitive and reasonable thanks to the non-negative constraint.
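For orientation, the multiplicative update rule mentioned in the abstract above can be sketched for plain NMF. This is the standard Lee and Seung update for the squared Euclidean error, not the NM2F procedure from the paper, whose exact objective and updates are not given in this listing. The function names (`matmul`, `transpose`, `frobenius_error`, `nmf`), the small constant `eps`, and the example matrix are all assumptions made for illustration.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def frobenius_error(X, W, H):
    """Squared Euclidean distance between X and the product W H."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))


def nmf(X, rank, n_iter=200, seed=0, eps=1e-9):
    """Plain NMF with multiplicative updates (illustrative sketch only).

    X is a non-negative n-by-m matrix; returns W (n-by-rank), H (rank-by-m),
    and the reconstruction error recorded before and after each iteration.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    # Strictly positive random start, so every factor entry stays non-negative.
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    errors = [frobenius_error(X, W, H)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), applied element-wise.
        Wt = transpose(W)
        num = matmul(Wt, X)
        den = matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T), applied element-wise.
        Ht = transpose(H)
        num = matmul(X, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
        errors.append(frobenius_error(X, W, H))
    return W, H, errors


# Hypothetical 4-by-3 non-negative matrix that has an exact rank-2 factorization.
X = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 0.0, 1.0], [3.0, 2.0, 5.0]]
W, H, errors = nmf(X, rank=2)
print("initial error:", errors[0])
print("final error:", errors[-1])
```

Because each update only multiplies the current factors by ratios of non-negative quantities, `W` and `H` stay non-negative without any projection step. A method that also factorizes auxiliary matrices sharing row or column indices would add terms to these ratios, but that extension is not attempted here.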
Unsupervised music understanding based on nonparametric Bayesian models
- In IEEE International Conference on Audio, Speech and Signal Processing
, 2012
Cited by 2 (0 self)
This paper presents a new research framework for unsupervised music understanding. Our goal is to recognize musical notes from polyphonic audio signals and simultaneously induce grammatical patterns from the recognized notes by integrating probabilistic acoustic and language models. Given music audio signals, both models could be jointly trained in a self-organizing manner without manually specifying the numbers of musical notes and grammatical patterns. In this paper, we introduce our nonparametric Bayesian acoustic and language models for multipitch analysis and chord progression analysis and discuss issues for integrating these models. We then provide a novel overview of various acoustic and language models whose underlying concepts are useful for implementing the framework. Index Terms: Unsupervised music understanding, Bayesian nonparametrics, statistical machine learning, acoustic and language
Structured Stochastic Variational Inference
Stochastic variational inference makes it possible to approximate posterior distributions induced by large datasets quickly using stochastic optimization. The algorithm relies on the use of fully factorized variational distributions. However, this "mean-field" independence approximation limits the fidelity of the posterior approximation, and introduces local optima. We show how to relax the mean-field approximation to allow arbitrary dependencies between global parameters and local hidden variables, producing better parameter estimates by reducing bias, sensitivity to local optima, and sensitivity to hyperparameters.
Bayesian Approaches to Acoustic Modeling: A Review
, 2012
This paper focuses on applications of Bayesian approaches to acoustic modeling for speech recognition and related speech processing applications. Bayesian approaches have been widely studied in the fields of statistics and machine learning, and one of their advantages is that their generalization capability is better than that of conventional approaches (e.g., maximum likelihood). On the other hand, since inference in Bayesian approaches involves integrals and expectations that are mathematically intractable in most cases and require heavy numerical computations, it is generally difficult to apply them to practical speech recognition problems. However, there have been many such attempts, and this paper aims to summarize these attempts to encourage further progress on Bayesian approaches in the speech processing field. This paper describes various applications of Bayesian approaches to speech processing in terms of the four typical ways of approximating Bayesian inferences, i.e., maximum a posteriori approximation, model complexity control using a Bayesian information criterion based on asymptotic approximation, variational approximation, and Markov chain Monte Carlo based sampling techniques.
Model-Based Multiple Pitch Tracking Using Factorial HMMs: Model Adaptation and Inference
Robustness against noise and interfering audio signals is one of the challenges in speech recognition and audio analysis technology. One avenue to approach this challenge is single-channel multiple-source modeling. Factorial hidden Markov models (FHMMs) are capable of modeling acoustic scenes with multiple sources interacting over time. While these models reach good performance on specific tasks, there are still serious limitations restricting their applicability in many domains. In this paper, we generalize these models and enhance their applicability. In particular, we develop an EM-like iterative adaptation framework which is capable of adapting the model parameters to the specific situation (e.g. actual speakers, gain, acoustic channel, etc.) using only speech mixture data. Currently, source-specific data is required to learn the model. Inference in FHMMs is an essential ingredient for adaptation. We develop efficient approaches based on observation likelihood pruning. Both adaptation and efficient inference are empirically evaluated for the task of multipitch tracking using the GRID corpus. Index Terms: Efficient inference, factorial hidden Markov model, Gaussian mixture model, mixture maximization, model adaptation, multipitch tracking, self-adaptation.
Non-Negative Multiple Matrix Factorization
- Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence
Non-negative Matrix Factorization (NMF) is a traditional unsupervised machine learning technique for decomposing a matrix into a set of bases and coefficients under the non-negative constraint. NMF with sparse constraints is also known for extracting reasonable components from noisy data. However, NMF tends to give undesired results in the case of highly sparse data, because the information included in the data is insufficient to decompose. Our key idea is that we can ease this problem if complementary data are available that we could integrate into the estimation of the bases and coefficients. In this paper, we propose a novel matrix factorization method called Non-negative Multiple Matrix Factorization (NMMF), which utilizes complementary data as auxiliary matrices that share the row or column indices of the target matrix. The data sparseness is improved by decomposing the target and auxiliary matrices simultaneously, since auxiliary matrices provide information about the bases and coefficients. We formulate NMMF as a generalization of NMF, and then present a parameter estimation procedure derived from the multiplicative update rule. We examined NMMF in both synthetic and real data experiments. The effect of the auxiliary matrices appeared in the improved NMMF performance. We also confirmed that the bases that NMMF obtained from the real data were intuitive and reasonable thanks to the non-negative constraint.
Multichannel Extensions of Non-negative Matrix Factorization with Complex-valued Data
This paper presents new formulations and algorithms for multichannel extensions of non-negative matrix factorization (NMF). The formulations employ Hermitian positive semidefinite matrices to represent a multichannel version of non-negative elements. Multichannel Euclidean distance and multichannel Itakura-Saito (IS) divergence are defined based on appropriate statistical models utilizing multivariate complex Gaussian distributions. To minimize this distance/divergence, efficient optimization algorithms in the form of multiplicative updates are derived by using properly designed auxiliary functions. Two methods are proposed for clustering NMF bases according to the estimated spatial property. Convolutive blind source separation (BSS) is performed by the multichannel extensions of NMF with the clustering mechanism. Experimental results show that 1) the derived multiplicative update rules exhibited good convergence behavior, and 2) BSS tasks for several music sources with two microphones and three instrumental parts were evaluated successfully. Index Terms: Non-negative matrix factorization, multichannel, blind source separation, convolutive mixture, clustering