Switching Linear Dynamical Systems for Noise Robust Speech Recognition
- IEEE Trans. Audio, Speech and Language Processing, 2007
"... to appear in ..."
Bayesian Analysis of Polyphonic Western Tonal Music
- Journal of the Acoustical Society of America, 2006
Abstract - Cited by 7 (1 self)
This paper deals with the computational analysis of musical audio from recorded audio waveforms. This general problem includes, as sub-tasks, music transcription, extraction of musical pitch, dynamics, timbre, instrument identity, and source separation. Analysis of real musical signals is a highly ill-posed task, complicated by the presence of transient sounds, background interference, and the complex structure of musical pitches in the time-frequency domain. This paper focuses on models and algorithms for computer transcription of multiple musical pitches in audio, elaborated from previous work by two of the authors. The audio data are assumed to be pre-segmented into fixed pitch regimes such as individual chords. The models presented apply to pitched (tonal) music and are formulated via a Gabor representation of non-stationary signals. A Bayesian probabilistic structure is employed for representation of prior information about the parameters of the notes. This paper introduces a numerical Bayesian inference strategy for estimation of the pitches and other parameters of the waveform. The improved algorithm is much quicker, and makes the approach feasible in realistic situations.
Analysis of polyphonic audio using source-filter model and non-negative matrix factorization
Low Bitrate Object Coding of Musical Audio Using Bayesian Harmonic Models
- 2006
Abstract - Cited by 7 (2 self)
This article deals with the decomposition of music signals into pitched sound objects made of harmonic sinusoidal partials for very low bitrate coding purposes. After a brief review of existing methods, we recast this problem in the Bayesian framework. We propose a family of probabilistic signal models combining learnt object priors and various perceptually motivated distortion measures. We design efficient algorithms to infer object parameters and build a coder based on the interpolation of frequency and amplitude parameters. Listening tests suggest that the loudness-based distortion measure outperforms other distortion measures and that our coder results in a better sound quality than baseline transform and parametric coders at 8 kbit/s and 2 kbit/s. This work constitutes a new step towards a fully object-based coding system, which would represent audio signals as collections of meaningful note-like sound objects.
Joint Detection and Tracking of Time-Varying Harmonic Components: a Flexible Bayesian Approach
- in IEEE Transactions on Speech, Audio and Language Processing, 2006
Abstract - Cited by 5 (0 self)
This paper addresses the joint estimation and detection of time-varying harmonic components in audio signals. We follow a flexible viewpoint, where several frequency/amplitude trajectories are tracked in the spectrogram using particle filtering. The core idea is that each harmonic component (composed of a fundamental partial together with several overtone partials) is considered a target. Tracking requires defining a state-space model with state transition and measurement equations. Particle filtering algorithms rely on a so-called sequential importance distribution, and we show that it can be built on previous multipitch estimation algorithms, so as to yield an even more efficient estimation procedure with established convergence properties. Moreover, as our model captures all the harmonic model information, it actually separates the harmonic sources. Simulations on synthetic and real music data demonstrate the value of our approach.
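To make the state-space idea in this abstract concrete, here is a minimal sketch of a particle filter tracking a single slowly drifting frequency. It is not the authors' method: it uses a plain bootstrap proposal (the transition model itself) rather than an importance distribution built on multipitch estimates, and the function name, the random-walk transition, the Gaussian measurement model, and all parameter values are assumptions made for illustration.

```python
import math
import random

def bootstrap_particle_filter(observations, n_particles=500, drift_std=2.0,
                              obs_std=5.0, init_range=(200.0, 600.0), seed=0):
    """Track one frequency (Hz) from noisy per-frame peak observations.

    Hypothetical model, chosen only for illustration:
      state transition:  f_t = f_{t-1} + N(0, drift_std^2)
      measurement:       y_t = f_t + N(0, obs_std^2)
    Returns the posterior-mean frequency estimate for each frame.
    """
    rng = random.Random(seed)
    particles = [rng.uniform(*init_range) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # Propagate every particle through the state transition equation.
        particles = [f + rng.gauss(0.0, drift_std) for f in particles]
        # Weight each particle by the measurement likelihood.
        weights = [math.exp(-0.5 * ((y - f) / obs_std) ** 2) for f in particles]
        total = sum(weights)
        if total == 0.0:
            # Every likelihood underflowed: fall back to uniform weights.
            weights = [1.0] * n_particles
            total = float(n_particles)
        weights = [w / total for w in weights]
        estimates.append(sum(w * f for w, f in zip(weights, particles)))
        # Multinomial resampling to counter weight degeneracy.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates

# Toy input: a tone gliding upward by 0.5 Hz per frame.
observed_peaks = [440.0 + 0.5 * t for t in range(40)]
track = bootstrap_particle_filter(observed_peaks)
print(f"last observation {observed_peaks[-1]:.1f} Hz, estimate {track[-1]:.1f} Hz")
```

A full harmonic tracker would carry a richer state (amplitudes, number of partials, birth and death of components); the sketch only shows the predict, weight, resample loop.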
Multipitch estimation of piano sounds using a new probabilistic spectral smoothness principle
- IEEE Trans. Audio, Speech, Lang. Process., 2010
Abstract - Cited by 4 (0 self)
A new method for the estimation of multiple concurrent pitches in piano recordings is presented. It addresses the issue of overlapping overtones by modeling the spectral envelope of the overtones of each note with a smooth autoregressive model. For the background noise, a moving-average model is used, and the combination of both tends to eliminate harmonic and sub-harmonic erroneous pitch estimations. This leads to a complete generative spectral model for simultaneous piano notes, which also explicitly includes the typical deviation from exact harmonicity in a piano overtone series. The pitch set that maximizes an approximate likelihood is selected from among a restricted number of possible pitch combinations. Tests have been conducted on a large homemade database called MAPS, composed of piano recordings from a real upright piano and from high-quality samples. Index Terms: acoustic signal analysis, audio processing, multipitch estimation, piano, transcription, spectral smoothness.
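The "deviation from exact harmonicity" mentioned in this abstract can be illustrated with the textbook stiff-string law f_h = h * f0 * sqrt(1 + B * h^2). Whether the paper uses exactly this parameterization is an assumption here, and the function name and the value of B below are hypothetical, picked only to make the stretching visible.

```python
import math

def piano_partials(f0, inharmonicity, n_partials):
    """Return the first n_partials overtone frequencies (Hz) of a stiff string.

    f0 is the nominal fundamental of the ideal (flexible) string and
    inharmonicity is the coefficient B; B = 0 gives an exactly harmonic series.
    """
    return [h * f0 * math.sqrt(1.0 + inharmonicity * h * h)
            for h in range(1, n_partials + 1)]

harmonic = piano_partials(220.0, 0.0, 8)
stretched = piano_partials(220.0, 4e-4, 8)  # illustrative B, not a measured value
for h, (fh, fs) in enumerate(zip(harmonic, stretched), start=1):
    print(f"partial {h}: harmonic {fh:8.2f} Hz, stiff string {fs:8.2f} Hz")
```

Because the stretch grows with the partial index, upper partials of different notes no longer coincide exactly, which is one reason a piano-specific spectral model can help disambiguate overlapping overtones.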
A primitive based generative model to infer timing information in unpartitioned handwriting data
- in Int. Joint Conf. on Artificial Intelligence (IJCAI), 2007
Abstract - Cited by 3 (2 self)
Biological movement control and planning is based upon motor primitives. In our approach, we presume that each motor primitive takes responsibility for controlling a small sub-block of motion, containing coherent muscle activation outputs. A central timing controller cues these subroutines of movement, creating complete movement strategies that are built up by overlaying primitives, thus creating synergies of muscle activation. This partitioning allows the movement to be defined by a sparse code representing the timing of primitive activations. This paper shows that it is possible to use a factorial hidden Markov model to infer primitives in handwriting data. The variation in the handwriting data can to a large extent be explained by timing variation in the triggering of the primitives. Once an appropriate set of primitives has been inferred, the characters can be represented as a set of timings of primitive activations, along with variances, giving a very compact representation of the character. The model is naturally partitioned into a low level primitive output stage, and a top-down primitive timing stage. This partitioning gives us an insight into behaviours such as scribbling, and what is learnt in order to write a new character.
A PROBABILISTIC FRAMEWORK FOR MATCHING MUSIC REPRESENTATIONS
Abstract - Cited by 3 (0 self)
In this paper we introduce a probabilistic framework for matching different music representations (score, MIDI, audio) by incorporating models of how one musical representation might be rendered from another. We propose a dynamical hidden Markov model for the score pointer as a prior, and two observation models, the first based on matching spectrogram data to a trained template, the second detecting damped sinusoids within a frame of audio by subspace methods. The resulting Bayesian framework is robust to local variations in tempo, and can be used for a wide variety of applications. We evaluate both methods in a score alignment context by inferring the posterior distribution of the current position in the score exactly. The spectrogram method is shown to infer the score position reliably with minimal computation, and the damped sinusoid model is able to pinpoint the positions of score events in the audio with a high level of timing accuracy.
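Exact inference of the score position, as described in this abstract, can be sketched with the standard HMM forward recursion. The left-to-right "stay or advance by one" transition, the function name, the `stay_prob` value, and the toy likelihood table are all assumptions for illustration; the paper's actual pointer dynamics and observation models are not reproduced here.

```python
def forward_filter(likelihoods, stay_prob=0.6):
    """Exact filtering posterior over score positions for a left-to-right HMM.

    likelihoods[t][k] is p(audio frame t | score position k), supplied by
    some observation model. At each frame the score pointer either stays
    (probability stay_prob) or advances to the next event.
    Returns one normalized posterior (a list over positions) per frame.
    """
    n_pos = len(likelihoods[0])
    belief = [1.0] + [0.0] * (n_pos - 1)  # pointer starts at the first event
    posteriors = []
    for t, frame in enumerate(likelihoods):
        if t > 0:
            # Prediction step: push probability mass through the transitions.
            predicted = [0.0] * n_pos
            for k, mass in enumerate(belief):
                if k + 1 < n_pos:
                    predicted[k] += stay_prob * mass
                    predicted[k + 1] += (1.0 - stay_prob) * mass
                else:
                    predicted[k] += mass  # the final event is absorbing
            belief = predicted
        # Update step: multiply by the frame likelihood and renormalize.
        unnorm = [b * l for b, l in zip(belief, frame)]
        total = sum(unnorm)
        if total == 0.0:
            raise ValueError("observation has zero likelihood everywhere")
        belief = [u / total for u in unnorm]
        posteriors.append(belief)
    return posteriors

# Toy likelihoods for 5 frames and 3 score events (made-up numbers).
toy = [
    [0.9, 0.1, 0.1],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.8, 0.2],
    [0.1, 0.1, 0.9],
]
for t, post in enumerate(forward_filter(toy)):
    print(t, [round(p, 3) for p in post])
```

Because the pointer may linger on an event for any number of frames, a model of this shape tolerates local tempo variation, which is the robustness property the abstract refers to.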
PERCEPTUALLY-BASED EVALUATION OF THE ERRORS USUALLY MADE WHEN AUTOMATICALLY TRANSCRIBING MUSIC
- ISMIR 2008 – SESSION 4C – AUTOMATIC MUSIC ANALYSIS AND TRANSCRIPTION, 2008
Abstract - Cited by 2 (1 self)
This paper investigates the perceptual importance of typical errors occurring when transcribing polyphonic music excerpts into a symbolic form. The case of the automatic transcription of piano music is taken as the target application and two subjective tests are designed. The main test aims at understanding how human subjects rank typical transcription errors such as note insertion, deletion or replacement, note doubling, incorrect note onset or duration, and so forth. The Bradley-Terry-Luce (BTL) analysis framework is used and the results show that pitch errors are more clearly perceived than incorrect loudness estimations or temporal deviations from the original recording. A second test presents a first attempt to include this information in more perceptually motivated measures for evaluating transcription systems.
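As a rough illustration of the Bradley-Terry-Luce analysis named in this abstract, the sketch below fits Bradley-Terry strengths to pairwise preference counts with the standard minorization-maximization update. The function name is hypothetical, and the count matrix is invented purely for demonstration; it is not data from the paper.

```python
def fit_bradley_terry(wins, n_iter=200):
    """Estimate Bradley-Terry strengths from a pairwise win-count matrix.

    wins[i][j] is the number of trials in which item i was chosen over
    item j. Under the model, P(i chosen over j) = p[i] / (p[i] + p[j]).
    Returns strengths normalized to sum to 1.
    """
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(n_iter):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            new_p.append(total_wins / denom)
        norm = sum(new_p)
        p = [v / norm for v in new_p]
    return p

# Invented counts for three hypothetical error types A, B, C
# (10 comparisons per pair; "win" = judged the more disturbing error).
counts = [
    [0, 8, 7],
    [2, 0, 5],
    [3, 5, 0],
]
for name, strength in zip("ABC", fit_bradley_terry(counts)):
    print(f"error type {name}: strength {strength:.3f}")
```

The fitted strengths place all items on one scale, so the relative perceptual weight of each error type can be read off directly instead of inspecting every pairwise count.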

