Results 1 - 10 of 48
Instrument recognition in polyphonic music based on automatic taxonomies
IEEE Transactions on Speech and Audio Processing, 2006
Cited by 32 (3 self)
We propose a new approach to instrument recognition in the context of real music orchestrations ranging from solos to quartets. The strength of our approach is that it does not require prior musical source separation. Thanks to a hierarchical clustering algorithm exploiting robust probabilistic distances, we obtain a taxonomy of musical ensembles which is used to efficiently classify possible combinations of instruments played simultaneously. Moreover, a wide set of acoustic features is studied including some new proposals. In particular, Signal to Mask Ratios are found to be useful features for audio classification. This study focuses on a single music genre (i.e. jazz) but combines a variety of instruments among which are percussion and singing voice. Using a varied database of sound excerpts from commercial recordings, we show that the segmentation of music with respect to the instruments played can be achieved with an average accuracy of 53%.
Hybrid representations for audiophonic signal encoding
Signal Processing, 2002
Cited by 21 (5 self)
We discuss in this paper a new approach for signal models in the context of audio signal encoding. The method is based upon hybrid models featuring transient, tonal and stochastic components in the signal. Contrary to several existing approaches, our method does not rely on any prior segmentation of the signal. The three components are estimated and encoded using a strategy very much in the spirit of transform coding. While the details of the method described here are tailored to audio signals, the general strategy should also apply to other types of signals exhibiting significantly different features, for example images.
Physical Wave Propagation Modeling for Real-Time Synthesis of Natural Sounds
2002
Cited by 10 (3 self)
This thesis proposes banded waveguide synthesis as an approach to real-time sound synthesis based on the underlying physics. So far three main approaches have been widely used: digital waveguide synthesis, modal synthesis and finite element methods. Digital waveguide synthesis is efficient and realistic and captures the complete dynamics of the underlying physics but is restricted to instruments that are well-described by the one-dimensional string equation. Modal synthesis is efficient and realistic yet abandons complete dynamical description and hence cannot be used for certain types of performance interactions like bowing. Finite element methods are realistic and capture the behavior of the constituent physical equations but do not perform in real time on current commodity hardware. Banded waveguides offer efficient simulations for cases for which modal synthesis is appropriate but traditional digital waveguide synthesis is not applicable. The key realization is that the dynamic behavior of traveling waves, which is being used in waveguide synthesis, can be applied to individual modes and that the efficient computational
Towards Autonomous Agents for Live Computer Music: Realtime Machine Listening and Interactive Music Systems
2006
Scalable Perceptual Mixing and Filtering of Audio Signals using an Augmented Spectral Representation
In 8th Int. Conference on Digital Audio Effects, 2005
Cited by 10 (6 self)
Many interactive applications, such as video games, require processing a large number of sound signals in real-time. This paper proposes a novel perceptually-based and scalable approach for efficiently filtering and mixing a large number of audio signals. Key to its efficiency is a pre-computed Fourier frequency-domain representation augmented with additional descriptors. The descriptors can be used during the real-time processing to estimate which signals are not going to contribute to the final mixture. We also propose an importance sampling strategy that allows the processing load to be tuned relative to the quality of the output. We demonstrate our approach for a variety of applications including equalization and mixing, reverberation processing and spatialization. It can also be used to optimize audio data streaming or decompression. By reducing the number of operations and limiting bus traffic, our approach yields a 3 to 15-fold improvement in overall processing rate compared to brute-force techniques, with minimal degradation of the output.
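The idea of using pre-computed descriptors to skip signals that will not contribute to the mix can be illustrated with a small sketch. This is not the paper's algorithm: it swaps in a plain energy threshold plus a fixed budget in place of the perceptual descriptors and importance sampling described in the abstract. The function name `select_audible`, the `threshold_ratio` parameter and the sample values are all hypothetical.

```python
def select_audible(frame_energy, gains, budget, threshold_ratio=1e-4):
    """Pick which signals to mix for one frame (illustrative only).

    frame_energy: pre-computed energy descriptor per signal for this frame
    gains:        linear mixing gain per signal
    budget:       maximum number of signals to process
    """
    # Contribution of each signal to the mix: energy scaled by gain squared.
    weighted = [(e * g * g, i) for i, (e, g) in enumerate(zip(frame_energy, gains))]
    total = sum(w for w, _ in weighted)
    if total == 0.0:
        return []
    # Cull signals whose share of the total energy is negligible.
    kept = [(w, i) for w, i in weighted if w / total >= threshold_ratio]
    # Keep only the strongest contributors, up to the processing budget.
    kept.sort(reverse=True)
    return sorted(i for _, i in kept[:budget])


# Hypothetical descriptors for four signals in one frame.
frame_energy = [0.9, 1e-7, 0.3, 0.05]
gains = [1.0, 1.0, 0.5, 1.0]
selected = select_audible(frame_energy, gains, budget=2)
```

Raising `budget` trades processing load for output quality, which is the kind of tuning the abstract refers to, although the paper achieves it by different means.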
Gaussian Mixture Models for Extraction of Melodic Lines from Audio Recordings
2004
Cited by 9 (0 self)
The presented study deals with extraction of melodic line(s) from polyphonic audio recordings. We base our work on the use of the expectation maximization (EM) algorithm, which is employed in a two-step procedure that finds melodic lines in audio signals. In the first step, EM is used to find regions in the signal with strong and stable pitch (melodic fragments). In the second step, these fragments are grouped into clusters according to their properties (pitch, loudness...). The obtained clusters represent distinct melodic lines. Gaussian Mixture Models, trained with EM, are used for clustering. The paper presents the entire process in more detail and gives some initial results.
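The second step, clustering fragments with a Gaussian Mixture Model trained by EM, can be sketched as follows. This is a minimal illustration under strong assumptions: each fragment is reduced to a single feature (an average pitch given as a MIDI note number), whereas the abstract mentions several properties, and the data, the initial means and the function names `fit_gmm_1d` and `assign` are invented for the example.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def responsibilities(values, weights, means, variances):
    """E-step: posterior probability of each component for each value."""
    resp = []
    for x in values:
        likes = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(likes) or 1e-300  # guard against total underflow
        resp.append([l / total for l in likes])
    return resp


def fit_gmm_1d(values, init_means, n_iter=50, min_var=1e-3):
    """Fit a 1-D Gaussian mixture with EM, starting from init_means."""
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        resp = responsibilities(values, weights, means, variances)
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # empty component: leave its parameters unchanged
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            variances[j] = max(min_var, var)  # floor keeps the density finite
    return weights, means, variances


def assign(values, weights, means, variances):
    """Label each value with its most probable component."""
    resp = responsibilities(values, weights, means, variances)
    return [max(range(len(r)), key=lambda j: r[j]) for r in resp]


# Hypothetical average pitches of eight melodic fragments (MIDI note numbers).
fragment_pitches = [60.1, 59.8, 60.3, 60.0, 72.2, 71.9, 72.0, 71.7]
weights, means, variances = fit_gmm_1d(fragment_pitches, init_means=[58.0, 74.0])
labels = assign(fragment_pitches, weights, means, variances)
```

Each resulting label plays the role of one melodic line. A fuller version would use multivariate Gaussians over pitch, loudness and other fragment properties.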
On Finding Melodic Lines in Audio Recordings
2004
Cited by 9 (1 self)
The paper presents our approach to the problem of finding melodic line(s) in polyphonic audio recordings. The approach is composed of two different stages, partially rooted in psychoacoustic theories of music perception: the first stage is dedicated to finding regions with strong and stable pitch (melodic fragments), while in the second stage, these fragments are grouped according to their properties (pitch, loudness...) into clusters which represent melodic lines of the piece. The Expectation Maximization algorithm is used in both stages: to find the dominant pitch in a region, and to train Gaussian Mixture Models that group fragments into melodies. The paper presents the entire process in more detail and provides some initial results.
A Review of the Theory and Applications of Optimal Subband and Transform Coders
Journal of Applied and Computational Harmonic Analysis, 2001
Cited by 8 (4 self)
In this paper we first give a review of the older "classical approaches" to filter bank optimization, to place the ideas in the right perspective. We then review more recent results on optimal filter banks. This includes a review of principal component filter banks (PCFBs), their optimality properties, and some applications of these. To emphasize the generality of these results we show an application in digital communications (the discrete multitone channel). We show, for example, that the PCFB minimizes transmitted power for a given probability of error and bit rate. We finally discuss future directions and open problems in this broad area.
Audio modeling based on delayed sinusoids
IEEE Trans. Speech Audio Process., 2004
Cited by 7 (5 self)
In this work, we present an evolution of the DDS (Damped & Delayed Sinusoidal) model introduced within the framework of general signal modeling. This model is named the Partial Damped & Delayed Sinusoidal (PDDS) model and takes into account a single time delay parameter for a set (sum) of damped sinusoids. This modification is more consistent with the transient audio modeling problem. We show the validity of this approach by comparison with the well-known EDS (Exponentially Damped Sinusoids) approach. Finally, the performances of three model high-resolution parameter estimation algorithms are compared on synthetic fast time-varying signals and on two typical audio transients.
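To make the "single time delay for a sum of damped sinusoids" idea concrete, the sketch below synthesizes such a signal. The exact parameterization used in the paper is not given in the abstract, so the form here (amplitude, damping, normalized frequency and phase per sinusoid, with one shared onset delay in samples) is an assumption, and the function name `pdds_signal` and the parameter values are hypothetical.

```python
import math


def pdds_signal(n_samples, delay, partials):
    """Sum of damped sinusoids that all start at the same delay.

    partials: list of (amplitude, damping, frequency, phase) tuples, with
              damping per sample and frequency in cycles per sample.
    The signal is zero before `delay`; every sinusoid shares that onset.
    """
    signal = [0.0] * n_samples
    for n in range(delay, n_samples):
        t = n - delay  # time measured from the shared onset
        signal[n] = sum(
            a * math.exp(-d * t) * math.cos(2.0 * math.pi * f * t + phi)
            for a, d, f, phi in partials
        )
    return signal


# Two hypothetical damped sinusoids sharing an onset at sample 16.
partials = [(1.0, 0.05, 0.10, 0.0), (0.5, 0.02, 0.23, 0.0)]
x = pdds_signal(64, 16, partials)
```

Giving each sinusoid its own delay instead would correspond to the more general DDS setting that the abstract contrasts with PDDS.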
A Non-Uniform Modulation Transform for Audio Coding with Increased Time Resolution
In Proceedings of IEEE ICASSP, Hong Kong, 2003
Cited by 6 (3 self)
Perceptual audio coders exploit two properties to achieve coding gain: perceptual irrelevancy and source redundancy. Recently, a two-dimensional modulation transform was introduced which efficiently extracts perceptual irrelevancy and source redundancy not accessible in a one-dimensional transform. In this paper, we propose an alternative modulation transform design with an octave-band non-uniform modulation dimension. This non-uniform modulation dimension approximately mimics the spacing of modulation filter subbands of the human auditory system, while simultaneously increasing the time resolution of the modulation transform, which provides improved temporal control of coding noise.

