HILN - The MPEG-4 Parametric Audio Coding Tools
- in Proc. of IEEE Int. Symposium on Circuits and Systems
, 2000
Cited by 26 (3 self)
The MPEG-4 Audio Standard combines tools for efficient and flexible coding of audio. For very low bitrate applications, tools based on a parametric signal representation are utilised. The parametric speech coding tools (HVXC) are already available in Version 1 of MPEG-4. The main focus of this paper is on the parametric audio coding tools "Harmonic and Individual Lines plus Noise" (HILN) which are included in Version 2 of MPEG-4. As already indicated by their name, the HILN tools are based on the decomposition of the audio signal into components which are described by appropriate source models and represented by model parameters. This paper gives an overview of the HILN tools, presents the recent advances in signal modelling and parameter coding, and concludes with an evaluation of the subjective audio quality. 1. INTRODUCTION In the context of evolving multimedia applications -- like digital broadcasting, storage, realtime communication, the World Wide Web, or games -- new demands f...
Sparse Linear Regression With Structured Priors and Application to Denoising of Musical Audio
- IEEE Trans. Sp. Audio Proc
Cited by 21 (1 self)
Abstract: We describe in this paper an audio denoising technique based on sparse linear regression with structured priors. The noisy signal is decomposed as a linear combination of atoms belonging to two modified discrete cosine transform (MDCT) bases, plus a residual part containing the noise. One MDCT basis has a long time resolution, and thus high frequency resolution, and is aimed at modeling tonal parts of the signal, while the other MDCT basis has short time resolution and is aimed at modeling transient parts (such as attacks of notes). The problem is formulated within a Bayesian setting. Conditional upon an indicator variable which is either 0 or 1, one expansion coefficient is set to zero or given a hierarchical prior. Structured priors are employed for the indicator variables; using two types of Markov chains, persistency along the time axis is favored for expansion coefficients of the tonal layer, while persistency along the frequency axis is favored for the expansion coefficients of the transient layer. Inference about the denoised signal and model parameters is performed using a Gibbs sampler, a standard Markov chain Monte Carlo (MCMC) sampling technique. We present results for denoising of a short glockenspiel excerpt and a long polyphonic music excerpt. Our approach is compared with unstructured sparse regression and with structured sparse regression in a single resolution MDCT basis (no transient layer). The results show that better denoising is obtained, both from signal-to-noise ratio measurements and from subjective criteria, when both a transient and tonal layer are used, in conjunction with our proposed structured prior framework.
Index Terms: Bayesian variable selection, denoising, Markov chain Monte Carlo (MCMC) methods, nonlinear signal approximation, sparse component analysis, sparse regression, sparse representations.
Low Bitrate Object Coding of Musical Audio Using Bayesian Harmonic Models
, 2006
Cited by 17 (4 self)
This article deals with the decomposition of music signals into pitched sound objects made of harmonic sinusoidal partials for very low bitrate coding purposes. After a brief review of existing methods, we recast this problem in the Bayesian framework. We propose a family of probabilistic signal models combining learnt object priors and various perceptually motivated distortion measures. We design efficient algorithms to infer object parameters and build a coder based on the interpolation of frequency and amplitude parameters. Listening tests suggest that the loudness-based distortion measure outperforms other distortion measures and that our coder results in a better sound quality than baseline transform and parametric coders at 8 kbit/s and 2 kbit/s. This work constitutes a new step towards a fully object-based coding system, which would represent audio signals as collections of meaningful note-like sound objects.
High-resolution spherical quantization of sinusoidal parameters
- IEEE Trans. Audio, Speech and Lang. Processing
, 2007
Cited by 13 (1 self)
Sinusoidal modelling is a key technology in low rate audio coding, and methods for efficient quantization of sinusoidal parameters are therefore of high importance. In this work we derive analytical formulas for the optimal entropy constrained unrestricted spherical quantizers for amplitude, phase and frequency, using a perceptual distortion measure. This is done both for a single sinusoid, and for multiple sinusoids distributed over multiple segments. The quantizers minimize a high-resolution approximation of the expected distortion, while the corresponding quantization indices satisfy an entropy constraint. The quantizers turn out to be flexible and of low complexity, in the sense that they can be determined easily for varying bit rate requirements, without any sort of retraining or iterative procedures. In objective and subjective comparison tests, the proposed method is shown to outperform an existing state-of-the-art sinusoidal quantization scheme, where quantization of frequency parameters is done independently.
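As a rough illustration of what "unrestricted" means for a polar or spherical quantizer, the sketch below lets the number of phase levels grow with the quantized amplitude, so that louder sinusoids get finer phase resolution. It is not the entropy-constrained, perceptually weighted quantizer derived in the paper: the function name `polar_quantize`, the single `step` parameter, and the uniform amplitude grid are all assumptions made for this example, and frequency is left out entirely.

```python
import math

def polar_quantize(amplitude, phase, step):
    """Quantize (amplitude, phase) with amplitude-dependent phase resolution.

    Hypothetical helper: `step` is an assumed target cell size, not a
    parameter taken from the paper. Amplitude is assumed non-negative.
    """
    # Uniform amplitude grid with spacing `step`.
    a_q = step * round(amplitude / step)
    # Number of phase cells chosen so each arc on the circle of radius a_q
    # is roughly `step` long; at least one cell.
    n_levels = max(1, round(2 * math.pi * a_q / step))
    phase_step = 2 * math.pi / n_levels
    # Wrap the phase to [0, 2*pi), snap to the grid, and wrap again.
    p_q = (phase_step * round((phase % (2 * math.pi)) / phase_step)) % (2 * math.pi)
    return a_q, p_q, n_levels
```

With the same `step`, a sinusoid of amplitude 1.0 is assigned more phase levels than one of amplitude 0.2, which is the coupling a restricted quantizer (fixed phase grid for all amplitudes) lacks.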
Audio modeling based on delayed sinusoids
- IEEE Trans. Speech and Audio Processing
, 2004
A prototype system for object coding of musical audio
- in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA
Cited by 9 (6 self)
This article deals with low bitrate object coding of musical audio, and more precisely with the extraction of pitched sound objects in polyphonic music. After a brief review of existing methods, we discuss the potential benefits of recasting this problem in a Bayesian framework. We define pitched objects by a set of probabilistic priors and derive efficient algorithms to infer active objects and their parameters. Preliminary experiments suggest that the proposed method results in a better sound quality than simple sinusoidal coding while achieving a lower bitrate.
A 6kbps to 85kbps scalable audio coder
- In Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing
, 2000
Cited by 9 (0 self)
Scalable audio coding is important in network environments, such as the Internet, where bandwidth is not guaranteed, packet loss is common, and client connection data rates are heterogeneous. Signal models provide a general framework for attacking a wide range of challenges in the unicast delivery of real-time audio over packet-switched networks. The specific signal model in this work generates a parametric representation for general wide-band audio signals. The model consists of three complementary components: sines, transients, and noise. Because the human hearing system ultimately judges the validity of a model for audio signals, psychoacoustic principles are explicitly considered in the three-part model. Once analyzed, the parameters are quantized, compressed and packed into a single 85 kbps bit-stream. From this bit-stream, bit-streams at several bit-rates between 6 kbps and 85 kbps may be readily extracted. The audio coder offers a wide range of scalability while the audio quality of the coding scheme gracefully degrades from perceptually lossless to low-quality.
Prioritizing signals for selective real-time audio processing
- In Proc. of ICAD 2005
, 2005
Cited by 8 (2 self)
This paper studies various priority metrics that can be used to progressively select sub-parts of a number of audio signals for real-time processing. In particular, five level-related metrics were examined: RMS level, A-weighted level, Zwicker and Moore loudness models and a masking threshold-based model. We conducted a pilot subjective evaluation study aimed at evaluating which metric would perform best at reconstructing mixtures of various types (speech, ambient and music) using only a budget amount of original audio data. Our results suggest that A-weighting performs the worst while results obtained with loudness metrics appear to depend on the type of signals. RMS level offers a good compromise for all cases. Our results also show that significant sub-parts of the original audio data can be omitted in most cases, without noticeable degradation in the generated mixtures, which validates the usability of our selective processing approach for real-time applications. In this context, we successfully implemented a prototype 3D audio rendering pipeline using our selective approach.
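The simplest of the metrics named above, RMS level, can be sketched as a priority score for choosing which frames to process under a fixed budget. This is only a minimal illustration under assumed conventions: the function names `rms_level` and `select_by_priority`, the frame representation (lists of float samples), and the "keep the top `budget` frames" rule are hypothetical and not taken from the paper.

```python
import math

def rms_level(frame):
    """Root-mean-square level of one frame of samples (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def select_by_priority(frames, budget):
    """Return the indices of the `budget` highest-RMS frames, in original order.

    Hypothetical selection rule: frames outside the budget would simply be
    skipped by the downstream processing stage.
    """
    # Rank frame indices from loudest to quietest (ties keep original order).
    ranked = sorted(range(len(frames)),
                    key=lambda i: rms_level(frames[i]),
                    reverse=True)
    # Keep the top `budget` indices and restore their original ordering.
    return sorted(ranked[:budget])
```

Swapping `rms_level` for an A-weighted or loudness-model score would leave the selection step unchanged, which is what makes the metrics directly comparable.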
Speeding up HILN – MPEG-4 parametric audio encoding with reduced complexity
- in AES 109th Convention
, 2000
Cited by 7 (3 self)
Parametric modelling permits an efficient representation of audio signals and is utilised for very low bit rate coding by the MPEG-4 Standard. Here we look at the MPEG-4 parametric audio coding tools "Harmonic and Individual Lines plus Noise" (HILN) which are based on a decomposition of the audio signal into components that are described by appropriate source models and represented by model parameters. Until now, HILN encoding mainly focused on maximum audio quality at the expense of high computational complexity. In this paper, different approaches to speed up HILN encoding are presented and the tradeoff between computational complexity and audio quality is analysed.
Parameter Estimation And Tracking For Time-Varying Sinusoids
, 2002
Cited by 4 (0 self)
Parametric modeling permits an efficient representation of audio signals and is increasingly utilized for very low bit rate coding applications. Such systems are based on a decomposition of the audio signal into components that are described by appropriate source models and represented by model parameters. Commonly used component types are sinusoidal trajectories, harmonic tones, transients, and noise. Proper estimation and tracking of sinusoids in the case of vibrato or portamento is vital, yet difficult due to the uncertainty principle for time-frequency resolution. This paper presents a reliable solution to this problem that was successfully demonstrated in an encoder for MPEG-4 parametric audio coding "Harmonic and Individual Lines plus Noise" (HILN).
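To make the idea of a component-based parametric representation concrete, the sketch below resynthesizes one frame from three of the component types mentioned: a harmonic tone, individual sinusoidal lines, and noise. It is a toy decoder-side illustration only. The function `synthesize_frame`, its parameter layout, and the use of uniform white noise are assumptions for this example; it does not reproduce the HILN bitstream syntax, parameter interpolation, or the estimation and tracking method of the paper, and transients are omitted.

```python
import math
import random

def synthesize_frame(num_samples, sample_rate, f0, harmonic_amps,
                     lines, noise_amp, seed=0):
    """Sum a harmonic tone, individual lines, and noise into one frame.

    Hypothetical parameter layout:
      f0            fundamental frequency in Hz
      harmonic_amps amplitudes of partials 1, 2, 3, ... at k * f0
      lines         list of (frequency_hz, amplitude, phase_rad) tuples
      noise_amp     peak amplitude of uniform white noise
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    frame = []
    for n in range(num_samples):
        t = n / sample_rate
        # Harmonic tone: partial k sits at k times the fundamental.
        sample = sum(a * math.sin(2 * math.pi * k * f0 * t)
                     for k, a in enumerate(harmonic_amps, start=1))
        # Individual lines: sinusoids not tied to the harmonic grid.
        sample += sum(a * math.sin(2 * math.pi * f * t + ph)
                      for f, a, ph in lines)
        # Noise component: flat spectrum here; a real coder shapes it.
        sample += noise_amp * rng.uniform(-1.0, 1.0)
        frame.append(sample)
    return frame
```

The parameters here are constant within the frame; handling vibrato or portamento, the case the abstract singles out, would require frequency and amplitude to vary over time.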