MPTK: Matching pursuit made tractable
- in Proc. Int. Conf. on Acoustic Speech and Signal Processing
, 2006
Cited by 57 (6 self)
Matching Pursuit (MP) aims at finding sparse decompositions of signals over redundant bases of elementary waveforms. Traditionally, MP has been considered too slow an algorithm to be applied to real-life problems with high-dimensional signals. Indeed, in terms of floating-point operations, its typical numerical implementations have a complexity of O(N^2) and are associated with impractical runtimes. In this paper, we propose a new architecture which exploits the structure shared by many redundant MP dictionaries, and thus decreases its complexity to O(N log N). This architecture is implemented in a new software toolkit, called MPTK (the Matching Pursuit Toolkit), which is able to reach, e.g., 0.25 × real time for a typical MP analysis scenario applied to a 1 hour long audio track. This substantial acceleration makes it possible, from now on, to explore and apply MP in the framework of real-life, high-dimensional data processing problems.
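For orientation, the greedy loop that this abstract refers to can be sketched in a few lines of Python. This is a minimal sketch of plain, unaccelerated Matching Pursuit, not the MPTK architecture: the function names `dot` and `matching_pursuit` are illustrative, atoms are assumed to be unit-norm, and every atom is correlated with the residual at every iteration, which is the costly step the paper sets out to avoid.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(signal, atoms, n_iter):
    """Plain greedy MP over unit-norm atoms.

    Returns the selected (atom index, coefficient) pairs and the final residual.
    """
    residual = list(signal)
    picks = []
    for _ in range(n_iter):
        # Correlate the residual with every atom (the expensive step).
        corrs = [dot(residual, atom) for atom in atoms]
        best = max(range(len(atoms)), key=lambda k: abs(corrs[k]))
        coeff = corrs[best]
        if coeff == 0.0:
            break  # nothing left that any atom can explain
        picks.append((best, coeff))
        # Subtract the contribution of the selected atom.
        residual = [r - coeff * a for r, a in zip(residual, atoms[best])]
    return picks, residual


# Illustrative redundant dictionary in 2-D: three unit-norm atoms.
atoms = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
picks, residual = matching_pursuit([3.0, 4.0], atoms, n_iter=2)
```

Each iteration here touches the whole dictionary; structured dictionaries allow the correlations to be updated locally instead, which is the kind of saving the abstract describes.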
Efficient Coding of Time-Relative Structure Using Spikes
, 2005
Cited by 53 (2 self)
Nonstationary acoustic features provide essential cues for many auditory tasks, including sound localization, auditory stream analysis, and speech recognition. These features can best be characterized relative to a precise point in time, such as the onset of a sound or the beginning of a harmonic periodicity. Extracting these types of features is a difficult problem. Part of the difficulty is that with standard block-based signal analysis methods, the representation is sensitive to the arbitrary alignment of the blocks with respect to the signal. Convolutional techniques such as shift-invariant transformations can reduce this sensitivity, but these do not yield a code that is efficient, that is, one that forms a nonredundant representation of the underlying structure. Here, we develop a non-block-based method for signal representation that is both time relative and efficient. Signals are represented using a linear superposition of time-shiftable kernel functions, each with an associated magnitude and temporal position. Signal decomposition in this method is a non-linear process that consists of optimizing the kernel function scaling coefficients and temporal positions to form an efficient, shift-invariant representation. We demonstrate the properties of this representation for the purpose of characterizing structure in various types of nonstationary acoustic signals. The computational problem investigated here has direct relevance to the neural coding at the auditory nerve and the more general issue of how to encode complex, time-varying signals with a population of spiking neurons.
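The signal model described in this abstract, a linear superposition of time-shiftable kernels, each with a magnitude and a temporal position, can be illustrated with a short synthesis routine. This is only a sketch of the generative side under assumed conventions: the name `synthesize`, the `(kernel_id, position, amplitude)` spike tuple, and integer sample positions are illustrative choices, and the nonlinear step that finds the spikes is not shown.

```python
def synthesize(spikes, kernels, length):
    """Rebuild a signal as a sum of shifted, scaled kernels.

    spikes  : iterable of (kernel_id, position, amplitude) tuples
    kernels : list of kernel waveforms (lists of samples)
    length  : number of output samples
    """
    x = [0.0] * length
    for kernel_id, position, amplitude in spikes:
        for offset, value in enumerate(kernels[kernel_id]):
            t = position + offset
            if 0 <= t < length:  # clip kernels that run past the edges
                x[t] += amplitude * value
    return x


# One illustrative kernel, used twice at different times and magnitudes.
kernels = [[1.0, 2.0, 1.0]]
signal = synthesize([(0, 2, 0.5), (0, 6, -1.0)], kernels, length=10)
```

Because each spike carries its own position, shifting the input in time only shifts the spike positions; the amplitudes and kernel identities stay the same, which is what makes the code independent of any block alignment.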
Wavelet Footprints: Theory, Algorithms, and Applications
- IEEE Trans. Signal Processing
, 2003
Cited by 47 (5 self)
In recent years, wavelet-based algorithms have been successful in different signal processing tasks. The wavelet transform is a powerful tool because it manages to represent both transient and stationary behaviors of a signal with few transform coefficients. Discontinuities often carry relevant signal information, and therefore, they represent a critical part to analyze. In this paper, we study the dependency across scales of the wavelet coefficients generated by discontinuities. We start by showing that any piecewise smooth signal can be expressed as a sum of a piecewise polynomial signal and a uniformly smooth residual (see Theorem 1 in Section II). We then introduce the notion of footprints, which are scale space vectors that model discontinuities in piecewise polynomial signals exactly. We show that footprints form an overcomplete dictionary and develop efficient and robust algorithms to find the exact representation of a piecewise polynomial function in terms of footprints. This also leads to efficient approximation of piecewise smooth functions. Finally, we focus on applications and show that algorithms based on footprints outperform standard wavelet methods in different applications such as denoising, compression, and (nonblind) deconvolution. In the case of compression, we also prove that at high rates, footprint-based algorithms attain optimal performance (see Theorem 3 in Section V).
Instrument-specific harmonic atoms for mid-level music representation
- IEEE Trans. on Audio, Speech and Lang. Proc
, 2008
Cited by 39 (6 self)
Several studies have pointed out the need for accurate mid-level representations of music signals for information retrieval and signal processing purposes. In this paper, we propose a new mid-level representation based on the decomposition of a signal into a small number of sound atoms or molecules bearing explicit musical instrument labels. Each atom is a sum of windowed harmonic sinusoidal partials whose relative amplitudes are specific to one instrument, and each molecule consists of several atoms from the same instrument spanning successive time windows. We design efficient algorithms to extract the most prominent atoms or molecules and investigate several applications of this representation, including polyphonic instrument recognition and music visualization. Index Terms: mid-level representation, music information retrieval, music visualization, sparse decomposition.
Tree-Based Pursuit: Algorithm and Properties
, 2005
Cited by 30 (7 self)
This paper proposes a tree-based pursuit algorithm that efficiently trades off complexity and approximation performance for overcomplete signal expansions. Finding the sparsest representation of a signal using a redundant dictionary is, in general, an NP-hard problem. Even sub-optimal algorithms such as Matching Pursuit remain highly complex. We propose a structuring strategy that can be applied to any redundant set of functions, and which basically groups similar atoms together. A measure of similarity based on coherence allows for representing a highly redundant sub-dictionary of atoms by a unique element, called a molecule. When the clustering is applied recursively on atoms and then on molecules, it naturally leads to the creation of a tree structure. We then present a new pursuit algorithm that uses the structure created by clustering as a decision tree. This tree-based algorithm offers a substantial complexity reduction with respect to Matching Pursuit, as it prunes large parts of the dictionary when traversing the tree. Recent results on incoherent dictionaries are extended to molecules, while the true highly redundant nature of the dictionary stays hidden by the tree structure. We then derive recovery conditions on the structured dictionary, under which tree-based pursuit is guaranteed to converge. Experimental results finally show that the gain in complexity offered by tree-based pursuit does not, in general, incur a high penalty in approximation performance. They show that the dimensionality of the problem is reduced thanks to the tree construction, without significant loss of information at hand.
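The atom-selection step that this abstract describes, descending a tree of molecules instead of scanning the whole dictionary, might look roughly like the sketch below. It rests on several assumptions that are not taken from the paper: the tree is built by hand rather than by coherence-based clustering, a molecule is approximated as the normalized sum of its children, and the names `Node`, `molecule`, and `tree_search` are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    vector: List[float]                # unit-norm atom (leaf) or molecule (internal)
    atom_index: Optional[int] = None   # dictionary index, set on leaves only
    children: List["Node"] = field(default_factory=list)


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def molecule(children):
    """Hypothetical molecule: the normalized sum of the children's vectors."""
    total = [sum(col) for col in zip(*(c.vector for c in children))]
    norm = math.sqrt(dot(total, total))
    return Node([x / norm for x in total], children=list(children))


def tree_search(residual, root):
    """Descend from the root, keeping the best-correlated child at each level.

    Subtrees that are not selected are never visited, which is where the
    pruning of the dictionary comes from.
    """
    node = root
    while node.children:
        node = max(node.children, key=lambda c: abs(dot(residual, c.vector)))
    return node.atom_index, dot(residual, node.vector)


# Illustrative dictionary of four unit-norm atoms grouped into two molecules.
atoms = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-0.6, 0.8]]
leaves = [Node(a, atom_index=i) for i, a in enumerate(atoms)]
root = molecule([molecule(leaves[:2]), molecule(leaves[2:])])
index, coeff = tree_search([0.9, 0.5], root)
```

With a balanced tree the number of inner products per selection grows with the depth of the tree rather than with the dictionary size, at the price of possibly missing the globally best atom. A full pursuit would subtract `coeff` times the selected atom from the residual and repeat.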
A Geometrical Study of Matching Pursuit parametrization
Cited by 13 (2 self)
This paper studies the effect of discretizing the parametrization of a dictionary used for Matching Pursuit decompositions of signals. Our approach relies on viewing the continuously parametrized dictionary as an embedded manifold in the signal space on which the tools of differential (Riemannian) geometry can be applied. The main contribution of this paper is twofold. First, we prove that if a discrete dictionary meets a minimal density criterion, then the corresponding discrete MP (dMP) is equivalent in terms of convergence to a weakened hypothetical continuous MP. Interestingly, the corresponding weakness factor depends on a density measure of the discrete dictionary. Second, we show that the insertion of a simple geometric gradient ascent optimization on the dMP atom selection maintains the previous comparison but with a weakness factor at least two times closer to unity than without optimization. Finally, we present numerical experiments confirming our theoretical predictions for decomposition of signals and images on regular discretizations of dictionary parametrizations.
Audio modeling based on delayed sinusoids
- IEEE Trans. Speech and Audio Processing
, 2004
A Biologically-Inspired Low-Bit-Rate Universal Audio Coder
- AES Convention
, 2007
Cited by 11 (6 self)
Dictionary Design for Matching Pursuit and Application to Motion Compensated Video Coding
, 2004
Cited by 7 (0 self)
We present a new algorithm for matching pursuit (MP) dictionary design. This technique uses existing vector quantization (VQ) design techniques and an inner-product-based distortion measure to learn functions from a set of training patterns. While this scheme can be applied to many MP applications, we focus on motion compensated video coding. Given a set of training sequences, data is extracted from the high energy packets of the motion compensated frames. Dictionaries with different regions of support are trained, pruned, and finally evaluated on MPEG test sequences. We find that for high bit-rate QCIF sequences we can achieve improvements of up to 0.66 dB with respect to conventional MP with separable Gabor functions.
On Perceptual Distortion Minimization and Nonlinear Least-Squares Frequency Estimation
Cited by 6 (4 self)
In this paper, we present a framework for perceptual error minimization and sinusoidal frequency estimation based on a new perceptual distortion measure and we state its optimal solution. Using this framework, we relate a number of well-known practical methods for perceptual sinusoidal parameter estimation such as the pre-filtering method, the weighted matching pursuit and the perceptual matching pursuit. In particular, we derive and compare the sinusoidal estimation criteria used in these methods. We show that for the sinusoidal estimation problem, the pre-filtering method and the weighted matching pursuit are equivalent to the perceptual matching pursuit under certain conditions.