## Task-Driven Dictionary Learning

Citations: 86 (3 self)

### Citations

4166 | Latent Dirichlet allocation - Blei, Ng, et al. |

3998 | Regression shrinkage and selection via the LASSO - Tibshirani - 1996 |

3543 | Compressed sensing - Donoho - 2006
Citation Context: ...in addition allow feature sharing between the classes. [Section 3.3.4, Compressed Sensing:] Let us consider a signal x in ℝ^m; the theory of compressed sensing [35], [36] tells us that under certain assumptions, the vector x can be recovered exactly from a few measurements Zx, where Z in ℝ^{r×m} is called a “sensing” matrix with r ≪ m. Unlike classical signal processing ... |
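The recovery principle quoted above can be illustrated numerically. The following toy sketch is not from the paper: it recovers a k-sparse signal from r < m random Gaussian measurements by solving the ℓ₁-regularized least-squares problem min ½‖z − Zx‖² + λ‖x‖₁ with plain ISTA (an assumed solver choice), then debiases by least squares on the detected support. All dimensions, the seed, λ, and the 0.1 threshold are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, r, k = 50, 40, 3                          # signal length, measurements, sparsity
x_true = np.zeros(m)
x_true[[5, 17, 33]] = [1.0, -2.0, 1.5]       # a k-sparse signal
Z = rng.standard_normal((r, m)) / np.sqrt(r)  # random "sensing" matrix, r < m
z = Z @ x_true                                # the few measurements Zx

# ISTA on 0.5*||z - Z x||^2 + lam*||x||_1
lam = 0.01
L = np.linalg.norm(Z, 2) ** 2                # Lipschitz constant of the gradient
x = np.zeros(m)
for _ in range(3000):
    v = x - (Z.T @ (Z @ x - z)) / L
    x = np.sign(v) * np.maximum(np.abs(v) - lam / L, 0.0)  # soft threshold

# Debias: least squares restricted to the detected support
S = np.abs(x) > 0.1
sol, *_ = np.linalg.lstsq(Z[:, S], z, rcond=None)
x_hat = np.zeros(m)
x_hat[S] = sol
```

In this noiseless, well-conditioned regime the debiased estimate matches the true sparse signal; with fewer measurements or noise, the guarantees of [35], [36] become the relevant question.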

3137 | A wavelet tour of signal processing - Mallat - 1998
Citation Context: ...rization, semi-supervised learning, compressed sensing. [1 INTRODUCTION:] The linear decomposition of data using a few elements from a learned dictionary instead of a predefined one - based on wavelets [1] for example - has recently led to state-of-the-art results in numerous low-level signal processing tasks such as image denoising [2], [3], [4], audio processing [5], [6], as well as classification task... |

2683 | Atomic decomposition by basis pursuit - Chen, Donoho, et al. - 1998
Citation Context: ...” columns from D that is “close” to the vector x. Experiments have shown that modeling signals with such sparse decompositions (sparse coding) is very effective in many signal processing applications [13]. For natural images, predefined dictionaries based on various types of wavelets [1] have been used for this task. Initially introduced by Olshausen and Field [14] for modeling the spatial receptive f... |

1454 | Gradient-based learning applied to document recognition - LeCun, Bottou, et al. - 1998
Citation Context: ...TABLE 1: Test Error in Percent of Our Method for the Digit Recognition Task for Different Dictionary Sizes p. [5.2 Handwritten Digits Classification:] We consider here a classification task using the MNIST [44] and USPS [45] handwritten data sets. MNIST contains 70,000 28×28 images, 60,000 for training, 10,000 for testing, whereas USPS has 7,291 training images and 2,007 test images of size 16×16. We addres... |

1402 | An introduction to compressive sampling - Candes, Wakin - 2008
Citation Context: ...in addition allow feature sharing between the classes. [Section 3.3.4, Compressed Sensing:] Let us consider a signal x in ℝ^m; the theory of compressed sensing [35], [36] tells us that under certain assumptions, the vector x can be recovered exactly from a few measurements Zx, where Z in ℝ^{r×m} is called a “sensing” matrix with r ≪ m. Unlike classical signal proce... |

1292 | Least angle regression - Efron, Hastie, et al. - 2004
Citation Context: ...dictionary), W ∈ 𝒲 (initial parameters), T (number of iterations), t₀, ρ (learning rate parameters).
1: for t = 1 to T do
2: Draw (y_t, x_t) from p(y, x).
3: Sparse coding: compute α* using a modified LARS [42]: α* ← argmin_{α ∈ ℝ^p} ½‖x_t − Dα‖₂² + λ₁‖α‖₁ + (λ₂/2)‖α‖₂².
4: Compute the active set: Λ ← {j ∈ {1, …, p} : α*[j] ≠ 0}.
5: Compute β*: set β*_{Λᶜ} ← 0 and β*_Λ ← (D_Λ^⊤D_Λ + λ₂I)^{−1} ∇_{α_Λ} ℓs(y_t, W, α*).
6: Choose the le... |
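Steps 3–5 of the quoted algorithm can be sketched in a few lines. This is an illustrative reconstruction, not the paper's implementation: ISTA stands in for the modified LARS of step 3, and a squared loss ℓs(y, W, α) = ½‖y − Wα‖² is assumed so that ∇_α ℓs = −W^⊤(y − Wα); all dimensions and parameter values are arbitrary.

```python
import numpy as np

def sparse_code(x, D, lam1, lam2, n_iter=2000):
    """Elastic-net sparse coding by ISTA (the paper uses a modified LARS)."""
    alpha = np.zeros(D.shape[1])
    L = np.linalg.norm(D, 2) ** 2 + lam2   # Lipschitz constant of the smooth part
    for _ in range(n_iter):
        v = alpha - (D.T @ (D @ alpha - x) + lam2 * alpha) / L
        alpha = np.sign(v) * np.maximum(np.abs(v) - lam1 / L, 0.0)
    return alpha

def beta_star(x, y, D, W, alpha, lam2):
    """Steps 4-5: active set, then the linear system defining beta*.
    Assumes the squared loss ls(y, W, a) = 0.5*||y - W a||^2."""
    Lam = np.flatnonzero(alpha)            # active set of the sparse code
    beta = np.zeros_like(alpha)            # beta* is zero off the active set
    DL = D[:, Lam]
    grad = -(W.T @ (y - W @ alpha))[Lam]   # gradient of ls restricted to Lam
    beta[Lam] = np.linalg.solve(DL.T @ DL + lam2 * np.eye(len(Lam)), grad)
    return beta
```

The linear solve in step 5 is cheap because it only involves the active columns of D, which is what makes the per-sample update practical.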

1223 | Kernel Methods for Pattern Analysis - Shawe-Taylor, Cristianini - 2004 |

1216 | Algorithms for non-negative matrix factorization - Lee, Seung - 2000
Citation Context: ...coding procedure. Minor variants of the formulation (7) can also be considered: Nonnegativity constraints may be added on α* and D, leading to a supervised version of nonnegative matrix factorization [33], regularized with a sparsity-inducing penalty. The function ℓs could also take extra arguments such as D and x instead of just y, W, α*. For simplicity, we have omitted these possibilities, but the fo... |

941 | Sparse coding with an overcomplete basis set: A strategy employed by V1? - Olshausen, Field - 1997
Citation Context: ...many signal processing applications [13]. For natural images, predefined dictionaries based on various types of wavelets [1] have been used for this task. Initially introduced by Olshausen and Field [14] for modeling the spatial receptive fields of simple cells in the mammalian visual cortex, the idea of learning the dictionary from data instead of using off-the-shelf bases has been shown to signific... |

919 | K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation - Aharon, Elad, et al. - 2006
Citation Context: ...arning. Whereas purely data-driven dictionary learning has been shown to be equivalent to a large-scale matrix factorization problem that can be effectively addressed with several methods [14], [17], [18], [19], its task-driven counterpart has proven to be much more difficult to optimize. Presenting a general efficient framework for various task-driven dictionary learning problems is the main topic of... |

916 | Regularization and variable selection via the elastic net - Zou, Hastie
Citation Context: ...u of ℓu, this optimization problem is unsupervised. As have others (see, e.g., [18]), we define ℓu(x, D) as the optimal value of a sparse coding problem. We choose here the elastic-net formulation in [29]:
ℓu(x, D) ≜ min_{α ∈ ℝ^p} ½‖x − Dα‖₂² + λ₁‖α‖₁ + (λ₂/2)‖α‖₂²,  (1)
where λ₁ and λ₂ are regularization parameters. When λ₂ = 0, this leads to the ℓ₁ sparse decomposition problem, also known as basis pursuit [13... |
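The elastic-net cost in (1) has a well-known reformulation, noted by Zou and Hastie: for λ₂ > 0 it equals a pure ℓ₁ problem on the augmented system [D; √λ₂ I], [x; 0]. A minimal numerical check of that identity (dimensions, seed, and parameter values are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 10, 20
D = rng.standard_normal((m, p))
x = rng.standard_normal(m)
lam1, lam2 = 0.15, 0.1

def elastic_net_objective(a):
    """The cost inside eq. (1): 0.5||x - D a||^2 + lam1 ||a||_1 + (lam2/2)||a||^2."""
    return 0.5 * np.sum((x - D @ a) ** 2) + lam1 * np.abs(a).sum() + 0.5 * lam2 * np.sum(a ** 2)

# Augmented system: stacking sqrt(lam2)*I under D turns the ridge term
# into part of the quadratic fit, leaving only the l1 penalty.
D_aug = np.vstack([D, np.sqrt(lam2) * np.eye(p)])
x_aug = np.concatenate([x, np.zeros(p)])

def augmented_objective(a):
    return 0.5 * np.sum((x_aug - D_aug @ a) ** 2) + lam1 * np.abs(a).sum()
```

Because the two objectives agree for every α, any lasso solver applied to the augmented system minimizes (1); the λ₂ term is also what makes (1) strongly convex, hence its minimizer unique.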

905 | Robust face recognition via sparse representation - Wright, Yang, et al. - 2009
Citation Context: ...ate-of-the-art denoising algorithms [2], [3], [4]. Unsupervised dictionary learning has also been used for other purposes than pure signal reconstruction, such as classification [5], [7], [11], [12], [15], but recent works have shown that better results can be obtained when the dictionary is tuned to the specific task (and not just data) it is intended for. Duarte-Carvajalino and Sapiro [16] have, for... |

732 | An iterative thresholding algorithm for linear inverse problems with a sparsity constraint - Daubechies, Defrise, et al. - 2004
Citation Context: ...for instance, the restoration of clean signals y from observed corrupted signals x. Classical signal restoration techniques often focus on removing additive noise or solving inverse linear problems [34]. When the corruption results from an unknown nonlinear transformation, we formulate the restoration task as a general regression problem. This is the case for example in the experiment presented in S... |

592 | Image denoising via sparse and redundant representations over learned dictionaries - Elad, Aharon
Citation Context: ...a learned dictionary instead of a predefined one - based on wavelets [1] for example - has recently led to state-of-the-art results in numerous low-level signal processing tasks such as image denoising [2], [3], [4], audio processing [5], [6], as well as classification tasks [7], [8], [9], [10], [11], [12]. Unlike decompositions based on principal component analysis (PCA) and its variants, these sparse... |

481 | Linear spatial pyramid matching using sparse coding for image classification - Yang, Yu, et al. - 2009
Citation Context: ...to state-of-the-art results in numerous low-level signal processing tasks such as image denoising [2], [3], [4], audio processing [5], [6], as well as classification tasks [7], [8], [9], [10], [11], [12]. Unlike decompositions based on principal component analysis (PCA) and its variants, these sparse models do not impose that the dictionary elements be orthogonal, allowing more flexibility to adapt t... |

396 | Stochastic Approximation and Recursive Algorithms and Applications - Kushner, Yin - 2003
Citation Context: ...gorithms are generally well suited to unsupervised dictionary learning when their learning rate is well tuned. The method we propose here is a projected first-order stochastic gradient algorithm (see [40]), and it is given in Algorithm 1. It sequentially draws i.i.d. samples (y_t, x_t) from the probability distribution p(y, x). Obtaining such i.i.d. samples may be difficult since the density p(y, x) is ... |

353 | Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations - Lee, Grosse, et al. - 2009
Citation Context: ...with a classifier. Learning compact features has also been addressed in the literature of neural networks, with restricted Boltzmann machines (RBMs) and convolutional neural networks, for example (see [22], [23], [24], [25], [26] and references therein). Interestingly, the question of learning the data representation in an unsupervised or supervised way has also been investigated for these approaches. ... |

322 | Supervised topic models - Blei, McAuliffe - 2007
Citation Context: ...restingly, the question of learning the data representation in an unsupervised or supervised way has also been investigated for these approaches. For instance, a supervised topic model is proposed in [27] and tuning latent data representations ... [running footer: © 2012 IEEE; IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 4, April 2012] |

314 | Online learning for matrix factorization and sparse coding - Mairal, Bach, et al. - 2010 |

292 | Single-pixel imaging via compressive sampling - Duarte, Davenport, et al. - 2008
Citation Context: ...where Z in ℝ^{r×m} is called a “sensing” matrix with r ≪ m. Unlike classical signal processing methods, such a linear transformation is sometimes included physically in the data acquisition process itself [37], meaning that a sensor can provide measurements Zx without directly measuring x. In a nutshell, the recovery of x has been proven to be possible when x admits a sparse representation on a dictionary ... |

290 | Self-taught learning: Transfer learning from unlabeled data - Raina, Battle, et al. - 2007 |

281 | Handwritten digit recognition with a back-propagation network - LeCun - 1989
Citation Context: ...rror in Percent of Our Method for the Digit Recognition Task for Different Dictionary Sizes p. [5.2 Handwritten Digits Classification:] We consider here a classification task using the MNIST [44] and USPS [45] handwritten data sets. MNIST contains 70,000 28×28 images, 60,000 for training, 10,000 for testing, whereas USPS has 7,291 training images and 2,007 test images of size 16×16. We address this multicl... |

270 | Blind source separation by sparse decomposition in a signal dictionary - Zibulevsky, Pearlmutter - 2001
Citation Context: ...edefined one - based on wavelets [1] for example - has recently led to state-of-the-art results in numerous low-level signal processing tasks such as image denoising [2], [3], [4], audio processing [5], [6], as well as classification tasks [7], [8], [9], [10], [11], [12]. Unlike decompositions based on principal component analysis (PCA) and its variants, these sparse models do not impose that the dictio... |

251 | The tradeoffs of large scale learning - Bottou, Bousquet - 2008
Citation Context: ...to have ℓ₂-norms less than or equal to 1. We will call 𝒟 the convex set of matrices satisfying this constraint:
𝒟 ≜ {D ∈ ℝ^{m×p} s.t. ∀ j ∈ {1, …, p}, ‖d_j‖₂ ≤ 1}.  (2)
As pointed out by Bottou and Bousquet [31], one is usually not interested in a perfect minimization of the empirical cost g_n(D), but instead in the minimization with respect to D of the expected cost:
g(D) ≜ 𝔼_x[ℓu(x, D)] = lim_{n→∞} g_n(D) a.s.  (3)... |
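The constraint set 𝒟 of (2) is convenient in projected gradient methods because its Euclidean projection is trivial: rescale any column whose ℓ₂-norm exceeds 1. A minimal sketch (the function name is an illustrative assumption):

```python
import numpy as np

def project_unit_columns(D):
    """Euclidean projection onto {D : ||d_j||_2 <= 1 for every column j}.

    Columns already inside the ball are left unchanged; longer columns
    are rescaled to unit norm.
    """
    norms = np.linalg.norm(D, axis=0)
    return D / np.maximum(norms, 1.0)
```

This per-column rescaling is exactly the projection step needed after each stochastic gradient update on the dictionary.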

249 | Convex multi-task feature learning - Argyriou, Evgeniou, et al. - 2008 |

228 | Convex analysis and nonlinear optimization. Theory and examples. Second edition - Borwein, Lewis - 2006 |

222 | Learning mid-level features for recognition - Boureau, Bach, et al. - 2010
Citation Context: ...vely small signals, and should not be directly applicable to the classification of large images, for which classical computer vision approaches based on bags of words may be better adapted (see [12], [55] for such approaches). However, we show that, for this particular data set, a simple voting scheme based on the classification of small image patches with our method leads to good results. The experim... |

214 | Sparse representation for color image restoration - Mairal, Elad, et al. - 2007 |

213 | Efficient learning of sparse representations with an energy-based model - Ranzato, Poultney, et al. - 2006
Citation Context: ...classifier. Learning compact features has also been addressed in the literature of neural networks, with restricted Boltzmann machines (RBMs) and convolutional neural networks, for example (see [22], [23], [24], [25], [26] and references therein). Interestingly, the question of learning the data representation in an unsupervised or supervised way has also been investigated for these approaches. For in... |

205 | Efficient backprop - LeCun, Bottou, et al. - 1998 |

192 | Non-local sparse models for image restoration - Mairal, Bach, et al.
Citation Context: ...dictionary instead of a predefined one - based on wavelets [1] for example - has recently led to state-of-the-art results in numerous low-level signal processing tasks such as image denoising [2], [3], [4], audio processing [5], [6], as well as classification tasks [7], [8], [9], [10], [11], [12]. Unlike decompositions based on principal component analysis (PCA) and its variants, these sparse models do... |

189 | Supervised dictionary learning - Mairal, Bach, et al. - 2008 |

188 | Unsupervised learning of invariant feature hierarchies with applications to object recognition - Ranzato, Huang, et al.
Citation Context: ...fier. Learning compact features has also been addressed in the literature of neural networks, with restricted Boltzmann machines (RBMs) and convolutional neural networks, for example (see [22], [23], [24], [25], [26] and references therein). Interestingly, the question of learning the data representation in an unsupervised or supervised way has also been investigated for these approaches. For instance... |

161 | Discriminative learned dictionaries for local image analysis - Mairal, Bach, et al. - 2008
Citation Context: ...ample - has recently led to state-of-the-art results in numerous low-level signal processing tasks such as image denoising [2], [3], [4], audio processing [5], [6], as well as classification tasks [7], [8], [9], [10], [11], [12]. Unlike decompositions based on principal component analysis (PCA) and its variants, these sparse models do not impose that the dictionary elements be orthogonal, allowing more... |

150 | Kernel methods for pattern analysis. Cambridge University Press - Shawe-Taylor, Cristianini - 2004
Citation Context: ...oss function that measures how well one can predict y by observing α*(x, D) given the model parameters W. For instance, it can be the square, logistic, or hinge loss from support vector machines (see [32]). The index s of ℓs indicates here that the loss is adapted to a supervised learning problem. The expectation is taken with respect to the unknown probability distribution p(y, x) of the data. So far... |

134 | Recovery of exact sparse representations in the presence of bounded noise - Fuchs - 2005
Citation Context: ...set, we also have
α*_Λ = (D_Λ^⊤D_Λ + λ₂I)^{−1}(D_Λ^⊤x − λ₁s_Λ),  (21)
where s in {−1, +1}^{|Λ|} carries the signs of α*_Λ. Proof. Equation (20) can be obtained by considering subgradient optimality conditions as done in [57] for the case λ₂ = 0. These can be written as
0 ∈ {−D^⊤(x − Dα*) + λ₂α* + λ₁p : p ∈ ∂‖α*‖₁},
where ∂‖α*‖₁ denotes the subdifferential of the ℓ₁-norm evaluated at α*. A classical result (see [58, page 238... |
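The closed form in (21) can be checked numerically: solve the elastic-net problem iteratively, read off the active set Λ and the sign vector s, and compare against the linear-system solution. This sketch uses ISTA as a generic solver (an assumption; any elastic-net solver would do), with arbitrary dimensions and parameters.

```python
import numpy as np

def ista(x, D, lam1, lam2, n_iter=6000):
    """Minimize 0.5||x - D a||^2 + lam1 ||a||_1 + (lam2/2)||a||^2 by ISTA."""
    a = np.zeros(D.shape[1])
    L = np.linalg.norm(D, 2) ** 2 + lam2
    for _ in range(n_iter):
        v = a - (D.T @ (D @ a - x) + lam2 * a) / L
        a = np.sign(v) * np.maximum(np.abs(v) - lam1 / L, 0.0)
    return a

rng = np.random.default_rng(3)
D = rng.standard_normal((12, 18)); D /= np.linalg.norm(D, axis=0)
x = rng.standard_normal(12)
lam1, lam2 = 0.2, 0.05

a = ista(x, D, lam1, lam2)
Lam = np.flatnonzero(np.abs(a) > 1e-10)     # active set
s = np.sign(a[Lam])                          # signs of the active coefficients
DL = D[:, Lam]
# Eq. (21): the active coefficients solve a small ridge-like linear system
closed = np.linalg.solve(DL.T @ DL + lam2 * np.eye(len(Lam)), DL.T @ x - lam1 * s)
```

The agreement between `closed` and the iterative solution on Λ is what makes α* locally an affine, hence differentiable, function of D, the key fact exploited by the task-driven formulation.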

117 | Learning invariant features through topographic filter maps - Kavukcuoglu, Ranzato, et al. - 2009
Citation Context: ...ly led to state-of-the-art results in numerous low-level signal processing tasks such as image denoising [2], [3], [4], audio processing [5], [6], as well as classification tasks [7], [8], [9], [10], [11], [12]. Unlike decompositions based on principal component analysis (PCA) and its variants, these sparse models do not impose that the dictionary elements be orthogonal, allowing more flexibility to a... |

113 | A wavelet tour of signal processing, second edition - Mallat - 1999 |

106 | Bayesian inference and optimal design for the sparse linear model - Seeger - 2008
Citation Context: ...that jointly learning sensing matrices and dictionaries can do even better in certain cases. A Bayesian framework for learning sensing matrices in compressed sensing applications is also proposed in [39]. Following the latter authors, we study the case where Z is not random but is learned at the same time as the dictionary, and introduce a formulation which falls into our task-driven dictionary learn... |

104 | An adaptive algorithm for spatial grey scale - Floyd, Steinberg - 1976
Citation Context: ...one that looks perceptually similar to the original one (“halftoning”) was posed to the image processing community. Examples of halftoned images obtained with the classical Floyd-Steinberg algorithm [47] are presented in the second column of Fig. 2, with original images in the first column. Restoring these binary images to continuous-tone ones (“inverse halftoning”) has become a classical problem (se... |

94 | Classification using discriminative restricted Boltzmann machines - Larochelle, Bengio - 2008
Citation Context: ...ng compact features has also been addressed in the literature of neural networks, with restricted Boltzmann machines (RBMs) and convolutional neural networks, for example (see [22], [23], [24], [25], [26] and references therein). Interestingly, the question of learning the data representation in an unsupervised or supervised way has also been investigated for these approaches. For instance, a supervis... |

80 | Online algorithms and stochastic approximations - Bottou - 1998
Citation Context: ...y p(y, x) is unknown. At first approximation, the vectors (y_t, x_t) are obtained in practice by cycling over a randomly permuted training set, which is often done in similar machine learning settings [41]. Algorithm 1 (Stochastic gradient descent algorithm for task-driven dictionary learning). Require: p(y, x) (a way to draw i.i.d. samples of p); λ₁, λ₂, ν ∈ ℝ (regularization parameters); D ∈ 𝒟 (initial dict... |

74 | Efficient backprop,” in Neural Networks: Tricks of the trade - LeCun, Bottou, et al. - 1998 |

68 | Learning to sense sparse signals: Simultaneous sensing matrix and sparsifying dictionary optimization - Duarte-Carvajalino, Sapiro - 2009
Citation Context: ...1], [12], [15], but recent works have shown that better results can be obtained when the dictionary is tuned to the specific task (and not just data) it is intended for. Duarte-Carvajalino and Sapiro [16] have, for instance, proposed to learn dictionaries for compressed sensing, and in [8], [9], [10], dictionaries are learned for signal classification. In this paper, we will refer to this type of appr... |

57 | Shift-invariant sparse coding for audio classification - Grosse, Raina, et al. - 2007 |

54 | A tutorial on energy-based learning - LeCun, Chopra, et al. |

52 | Online learning for matrix factorization and sparse coding - Mairal, Bach, et al.
Citation Context: ...Whereas purely data-driven dictionary learning has been shown to be equivalent to a large-scale matrix factorization problem that can be effectively addressed with several methods [14], [17], [18], [19], its task-driven counterpart has proven to be much more difficult to optimize. Presenting a general efficient framework for various task-driven dictionary learning problems is the main topic of this ... |

43 | Differentiable sparse coding - Bradley, Bagnell |

40 | Tangent distance kernels for support vector machines - Haasdonk, Keysers - 2002
Citation Context: ...te that a cross-validation scheme may give better results, but would be computationally more expensive. Most effective digit recognition techniques use features with shift invariance properties [24], [46]. Since our formulation is less sophisticated than, for instance, the convolutional network architecture in [24] and does not enjoy such properties, we have artificially augmented the size of the trai... |

39 | A digital technique for art authentication - Lyu, Rockmore, et al.
Citation Context: ...way to new applications of sparse image coding. [5.4 Digital Art Authentification:] Recognizing authentic paintings from imitations using statistical techniques has been the topic of a few recent works [52], [53], [54]. Classical methods compare, for example, the kurtosis of wavelet coefficients between a set ... [footnote 13: Denoting by MSE the mean-squared error for images whose intensities are between 0 and 255, ...] |

30 | Frame based signal compression using method of optimal directions (MOD) - Engan, Aase, et al. - 1999
Citation Context: ...ary learning. Whereas purely data-driven dictionary learning has been shown to be equivalent to a large-scale matrix factorization problem that can be effectively addressed with several methods [14], [17], [18], [19], its task-driven counterpart has proven to be much more difficult to optimize. Presenting a general efficient framework for various task-driven dictionary learning problems is the main to... |

25 | Learning compressed sensing - Weiss, Chang, et al. - 2007 |

16 | A fast, high-quality inverse halftoning algorithm for error diffused halftones - Kite, Venkata, et al. - 2000
Citation Context: ...fi/~lasip/. TABLE 2: Inverse Halftoning Experiments. Results are reported in PSNR (higher is better). SA-DCT refers to [48], LPA-ICI to [49], FIHT2 to [50], and WInHD to [51]. The best results for each image are in bold. ...dictionary, and the regularization parameter λ₁. These parameters are selected by minimizing the mean-squared error (MSE) reconstructio... |

13 | Quantification of artistic style through sparse coding analysis - Hughes, Graham, et al.
Citation Context: ...applications of sparse image coding. [5.4 Digital Art Authentification:] Recognizing authentic paintings from imitations using statistical techniques has been the topic of a few recent works [52], [53], [54]. Classical methods compare, for example, the kurtosis of wavelet coefficients between a set ... [footnote 13: Denoting by MSE the mean-squared error for images whose intensities are between 0 and 255, the PSNR is ...] |

11 | Inverse halftoning based on the anisotropic LPA-ICI deconvolution - Foi, Katkovnik, et al. - 2004
Citation Context: ...p://www.cs.tut.fi/~lasip/. TABLE 2: Inverse Halftoning Experiments. Results are reported in PSNR (higher is better). SA-DCT refers to [48], LPA-ICI to [49], FIHT2 to [50], and WInHD to [51]. The best results for each image are in bold. ...dictionary, and the regularization parameter λ₁. These parameters are selected by minimizing the mean-squared error (MSE... |

10 | Image processing for artist identification - Johnson, Hendriks, et al. - 2008 |

9 | A statistical study of on-line learning, in On-line Learning in Neural Networks - Murata - 1999 |

6 | WInHD: Wavelet-based inverse halftoning via deconvolution - Neelamani, Nowak, et al. - 2002
Citation Context: ...TABLE 2: Inverse Halftoning Experiments. Results are reported in PSNR (higher is better). SA-DCT refers to [48], LPA-ICI to [49], FIHT2 to [50], and WInHD to [51]. The best results for each image are in bold. ...dictionary, and the regularization parameter λ₁. These parameters are selected by minimizing the mean-squared error (MSE) reconstruction on the validation... |

5 | Inverse halftoning by pointwise shape-adaptive DCT regularized deconvolution - Dabov, Foi, et al. - 2006
Citation Context: ...are presented in the second column of Fig. 2, with original images in the first column. Restoring these binary images to continuous-tone ones (“inverse halftoning”) has become a classical problem (see [48] and references therein). Unlike most image processing approaches that explicitly model the halftoning process, we formulate it as a regression problem, without exploiting any prior on the task. We us... |

2 | Supervised dictionary learning |