## Proximal Methods for Hierarchical Sparse Coding (2010)

Citations: 81 (18 self)

### Citations

7184 | Convex Optimization - Boyd, Vandenberghe - 2004 |
Citation context: ...ion 1 and the algorithm we will develop will therefore be applicable to this problem. 2.2.2 OPTIMIZATION FOR HIERARCHICAL SPARSITY-INDUCING NORMS While generic approaches like interior-point methods (Boyd and Vandenberghe, 2004) and subgradient descent schemes (Bertsekas, 1999) might be used to deal with the nonsmooth norm Ω, several dedicated procedures have been proposed. In (Zhao et al., 2009), a boosting-like technique ...

4163 | Latent Dirichlet allocation - Blei, Ng, et al. - 2003 |

3993 | Regression shrinkage and selection via the lasso - Tibshirani - 1996 |
Citation context: ...fields, including signal processing, statistics, and machine learning. This line of research, also known as sparse coding, has witnessed the development of several well-founded theoretical frameworks (Tibshirani, 1996; Chen et al., 1998; Mallat, 1999; Tropp, 2004, 2006; Wainwright, 2009; Bickel et al., 2009) and the emergence of many efficient algorithmic tools (Efron et al., 2004; Nesterov, 2007; Needell and Trop...

3136 | A Wavelet Tour of Signal Processing - Mallat - 1998 |
Citation context: ...statistics, and machine learning. This line of research, also known as sparse coding, has witnessed the development of several well-founded theoretical frameworks (Tibshirani, 1996; Chen et al., 1998; Mallat, 1999; Tropp, 2004, 2006; Wainwright, 2009; Bickel et al., 2009) and the emergence of many efficient algorithmic tools (Efron et al., 2004; Nesterov, 2007; Needell and Tropp, 2009; Yuan et al., 2009; Beck ...

2681 | Atomic decomposition by basis pursuit - Chen, Donoho, et al. - 1998 |
Citation context: ...signal processing, statistics, and machine learning. This line of research, also known as sparse coding, has witnessed the development of several well-founded theoretical frameworks (Tibshirani, 1996; Chen et al., 1998; Mallat, 1999; Tropp, 2004, 2006; Wainwright, 2009; Bickel et al., 2009) and the emergence of many efficient algorithmic tools (Efron et al., 2004; Nesterov, 2007; Needell and Tropp, 2009; Yuan et al...

1639 | Learning the parts of objects by non-negative matrix factorization - Lee, Seung - 1999 |
Citation context: ...Dα_i for every signal x_i. While learning simultaneously D and A, one may want to encode specific prior knowledge about the problem at hand, such as, for example, the positivity of the decomposition (Lee and Seung, 1999), or the sparsity of A (Olshausen and Field, 1996, 1997; Aharon et al., 2006; Lee et al., 2007; Mairal et al., 2010a). This leads to penalizing or constraining (D, A) and results in the following form...

1555 | The Elements of Statistical Learning - Hastie, Tibshirani, et al. - 2001 |
Citation context: ...a dictionary D in R^{m×p}, but the formulation of Eq. (5) extends beyond this context. In particular one can choose f to be the logistic loss, which is commonly used for classification problems (e.g., Hastie et al., 2009). Before turning to optimization methods for the hierarchical sparse coding problem, we consider a particular instance. The sparse group Lasso was recently considered by Sprechmann et al. (2010) and ...

1510 | Embedded image coding using zerotrees of wavelet coefficients - Shapiro - 1993 |
Citation context: ...ture arises in many applications. Wavelet decompositions lend themselves well to this tree organization because of their multiscale structure, and benefit from it for image compression and denoising (Shapiro, 1993; Crouse et al., 1998; Baraniuk, 1999; Baraniuk et al., 2002, 2008; He and Carin, 2009; Zhao et al., 2009; Huang et al., 2009). In the same vein, edge filters of natural image patches can be represent...

1286 | Least angle regression - Efron, Hastie, et al. - 2004 |

1286 | Emergence of simple-cell receptive field properties by learning a sparse code for natural images - Olshausen, Field - 1996 |
Citation context: ...multaneously D and A, one may want to encode specific prior knowledge about the problem at hand, such as, for example, the positivity of the decomposition (Lee and Seung, 1999), or the sparsity of A (Olshausen and Field, 1996, 1997; Aharon et al., 2006; Lee et al., 2007; Mairal et al., 2010a). This leads to penalizing or constraining (D, A) and results in the following formulation: min_{D∈D, A∈A} (1/n) Σ_{i=1}^{n} [ (1/2) ‖x_i − Dα_i‖...

1151 | Nonlinear Programming - Bertsekas - 1995 |
Citation context: ...icable to this problem. 2.2.2 OPTIMIZATION FOR HIERARCHICAL SPARSITY-INDUCING NORMS While generic approaches like interior-point methods (Boyd and Vandenberghe, 2004) and subgradient descent schemes (Bertsekas, 1999) might be used to deal with the nonsmooth norm Ω, several dedicated procedures have been proposed. In (Zhao et al., 2009), a boosting-like technique is used, with a path-following strategy in the spe...

1126 | Model selection and estimation in regression with grouped variables - Yuan, Lin - 2007 |

1055 | Finding scientific topics - Griffiths, Steyvers - 2004 |
Citation context: ...e hierarchy. We plan to compare our approach with this model in future work. Visualization of NIPS proceedings: We qualitatively illustrate our approach on the NIPS proceedings from 1988 through 1999 (Griffiths and Steyvers, 2004). After removing words appearing fewer than 10 times, the dataset is composed of 1714 articles, with a vocabulary of 8274 words. As explained above, we consider D₊ and take A to be R₊^{p×n}. Figure 10...

983 | Adapting to unknown smoothness via wavelet shrinkage - Donoho, Johnstone - 1995 |

940 | Sparse coding with an overcomplete basis set: A strategy employed by V1? - Olshausen, Field - 1997 |
Citation context: ...esponding for instance to a basis of wavelets for the denoising of natural images. Second, we show how one can take advantage of this hierarchical sparse coding in the context of dictionary learning (Olshausen and Field, 1997; Aharon et al., 2006; Mairal et al., 2010a), where the dictionary is learned to adapt to the predefined tree structure. This extension of dictionary learning is notably shown to share interesting con...

933 | A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics - Martin, Fowlkes, et al. - 2001 |
Citation context: ...able in the context of images, where lots of non-corrupted patches are easily available. We extracted 100,000 patches of size m = 8×8 pixels from the Berkeley segmentation database of natural images (Martin et al., 2001), which contains a high variability of scenes. We then split this dataset into a training set X_tr, a validation set X_val, and a test set X_te, respectively of size 50,000, 25,000, and 25,000 patches. All...

919 | K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation - Aharon, Elad, et al. - 2006 |
Citation context: ...basis of wavelets for the denoising of natural images. Second, we show how one can take advantage of this hierarchical sparse coding in the context of dictionary learning (Olshausen and Field, 1997; Aharon et al., 2006; Mairal et al., 2010a), where the dictionary is learned to adapt to the predefined tree structure. This extension of dictionary learning is notably shown to share interesting connections with hierarc...

901 | Greed is good: Algorithmic results for sparse approximation - Tropp |

747 | Cosamp: Iterative signal recovery from incomplete and inaccurate samples - Needell, Tropp - 2009 |

689 | Network Flows - Ahuja, Magnanti, et al. - 1993 |
Citation context: ...nteresting connections with hierarchical probabilistic topic models. To summarize, the contributions of this paper are threefold: 1. A tree is defined as a connected graph that contains no cycle (see Ahuja et al., 1993). • We show that the proximal operator for a tree-structured sparse regularization can be computed exactly in a finite number of operations using a d...

671 | Compressive sensing - Baraniuk - 2007 |
Citation context: ...multaneously (see Yuan and Lin, 2006; Stojnic et al., 2009). This example can be viewed as a particular instance of structured sparsity, which has been lately the focus of a large amount of research (Baraniuk et al., 2008; Zhao et al., 2009; Huang et al., 2009; Jacob et al., 2009; Jenatton et al., 2009). In this paper, we concentrate on a specific form of structured sparsity, which we call hierarchical sparse coding: ...

591 | Image denoising via sparse and redundant representations over learned dictionaries - Elad, Aharon - 2006 |
Citation context: ...Note that we study the ability of the model to reconstruct independent patches, and additional work is required to apply our framework to a full image processing task, where patches usually overlap (Elad and Aharon, 2006; Mairal et al., 2009b). [A table of denoising errors for the flat and tree models at noise levels 50–90% follows.] ...

478 | Just relax: convex programming methods for identifying sparse signals in noise - Tropp - 2006 |

468 | Hierarchical clustering schemes - Johnson - 1967 |
Citation context: ...classes. In addition, both datasets exhibit highly-correlated dictionary elements. Inspired by (Kim and Xing, 2010), we build the tree-structured set of groups G using Ward's hierarchical clustering (Johnson, 1967) on the gene expressions. The norm Ω built in this way aims at capturing the hierarchical structure of gene expression networks (Kim and Xing, 2010). Instead of the square loss function, we consider ...
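The Ward-clustering construction quoted above, which builds the tree-structured set of groups G from the data, can be sketched in miniature. This is an illustrative pure-Python stand-in on synthetic points (in practice one would use an off-the-shelf implementation such as `scipy.cluster.hierarchy.linkage`); all names and data below are hypothetical.

```python
# Sketch: Ward's agglomerative clustering and the tree-structured groups G
# it induces. Each node of the resulting binary merge tree, leaves included,
# contributes one group g to G, as in the gene-expression setup above.
import random

random.seed(0)
points = [[random.gauss(0, 1) for _ in range(3)] for _ in range(8)]  # toy "genes"

def centroid(idx):
    d = len(points[0])
    return [sum(points[i][k] for i in idx) / len(idx) for k in range(d)]

def ward_cost(a, b):
    # Ward's merge criterion: |A||B|/(|A|+|B|) * ||mean_A - mean_B||^2
    ca, cb = centroid(a), centroid(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2

clusters = [[i] for i in range(len(points))]   # start from singletons
groups = [set(c) for c in clusters]            # every tree node = one group g in G
while len(clusters) > 1:
    # merge the pair of clusters with the smallest Ward cost
    i, j = min(((i, j) for i in range(len(clusters))
                for j in range(i + 1, len(clusters))),
               key=lambda p: ward_cost(clusters[p[0]], clusters[p[1]]))
    merged = clusters[i] + clusters[j]
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    groups.append(set(merged))

# A binary merge tree over n leaves has 2n-1 nodes; the root group is everything.
assert len(groups) == 2 * len(points) - 1
assert groups[-1] == set(range(len(points)))
```

The groups collected this way are exactly nested (each group is contained in its parent), which is what makes the resulting Ω a tree-structured norm.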

450 | Simultaneous analysis of Lasso and Dantzig selector. Annals of Statistics - Bickel, Ritov, et al. - 2009 |

434 | Efficient sparse coding algorithms - Lee, Battle, et al. - 2007 |
Citation context: ...prior knowledge about the problem at hand, such as, for example, the positivity of the decomposition (Lee and Seung, 1999), or the sparsity of A (Olshausen and Field, 1996, 1997; Aharon et al., 2006; Lee et al., 2007; Mairal et al., 2010a). This leads to penalizing or constraining (D, A) and results in the following formulation: min_{D∈D, A∈A} (1/n) Σ_{i=1}^{n} [ (1/2) ‖x_i − Dα_i‖₂² + λΨ(α_i) ]  (10), where A and D denote tw...
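Formulation (10) quoted above can be made concrete with a minimal sketch, assuming Ψ = ℓ1 and naive alternating minimization: one ISTA pass on A, then a projected gradient step on D with unit-norm-ball columns. Sizes, the step size, and the choice of Ψ are illustrative assumptions, not the paper's algorithm.

```python
# Sketch of Eq. (10): min over (D, A) of (1/n) sum_i [ 0.5*||x_i - D a_i||^2
# + lam*Psi(a_i) ], here with Psi = l1, solved very naively by alternation.
import numpy as np

rng = np.random.default_rng(0)
m, p, n = 10, 15, 40
X = rng.normal(size=(m, n))
D = rng.normal(size=(m, p)); D /= np.linalg.norm(D, axis=0)
A = np.zeros((p, n))
lam = 0.1

def objective(D, A):
    return (0.5 * np.sum((X - D @ A) ** 2) + lam * np.abs(A).sum()) / n

def soft(v, t):                        # proximal operator of t*||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

obj0 = objective(D, A)
for _ in range(50):
    # sparse-coding step: one ISTA pass on all columns of A jointly
    L = np.linalg.norm(D.T @ D, 2)     # Lipschitz constant of the smooth part
    A = soft(A - (D.T @ (D @ A - X)) / L, lam / L)
    # dictionary step: gradient descent, then project columns onto ||d||_2 <= 1
    D -= 0.01 * (D @ A - X) @ A.T
    D /= np.maximum(np.linalg.norm(D, axis=0), 1.0)

assert objective(D, A) < obj0          # the alternation decreases the objective
```

Swapping Ψ = ℓ1 for the hierarchical norm Ω is exactly where the paper's tree-structured proximal operator would replace `soft` in the sparse-coding step.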

413 | Wavelet-based statistical signal processing using hidden Markov models - Crouse, Nowak, et al. - 1998 |
Citation context: ...many applications. Wavelet decompositions lend themselves well to this tree organization because of their multiscale structure, and benefit from it for image compression and denoising (Shapiro, 1993; Crouse et al., 1998; Baraniuk, 1999; Baraniuk et al., 2002, 2008; He and Carin, 2009; Zhao et al., 2009; Huang et al., 2009). In the same vein, edge filters of natural image patches can be represented in an arborescent ...

388 | Gradient methods for minimizing composite objective function - Nesterov - 2007 |
Citation context: ...cal frameworks (Tibshirani, 1996; Chen et al., 1998; Mallat, 1999; Tropp, 2004, 2006; Wainwright, 2009; Bickel et al., 2009) and the emergence of many efficient algorithmic tools (Efron et al., 2004; Nesterov, 2007; Needell and Tropp, 2009; Yuan et al., 2009; Beck and Teboulle, 2009; Wright et al., 2009). In many applied settings, the structure of the problem at hand, such as, e.g., the spatial arrangement of t...

362 | Sparse Reconstruction by Separable Approximation - Wright, Nowak, et al. - 2009 |

322 | Supervised topic models - Blei, McAuliffe - 2007 |
Citation context: ...ayesian methods can. Finally, another interesting common line of research to pursue is the supervised design of dictionaries, which has been proved useful in the two frameworks (Mairal et al., 2009a; Blei and McAuliffe, 2008). Acknowledgments: This paper was partially supported by grants from the Agence Nationale de la Recherche (MGA Project) and from the European Research Council (SIERRA Project). The authors would like ...

314 | Online learning for matrix factorization and sparse coding - Mairal, Bach, et al. - 2010 |

304 | Translation-invariant de-noising - Coifman, Donoho - 1995 |

302 | Learning deep architectures for AI. Foundations and Trends - Bengio |

256 | Proximal splitting methods in signal processing - Combettes, Pesquet - 2011 |
Citation context: ...his type of tree-structured sparsity patterns. We tackle the resulting nonsmooth convex optimization problem with proximal methods (e.g., Nesterov, 2007; Beck and Teboulle, 2009; Wright et al., 2009; Combettes and Pesquet, 2010) whose key step, the computation of the proximal operator, is shown in this paper to be solved exactly with a complexity linear, or close to linear, in the number of dictionary elements—that is, with...
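As a sketch of the proximal computation described above: for the ℓ2-group variant of a tree-structured norm, Ω(u) = Σ_g w_g ‖u_g‖₂, the result these contexts refer to reduces the proximal operator to a single leaves-to-root pass of block soft-thresholding. The toy tree, weights, and input below are illustrative assumptions.

```python
# Sketch: prox of a tree-structured sum of l2 group norms, computed by one
# group-thresholding sweep over the groups ordered from leaves to root
# (linear in the total size of the groups, as the quoted context states).
import numpy as np

def group_threshold(u, idx, t):
    """Block soft-thresholding of u on coordinates idx with threshold t."""
    nrm = np.linalg.norm(u[idx])
    u[idx] = 0.0 if nrm <= t else (1 - t / nrm) * u[idx]
    return u

def prox_tree_l2(v, groups, lam):
    """groups: list of (index-set, weight) pairs, ordered leaves -> root."""
    u = v.copy()
    for idx, w in groups:
        u = group_threshold(u, np.array(idx), lam * w)
    return u

# Toy tree on 4 variables: leaf groups {1}, {2}, {3}, internal group {1,2,3},
# and root group {0,1,2,3}.
groups = [([1], 1.0), ([2], 1.0), ([3], 1.0),
          ([1, 2, 3], 1.0), ([0, 1, 2, 3], 1.0)]
v = np.array([3.0, 0.2, 0.1, 2.0])
u = prox_tree_l2(v, groups, lam=0.5)
# Small leaf coefficients are zeroed first and stay zero: the surviving
# support forms a rooted subtree, which is the hierarchical sparsity pattern.
```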

228 | Convex Analysis and Nonlinear Optimization: Theory and Examples - Borwein, Lewis - 2000 |
Citation context: ...ves v itself. On the other hand, a necessary and sufficient optimality condition for having κ = Π_{‖·‖*≤t}(v) = argmin_{‖y‖*≤t} ‖v − y‖₂ is that the residual v − κ lies in the normal cone of the constraint set (Borwein and Lewis, 2006), that is, for all y such that ‖y‖* ≤ t, (v − κ)ᵀ(y − κ) ≤ 0. The displayed result then follows from the definition of the dual norm, namely ‖κ‖* = max_{‖z‖≤1} zᵀκ. B.3 Proof of Lemma 2. Proof: The proof...

221 | Group Lasso with overlap and graph Lasso - Jacob, Obozinski, et al. - 2009 |
Citation context: ...his example can be viewed as a particular instance of structured sparsity, which has been lately the focus of a large amount of research (Baraniuk et al., 2008; Zhao et al., 2009; Huang et al., 2009; Jacob et al., 2009; Jenatton et al., 2009). In this paper, we concentrate on a specific form of structured sparsity, which we call hierarchical sparse coding: the dictionary elements are assumed to be embedded in a dir...

192 | Non-local sparse models for image restoration - Mairal, Bach, et al. |

189 | Supervised dictionary learning - Mairal, Bach, et al. - 2008 |

168 | NESTA: a fast and accurate first-order method for sparse recovery - Becker, Bobin, et al. - 2009 |
Citation context: ...ical proofs of this section are presented in Appendix B for readability purposes. 3.1 Proximal Operator for the Norm Ω. Proximal methods have drawn increasing attention in the signal processing (e.g., Becker et al., 2009; Wright et al., 2009; Combettes and Pesquet, 2010, and numerous references therein) and the machine learning communities (e.g., Bach et al., 2010, and references therein), especially because of their...

143 | The composite absolute penalties family for grouped and hierarchical variable selection - Zhao, Rocha, et al. |

129 | On the reconstruction of block-sparse signals with an optimal number of measurements - Stojnic, Parvaresh, et al. - 2009 |

128 | Dual averaging methods for regularized stochastic learning and online optimization,” Microsoft Research - Xiao - 2009 |

127 | Efficient online and batch learning using forward backward splitting - Duchi, Singer - 2009 |

126 | Learning with structured sparsity - Huang, Zhang, et al. - 2009 |
Citation context: ...nic et al., 2009). This example can be viewed as a particular instance of structured sparsity, which has been lately the focus of a large amount of research (Baraniuk et al., 2008; Zhao et al., 2009; Huang et al., 2009; Jacob et al., 2009; Jenatton et al., 2009). In this paper, we concentrate on a specific form of structured sparsity, which we call hierarchical sparse coding: the dictionary elements are assumed to ...

124 | Proximal methods for sparse hierarchical dictionary learning - Jenatton, Mairal, et al. - 2010 |
Citation context: ...ion of topic models as multinomial PCA (Buntine, 2002), and can learn similar hierarchies of topics. This point is discussed in Section 6. Note that this paper extends a shorter version published in (Jenatton et al., 2010). 1.1 Notations: Vectors are denoted by bold lower case letters and matrices by upper case ones. We define for q ≥ 1 the ℓq-norm of a vector x in R^m as ‖x‖_q ≜ (...

123 | The nested chinese restaurant process and bayesian nonparametric inference of topic hierarchies - Blei, Griffiths, et al. - 2010 |

123 | Fonctions convexes duales et points proximaux dans un espace Hilbertien - Moreau - 1962 |

118 | Nonlinear learning using local coordinate coding - Yu, Zhang, et al. - 2009 |
Citation context: ...iments. Although we have not explored locality constraints on the dictionary elements, these have been shown to be particularly relevant to some applications such as patch-based image classification (Yu et al., 2009). Combining tree structure and locality constraints is an interesting future research. 4.2.2 UPDATING THE VECTORS α_i. The procedure for updating the columns of A is based on the results derived in Se...

111 | DiscLDA: Discriminative learning for dimensionality reduction and classification - Lacoste-Julien, Sha, et al. - 2009 |

109 | Exploring large feature space with hierarchical multiple kernel learning - Bach - 2008 |
Citation context: ...in an arborescent fashion (Zoran and Weiss, 2009). Imposing these sparsity patterns has further proven useful in the context of hierarchical variable selection, e.g., when applied to kernel methods (Bach, 2008), to log-linear models for the selection of potential orders (Schmidt and Murphy, 2010), and to bioinformatics, to exploit the tree structure of gene networks for multi-task regression (Kim and Xing,...

107 | Sharp thresholds for noisy and high-dimensional recovery of sparsity using ℓ1-constrained quadratic programming (lasso - Wainwright - 2009 |

94 | Variational Extensions to EM and Multinomial PCA - Buntine - 2002 |
Citation context: ...Our method establishes a bridge between hierarchical dictionary learning and hierarchical topic models (Blei et al., 2010), which builds upon the interpretation of topic models as multinomial PCA (Buntine, 2002), and can learn similar hierarchies of topics. This point is discussed in Section 6. Note that this paper extends a shorter version published in (Jenatton et al., 2010). ...

91 | Medlda: maximum margin supervised topic models for regression and classification - Zhu, Ahmed, et al. - 2009 |

89 | Exploiting structure in Wavelet-based Bayesian compressive sampling - He, Carin - 2009 |

78 | Convex optimization with sparsity-inducing norms - Bach, Jenatton, et al. - 2011 |
Citation context: ...ncreasing attention in the signal processing (e.g., Becker et al., 2009; Wright et al., 2009; Combettes and Pesquet, 2010, and numerous references therein) and the machine learning communities (e.g., Bach et al., 2011, and references therein), especially because of their convergence rates (optimal for the class of first-order techniques) and their ability to deal with large nonsmooth convex problems (e.g., Nestero...

76 | CART and best-ortho-basis: a connection - Donoho - 1997 |

57 | Structured sparsity-inducing norms through submodular functions - Bach |

56 | An O(n) algorithm for quadratic knapsack problems - Brucker - 1984 |
Citation context: ...ution is also a group-thresholding operator: ∀g ∈ G, u_|g ↦ u_|g − Π_{‖·‖₁≤λ}[u_|g], where Π_{‖·‖₁≤λ} denotes the orthogonal projection onto the ℓ1-ball of radius λ, which can be solved in O(p) operations (Brucker, 1984; Maculan and Galdino de Paula, 1989). Note that when ‖u_|g‖₁ ≤ λ, we have a group-thresholding effect, with u_|g − Π_{‖·‖₁≤λ}[u_|g] = 0. More generally, a classical result (see, e.g., Combettes and Pesque...
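The ℓ1-ball projection Π_{‖·‖₁≤λ} appearing in the group-thresholding operator above can be sketched with the classic sort-and-shrink scheme. This O(p log p) version is for illustration only; the O(p) algorithms of Brucker (1984) and Maculan and Galdino de Paula (1989) that the context cites replace the sort with median finding.

```python
# Sketch: Euclidean projection onto the l1-ball of radius lam, used inside
# the group-thresholding operator u_g -> u_g - Pi_{||.||_1 <= lam}(u_g).
import numpy as np

def project_l1_ball(v, lam):
    if np.abs(v).sum() <= lam:
        return v.copy()                      # already inside the ball
    a = np.sort(np.abs(v))[::-1]             # magnitudes, descending
    css = np.cumsum(a)
    # largest k (0-indexed) with a_k > (css_k - lam) / (k+1)
    k = np.nonzero(a > (css - lam) / np.arange(1, len(v) + 1))[0][-1]
    theta = (css[k] - lam) / (k + 1)         # optimal shrinkage threshold
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

v = np.array([2.0, -1.0, 0.5])
p = project_l1_ball(v, lam=1.0)
assert abs(np.abs(p).sum() - 1.0) < 1e-12    # lands on the ball's boundary
# Group-thresholding effect: u_g - project_l1_ball(u_g, lam) is exactly zero
# whenever ||u_g||_1 <= lam, matching the remark in the context above.
```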

53 | Network flow algorithms for structured sparsity - Mairal, Jenatton, et al. - 2010 |

52 | A comparison of optimization methods and software for large-scale l1-regularized linear classification - Yuan, Chang, et al. - 2010 |

43 | J.A.: Differentiable sparse coding - Bradley, Bagnell |

30 | Accelerated gradient methods for stochastic optimization and online learning - Hu, Kwok, et al. - 2009 |
Citation context: ...s (e.g., batch versus stochastic) and/or the assumptions made on f. For instance, the material we develop in this paper could also be applied to online/stochastic frameworks (Duchi and Singer, 2009; Hu et al., 2009; Xiao, 2010) and to possibly nonsmooth functions f (e.g., Duchi and Singer, 2009; Xiao, 2010; Combettes and Pesquet, 2010, and references therein). Finally, most of the technical proofs of this secti...

28 | Convex structure learning in log-linear models: Beyond pairwise potentials - Schmidt, Murphy - 2010 |
Citation context: ...ty patterns has further proven useful in the context of hierarchical variable selection, e.g., when applied to kernel methods (Bach, 2008), to log-linear models for the selection of potential orders (Schmidt and Murphy, 2010), and to bioinformatics, to exploit the tree structure of gene networks for multi-task regression (Kim and Xing, 2010). Hierarchies of latent variables, typically used in neural networks and deep lea...

22 | Optimal tree approximation with wavelets - Baraniuk - 1999 |
Citation context: ...ierarchical sparse coding: the dictionary elements are assumed to be embedded in a directed tree T, and the sparsity patterns are constrained to form a connected and rooted subtree of T (Donoho, 1997; Baraniuk, 1999; Baraniuk et al., 2002, 2008; Zhao et al., 2009; Huang et al., 2009). This setting extends more generally to a forest of directed trees. In fact, such a hierarchical structure arises in many applic...

16 | Collaborative hierarchical sparse modeling - Sprechmann, Ramirez, et al. - 2010 |
Citation context: ...Other proposed methods consist of a projected gradient descent with approximate projections onto the ball {u ∈ R^p ; Ω(u) ≤ λ} (Schmidt and Murphy, 2010), and an augmented-Lagrangian based technique (Sprechmann et al., 2010) for solving a particular case with two-level hierarchies. While the previously listed first-order approaches are (1) loss-function dependent, and/or (2) not guaranteed to achieve optimal convergence...

10 | An efficient proximal-gradient method for single and multi-task regression with structured sparsity. Annals of Applied Statistics - Chen, Lin, et al. - 2010 |

7 | A comparison of optimization methods for large-scale l1-regularized linear classification - Yuan, Chang, et al. - 2010 |
Citation context: ...al., 1998; Mallat, 1999; Tropp, 2004, 2006; Wainwright, 2009; Bickel et al., 2009) and the emergence of many efficient algorithmic tools (Efron et al., 2004; Nesterov, 2007; Needell and Tropp, 2009; Yuan et al., 2009; Beck and Teboulle, 2009; Wright et al., 2009). In many applied settings, the structure of the problem at hand, such as, e.g., the spatial arrangement of the pixels in an image, or the presence of va...

6 | Near best tree approximation - Baraniuk, DeVore, et al. - 2002 |

5 | A linear-time median-finding algorithm for projecting a vector on the simplex of R^n - Maculan, Galdino de Paula - 1989 |


1 | Optimization for sparse methods - Bach, Jenatton, et al. - 2010 |
Citation context: ...ncreasing attention in the signal processing (e.g., Becker et al., 2009; Wright et al., 2009; Combettes and Pesquet, 2010, and numerous references therein) and the machine learning communities (e.g., Bach et al., 2010, and references therein), especially because of their convergence rates (optimal for the class of first-order techniques) and their ability to deal with 8. Note that the authors of Chen et al. (2010)...


1 | A fast iterative shrinkage-thresholding algorithm for linear inverse problems - Beck, Teboulle - 2009 |
