Results 1  10
of
20
Structured variable selection with sparsityinducing norms
, 2011
"... We consider the empirical risk minimization problem for linear supervised learning, with regularization by structured sparsityinducing norms. These are defined as sums of Euclidean norms on certain subsets of variables, extending the usual ℓ1norm and the group ℓ1norm by allowing the subsets to ov ..."
Abstract

Cited by 193 (31 self)
 Add to MetaCart
We consider the empirical risk minimization problem for linear supervised learning, with regularization by structured sparsityinducing norms. These are defined as sums of Euclidean norms on certain subsets of variables, extending the usual ℓ1norm and the group ℓ1norm by allowing the subsets to overlap. This leads to a specific set of allowed nonzero patterns for the solutions of such problems. We first explore the relationship between the groups defining the norm and the resulting nonzero patterns, providing both forward and backward algorithms to go back and forth from groups to patterns. This allows the design of norms adapted to specific prior knowledge expressed in terms of nonzero patterns. We also present an efficient active set algorithm, and analyze the consistency of variable selection for leastsquares linear regression in low and highdimensional settings.
Structured Sparsity through Convex Optimization
"... Abstract. Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. While naturally cast as a combinatorial optimization problem, variable or feature selection admits a convex relaxation through the regularization by the ℓ1norm. In this paper, we cons ..."
Abstract

Cited by 48 (7 self)
 Add to MetaCart
(Show Context)
Abstract. Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. While naturally cast as a combinatorial optimization problem, variable or feature selection admits a convex relaxation through the regularization by the ℓ1norm. In this paper, we consider situations where we are not only interested in sparsity, but where some structural prior knowledge is available as well. We show that the ℓ1norm can then be extended to structured norms built on either disjoint or overlapping groups of variables, leading to a flexible framework that can deal with various structures. We present applications to unsupervised learning, for structured sparse principal component analysis and hierarchical dictionary learning, and to supervised learning in the context of nonlinear variable selection. Key words and phrases: Sparsity, convex optimization. 1.
Groupsparse model selection: Hardness and relaxations, arXiv preprint arXiv:1303.3207v3
, 2013
"... Groupbased sparsity models are proven instrumental in linear regression problems for recovering signals from much fewer measurements than standard compressive sensing. The main promise of these models is the recovery of “interpretable ” signals along with the identification of their constituent gro ..."
Abstract

Cited by 11 (3 self)
 Add to MetaCart
Groupbased sparsity models are proven instrumental in linear regression problems for recovering signals from much fewer measurements than standard compressive sensing. The main promise of these models is the recovery of “interpretable ” signals along with the identification of their constituent groups. To this end, we establish a combinatorial framework for groupmodel selection problems and highlight the underlying tractability issues revolving around such notions of interpretability when the regression matrix is simply the identity operator. We show that, in general, claims of correctly identifying the groups with convex relaxations would lead to polynomial time solution algorithms for a wellknown NPhard problem, called the weighted maximum cover problem. Instead, leveraging a graphbased understanding of group models, we describe group structures which enable correct model identification in polynomial time via dynamic programming. We also show that group structures that lead to totally unimodular constraints have tractable discrete as well as convex relaxations. Finally, we study the Pareto frontier of budgeted groupsparse approximations for the treebased sparsity model and illustrate identification and computation tradeoffs between our framework and the existing convex relaxations.
Sparse overlapping sets lasso for multitask learning and fmri data analysis
 Neural Information Processing Systems
, 2013
"... Multitask learning can be effective when features useful in one task are also useful for other tasks, and the group lasso is a standard method for selecting a common subset of features. In this paper, we are interested in a less restrictive form of multitask learning, wherein (1) the available feat ..."
Abstract

Cited by 4 (2 self)
 Add to MetaCart
Multitask learning can be effective when features useful in one task are also useful for other tasks, and the group lasso is a standard method for selecting a common subset of features. In this paper, we are interested in a less restrictive form of multitask learning, wherein (1) the available features can be organized into subsets according to a notion of similarity and (2) features useful in one task are similar, but not necessarily identical, to the features best suited for other tasks. The main contribution of this paper is a new procedure called Sparse Overlapping Sets (SOS) lasso, a convex optimization that automatically selects similar features for related learning tasks. Error bounds are derived for SOSlasso and its consistency is established for squared error loss. In particular, SOSlasso is motivated by multisubject fMRI studies in which functional activity is classified using brain voxels as features. Experiments with real and synthetic data demonstrate the advantages of SOSlasso compared to the lasso and group lasso. 1
Learning Multilevel Sparse Representations
"... Bilinear approximation of a matrix is a powerful paradigm of unsupervised learning. In some applications, however, there is a natural hierarchy of concepts that ought to be reflected in the unsupervised analysis. For example, in the neurosciences image sequence considered here, there are the seman ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
(Show Context)
Bilinear approximation of a matrix is a powerful paradigm of unsupervised learning. In some applications, however, there is a natural hierarchy of concepts that ought to be reflected in the unsupervised analysis. For example, in the neurosciences image sequence considered here, there are the semantic concepts of pixel → neuron → assembly that should find their counterpart in the unsupervised analysis. Driven by this concrete problem, we propose a decomposition of the matrix of observations into a product of more than two sparse matrices, with the rank decreasing from lower to higher levels. In contrast to prior work, we allow for both hierarchical and heterarchical relations of lowerlevel to higherlevel concepts. In addition, we learn the nature of these relations rather than imposing them. Finally, we describe an optimization scheme that allows to optimize the decomposition over all levels jointly, rather than in a greedy levelbylevel fashion. The proposed bilevel SHMF (sparse heterarchical matrix factorization) is the first formalism that allows to simultaneously interpret a calcium imaging sequence in terms of the constituent neurons, their membership in assemblies, and the time courses of both neurons and assemblies. Experiments show that the proposed model fully recovers the structure from difficult synthetic data designed to imitate the experimental data. More importantly, bilevel SHMF yields plausible interpretations of realworld Calcium imaging data. 1
Stable Feature Selection from Brain sMRI
"... Neuroimage analysis usually involves learning thousands or even millions of variables using only a limited number of samples. In this regard, sparse models, e.g. the lasso, are applied to select the optimal features and achieve high diagnosis accuracy. The lasso, however, usually results in indep ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
(Show Context)
Neuroimage analysis usually involves learning thousands or even millions of variables using only a limited number of samples. In this regard, sparse models, e.g. the lasso, are applied to select the optimal features and achieve high diagnosis accuracy. The lasso, however, usually results in independent unstable features. Stability, a manifest of reproducibility of statistical results subject to reasonable perturbations to data and the model (Yu 2013), is an important focus in statistics, especially in the analysis of high dimensional data. In this paper, we explore a nonnegative generalized fused lasso model for stable feature selection in the diagnosis of Alzheimer’s disease. In addition to sparsity, our model incorporates two important pathological priors: the spatial cohesion of lesion voxels and the positive correlation between the features and the disease labels. To optimize the model, we propose an efficient algorithm by proving a novel link between total variation and fast network flow algorithms via conic duality. Experiments show that the proposed nonnegative model performs much better in exploring the intrinsic structure of data via selecting stable features compared with other stateofthearts.
Modeling Multimodal Continuous Heterogeneity in Conjoint Analysis – A Sparse Learning Approach
, 2014
"... Consumers ’ heterogeneous preferences can often be represented using a multimodal continuous heterogeneity (MCH) distribution. One interpretation of MCH is that the consumer population consists of a few distinct segments, each of which contains a heterogeneous subpopulation. Modeling of MCH raises ..."
Abstract
 Add to MetaCart
(Show Context)
Consumers ’ heterogeneous preferences can often be represented using a multimodal continuous heterogeneity (MCH) distribution. One interpretation of MCH is that the consumer population consists of a few distinct segments, each of which contains a heterogeneous subpopulation. Modeling of MCH raises considerable challenges as both across and withinsegment heterogeneity need to be accounted for. We propose an innovative sparse learning approach for modeling MCH and apply it to conjoint analysis where adequate modeling of consumer heterogeneity is critical. The sparse learning approach models MCH via a twostage divideandconquer framework, in which we first decompose the consumer population by recovering a set of candidate segmentations using structured sparsity modeling, and then use each candidate segmentation to develop a set of individuallevel representations of MCH. We select the optimal individuallevel representation of MCH and the corresponding optimal candidate segmentation using crossvalidation. Two notable features of our approach are that it accommodates both across and withinsegment heterogeneity and endogenously imposes an adequate amount of shrinkage to recover the individuallevel partworths. We empirically validate the performance of the sparse learning approach using extensive simulation experiments and two empirical conjoint data sets.
Saclay ÎledeFrance THEME Computational Medicine and Neuro
"... 3.1. Human neuroimaging data and its use 2 ..."
(Show Context)