Results 1  10
of
43
Structured Sparsity through Convex Optimization
, 2012
"... Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. While naturally cast as a combinatorial optimization problem, variable or feature selection admits a convex relaxation through the regularization by the ℓ1norm. In this paper, we consider sit ..."
Abstract

Cited by 47 (6 self)
 Add to MetaCart
(Show Context)
Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. While naturally cast as a combinatorial optimization problem, variable or feature selection admits a convex relaxation through the regularization by the ℓ1norm. In this paper, we consider situations where we are not only interested in sparsity, but where some structural prior knowledge is available as well. We show that the ℓ1norm can then be extended to structured norms built on either disjoint or overlapping groups of variables, leading to a flexible framework that can deal with various structures. We present applications to unsupervised learning, for structured sparse principal component analysis and hierarchical dictionary learning, and to supervised learning in the context of nonlinear variable selection.
Factorized Latent Spaces with Structured Sparsity
"... Recent approaches to multiview learning have shown that factorizing the information into parts that are shared across all views and parts that are private to each view could effectively account for the dependencies and independencies between the different input modalities. Unfortunately, these appr ..."
Abstract

Cited by 38 (3 self)
 Add to MetaCart
(Show Context)
Recent approaches to multiview learning have shown that factorizing the information into parts that are shared across all views and parts that are private to each view could effectively account for the dependencies and independencies between the different input modalities. Unfortunately, these approaches involve minimizing nonconvex objective functions. In this paper, we propose an approach to learning such factorized representations inspired by sparse coding techniques. In particular, we show that structured sparsity allows us to address the multiview learning problem by alternately solving two convex optimization problems. Furthermore, the resulting factorized latent spaces generalize over existing approaches in that they allow having latent dimensions shared between any subset of the views instead of between all the views only. We show that our approach outperforms stateoftheart methods on the task of human pose estimation. 1
Nonparametric bayesian sparse factor models with application to gene expression modeling,” The Annals of Applied Statistics
, 2011
"... ar ..."
Multilabel Prediction via Sparse Infinite CCA
"... Canonical Correlation Analysis (CCA) is a useful technique for modeling dependencies between two (or more) sets of variables. Building upon the recently suggested probabilistic interpretation of CCA, we propose a nonparametric, fully Bayesian framework that can automatically select the number of cor ..."
Abstract

Cited by 24 (2 self)
 Add to MetaCart
(Show Context)
Canonical Correlation Analysis (CCA) is a useful technique for modeling dependencies between two (or more) sets of variables. Building upon the recently suggested probabilistic interpretation of CCA, we propose a nonparametric, fully Bayesian framework that can automatically select the number of correlation components, and effectively capture the sparsity underlying the projections. In addition, given (partially) labeled data, our algorithm can also be used as a (semi)supervised dimensionality reduction technique, and can be applied to learn useful predictive features in the context of learning a set of related tasks. Experimental results demonstrate the efficacy of the proposed approach for both CCA as a standalone problem, and when applied to multilabel prediction. 1
Sparse probabilistic principal component analysis
 In Proc. AISTATS’2009, JMLR W&CP
, 2009
"... Principal component analysis (PCA) is a popular dimensionality reduction algorithm. However, it is not easy to interpret which of the original features are important based on the principal components. Recent methods improve interpretability by sparsifying PCA through adding an L1 regularizer. In thi ..."
Abstract

Cited by 20 (0 self)
 Add to MetaCart
(Show Context)
Principal component analysis (PCA) is a popular dimensionality reduction algorithm. However, it is not easy to interpret which of the original features are important based on the principal components. Recent methods improve interpretability by sparsifying PCA through adding an L1 regularizer. In this paper, we introduce a probabilistic formulation for sparse PCA. By presenting sparse PCA as a probabilistic Bayesian formulation, we gain the benefit of automatic model selection. We examine three different priors for achieving sparsification: (1) a twolevel hierarchical prior equivalent to a Laplacian distribution and consequently to an L1 regularization, (2) an inverseGaussian prior, and (3) a Jeffrey’s prior. We learn these models by applying variational inference. Our experiments verify that indeed our sparse probabilistic model results in a sparse PCA solution. 1
Bayesian and L1 approaches to sparse unsupervised learning. Arxiv preprint arXiv:1106.1157
, 2011
"... The use of L1 regularisation for sparse learning has generated immense research interest, with many successful applications in diverse areas such as signal acquisition, image coding, genomics and collaborative filtering. While existing work highlights the many advantages of L1 methods, in this paper ..."
Abstract

Cited by 18 (2 self)
 Add to MetaCart
The use of L1 regularisation for sparse learning has generated immense research interest, with many successful applications in diverse areas such as signal acquisition, image coding, genomics and collaborative filtering. While existing work highlights the many advantages of L1 methods, in this paper we find that L1 regularisation often dramatically underperforms in terms of predictive performance when compared to other methods for inferring sparsity. We focus on unsupervised latent variable models, and develop L1 minimising factor models, Bayesian variants of “L1”, and Bayesian models with a stronger L0like sparsity induced through spikeandslab distributions. These spikeandslab Bayesian factor models encourage sparsity while accounting for uncertainty in a principled manner, and avoid unnecessary shrinkage of nonzero values. We demonstrate on a number of data sets that in practice spikeandslab Bayesian methods outperform L1 minimisation, even on a computational budget. We thus highlight the need to reassess the wide use of L1 methods in sparsityreliant applications, particularly when we care about generalising to previously unseen data, and provide an alternative that, over many varying conditions, provides improved generalisation performance.
Bayesian Canonical Correlation Analysis
 JOURNAL OF MACHINE LEARNING RESEARCH 14 (2013) 9651003
, 2013
"... Canonical correlation analysis (CCA) is a classical method for seeking correlations between two multivariate data sets. During the last ten years, it has received more and more attention in the machine learning community in the form of novel computational formulations and a plethora of applications. ..."
Abstract

Cited by 13 (3 self)
 Add to MetaCart
Canonical correlation analysis (CCA) is a classical method for seeking correlations between two multivariate data sets. During the last ten years, it has received more and more attention in the machine learning community in the form of novel computational formulations and a plethora of applications. We review recent developments in Bayesian models and inference methods for CCA which are attractive for their potential in hierarchical extensions and for coping with the combination of large dimensionalities and small sample sizes. The existing methods have not been particularly successful in fulfilling the promise yet; we introduce a novel efficient solution that imposes groupwise sparsity to estimate the posterior of an extended model which not only extracts the statistical dependencies (correlations) between data sets but also decomposes the data into shared and data setspecific components. In statistics literature the model is known as interbattery factor analysis (IBFA), for which we now provide a Bayesian treatment.
Bayesian group factor analysis
 In Proc. AISTATS’2012, JMLR W&CP
, 2012
"... We introduce a factor analysis model that summarizes the dependencies between observed variable groups, instead of dependencies between individual variables as standard factor analysis does. A group may correspond to one view of the same set of objects, one of many data sets tied by cooccurrence ..."
Abstract

Cited by 12 (5 self)
 Add to MetaCart
(Show Context)
We introduce a factor analysis model that summarizes the dependencies between observed variable groups, instead of dependencies between individual variables as standard factor analysis does. A group may correspond to one view of the same set of objects, one of many data sets tied by cooccurrence, or a set of alternative variables collected from statistics tables to measure one property of interest. We show that by assuming groupwise sparse factors, active in a subset of the sets, the variation can be decomposed into factors explaining relationships between the sets and factors explaining away setspecific variation. We formulate the assumptions in a Bayesian model providing the factors, and apply the model to two data analysis tasks, in neuroimaging and chemical systems biology. 1
Bayesian learning in sparse graphical factor models via annealed entropy
 Journal of Machine Learning Research
, 2010
"... Editor: We describe a class of sparse latent factor models, called graphical factor models (GFMs), and relevant sparse learning algorithms for posterior mode estimation. Linear, Gaussian GFMs have sparse, orthogonal factor loadings matrices, that, in addition to sparsity of the implied covariance ma ..."
Abstract

Cited by 12 (2 self)
 Add to MetaCart
(Show Context)
Editor: We describe a class of sparse latent factor models, called graphical factor models (GFMs), and relevant sparse learning algorithms for posterior mode estimation. Linear, Gaussian GFMs have sparse, orthogonal factor loadings matrices, that, in addition to sparsity of the implied covariance matrices, also induce conditional independence structures via zeros in the implied precision matrices. We describe the models and their use for robust estimation of sparse latent factor structure and data/signal reconstruction. We develop computational algorithms for model exploration and posterior mode search, addressing the hard combinatorial optimization involved in the search over a huge space of potential sparse configurations. A meanfield variational technique coupled with annealing is developed to successively generate “artificial ” posterior distributions that, at the limiting temperature in the annealing schedule, define required posterior modes in the GFM parameter space. Several detailed empirical studies and comparisons to related approaches are discussed, including analyses of handwritten digit image and cancer gene expression data sets.