A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis (2009)

by D. M. Witten, R. Tibshirani, T. Hastie
Venue: Biostatistics
Results 1 - 10 of 169 citing documents

Online learning for matrix factorization and sparse coding

by Julien Mairal, Francis Bach, Jean Ponce, Guillermo Sapiro, 2010
Cited by 330 (31 self)
Sparse coding—that is, modelling data vectors as sparse linear combinations of basis elements—is widely used in machine learning, neuroscience, signal processing, and statistics. This paper focuses on the large-scale matrix factorization problem that consists of learning the basis set in order to adapt it to specific data. Variations of this problem include dictionary learning in signal processing, non-negative matrix factorization and sparse principal component analysis. In this paper, we propose to address these tasks with a new online optimization algorithm, based on stochastic approximations, which scales up gracefully to large data sets with millions of training samples, and extends naturally to various matrix factorization formulations, making it suitable for a wide range of learning problems. A proof of convergence is presented, along with experiments with natural images and genomic data demonstrating that it leads to state-of-the-art performance in terms of speed and optimization for both small and large data sets.
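
As a rough illustration of the alternating scheme this abstract describes (sparse-code each incoming sample, then refresh the dictionary from accumulated statistics), here is a minimal numerical sketch. It is not the authors' released code; the function names, the ISTA lasso solver, and the penalty weight lam are illustrative assumptions.

    # Hypothetical sketch of online dictionary learning: alternate a sparse-coding
    # step for each sample with a block-coordinate update of the dictionary that
    # uses running sufficient statistics.
    import numpy as np

    def ista_lasso(D, x, lam, n_iter=100):
        """Solve min_a 0.5*||x - D a||^2 + lam*||a||_1 by proximal gradient (ISTA)."""
        L = np.linalg.norm(D, 2) ** 2 + 1e-12          # Lipschitz constant of the gradient
        a = np.zeros(D.shape[1])
        for _ in range(n_iter):
            grad = D.T @ (D @ a - x)
            a = a - grad / L
            a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)  # soft-threshold
        return a

    def online_dictionary_learning(X, n_atoms, lam=0.1, n_epochs=1, seed=0):
        """X: (n_samples, n_features). Returns a dictionary D of shape (n_features, n_atoms)."""
        rng = np.random.default_rng(seed)
        n, p = X.shape
        D = rng.standard_normal((p, n_atoms))
        D /= np.linalg.norm(D, axis=0)                  # unit-norm atoms
        A = np.zeros((n_atoms, n_atoms))                # running sum of a a^T
        B = np.zeros((p, n_atoms))                      # running sum of x a^T
        for _ in range(n_epochs):
            for x in X[rng.permutation(n)]:
                a = ista_lasso(D, x, lam)               # sparse-coding step
                A += np.outer(a, a)
                B += np.outer(x, a)
                for j in range(n_atoms):                # dictionary update, one atom at a time
                    if A[j, j] < 1e-10:
                        continue
                    u = D[:, j] + (B[:, j] - D @ A[:, j]) / A[j, j]
                    D[:, j] = u / max(np.linalg.norm(u), 1.0)  # project onto the unit ball
        return D

    if __name__ == "__main__":
        X = np.random.default_rng(1).standard_normal((200, 30))
        D = online_dictionary_learning(X, n_atoms=10)
        print(D.shape)  # (30, 10)

Each sample touches the dictionary only through the small matrices A and B, which is what lets this style of method scale to very large training sets.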

Citation Context

...udes non-negative matrix factorization and its variants (Lee and Seung, 2001; Hoyer, 2002, 2004; Lin, 2007), and sparse principal component analysis (Zou et al., 2006; d’Aspremont et al., 2007, 2008; Witten et al., 2009; Zass and Shashua, 2007). As shown in this paper, these problems have strong similarities; even though we first focus on the problem of dictionary learning, the algorithm we propose is able to addres...

SLEP: Sparse Learning with Efficient Projections

by Jun Liu, Shuiwang Ji, Jieping Ye, 2010
Cited by 124 (22 self)
Abstract not found

Citation Context

...ations. The use of the ℓ1 norm regularization has led to sparse models for linear regression [69], principal component analysis [83], linear discriminant analysis [76], canonical correlation analysis [65, 75], partial least squares [28], support vector machines [82], and logistic regression [27, 35, 64, 68]. ℓ1/ℓq-Norm Regularization Recent studies in areas such as machine learning and statistics have als...

Latent Variable Graphical Model Selection via Convex Optimization

by Venkat Chandrasekaran, Pablo A. Parrilo, Alan S. Willsky, 2010
Cited by 76 (4 self)
Suppose we have samples of a subset of a collection of random variables. No additional information is provided about the number of latent variables, nor of the relationship between the latent and observed variables. Is it possible to discover the number of hidden components, and to learn a statistical model over the entire collection of variables? We address this question in the setting in which the latent and observed variables are jointly Gaussian, with the conditional statistics of the observed variables conditioned on the latent variables being specified by a graphical model. As a first step we give natural conditions under which such latent-variable Gaussian graphical models are identifiable given marginal statistics of only the observed variables. Essentially these conditions require that the conditional graphical model among the observed variables is sparse, while the effect of the latent variables is “spread out” over most of the observed variables. Next we propose a tractable convex program based on regularized maximum-likelihood for model selection in this latent-variable setting; the regularizer uses both the ℓ1 norm and the nuclear norm. Our modeling framework can be viewed as a combination of dimensionality reduction (to identify latent variables) and graphical modeling (to capture remaining statistical structure not attributable to the latent variables), and it consistently estimates both the number of hidden components and the conditional graphical model structure among the observed variables. These results are applicable in the high-dimensional setting in which the number of latent/observed variables grows with the number of samples of the observed variables. The geometric properties of the algebraic varieties of sparse matrices and of low-rank matrices play an important role in our analysis.
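
For reference, the convex program sketched in this abstract can be written, in notation assumed here rather than copied from the paper, roughly as

    \min_{S,\,L}\; -\log\det(S-L) \;+\; \mathrm{tr}\!\big(\hat{\Sigma}_O\,(S-L)\big) \;+\; \lambda\big(\gamma\,\|S\|_1 + \mathrm{tr}(L)\big)
    \quad\text{subject to}\quad S-L \succ 0,\;\; L \succeq 0,

where S - L is the concentration matrix of the observed variables, the ℓ1 norm on S promotes a sparse conditional graphical model, and tr(L), the nuclear norm of the positive-semidefinite L, promotes a small number of latent components.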

Structured Sparse Principal Component Analysis

by Rodolphe Jenatton, Guillaume Obozinski, Francis Bach, 2009
Cited by 70 (14 self)
We present an extension of sparse PCA, or sparse dictionary learning, where the sparsity patterns of all dictionary elements are structured and constrained to belong to a prespecified set of shapes. This structured sparse PCA is based on a structured regularization recently introduced by [1]. While classical sparse priors only deal with cardinality, the regularization we use encodes higher-order information about the data. We propose an efficient and simple optimization procedure to solve this problem. Experiments with two practical tasks, face recognition and the study of the dynamics of a protein complex, demonstrate the benefits of the proposed structured approach over unstructured approaches.

Citation Context

...s. In the last decade, several alternatives to PCA which find sparse and potentially interpretable factors have been proposed, notably nonnegative matrix factorization (NMF) [2] and sparse PCA (SPCA) [3, 4, 5]. However, in many applications, only constraining the size of the factors does not seem appropriate because the considered factors are not only expected to be sparse but also to have a certain struct...

Sparse principal component analysis and iterative thresholding - The Annals of Statistics, 41

by Zongming Ma
"... ar ..."
Abstract - Cited by 50 (4 self) - Add to MetaCart
Abstract not found

Citation Context

...tion, researchers have started to develop sparse PCA methodologies, where they seek a set of sparse vectors spanning the low-dimensional subspace that explains most of the variance. See, for example, [15, 37, 3, 29, 32, 36]. These approaches typically start with a certain optimization formulation of PCA and then induce a sparse solution by introducing appropriate penalties or constraints. On the other hand, when Σ indee...

Structured Sparsity through Convex Optimization

by Francis Bach, Rodolphe Jenatton, Julien Mairal, Guillaume Obozinski, 2012
Cited by 47 (6 self)
Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. While naturally cast as a combinatorial optimization problem, variable or feature selection admits a convex relaxation through the regularization by the ℓ1-norm. In this paper, we consider situations where we are not only interested in sparsity, but where some structural prior knowledge is available as well. We show that the ℓ1-norm can then be extended to structured norms built on either disjoint or overlapping groups of variables, leading to a flexible framework that can deal with various structures. We present applications to unsupervised learning, for structured sparse principal component analysis and hierarchical dictionary learning, and to supervised learning in the context of nonlinear variable selection.
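
As a concrete instance of the structured norms this abstract refers to, one common construction (notation assumed here) replaces the ℓ1-norm with a sum of ℓ2-norms over a prespecified collection of possibly overlapping groups of variables:

    \Omega(w) \;=\; \sum_{g \in \mathcal{G}} d_g\,\|w_g\|_2,

where w_g is the subvector of w indexed by group g and d_g > 0 is a group weight. With disjoint singleton groups this reduces to the ℓ1-norm; overlapping groups restrict which sparsity patterns the solution can take, which is how structural prior knowledge enters.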

Large covariance estimation by thresholding principal orthogonal complements

by Jianqing Fan, Yuan Liao, Martina Mincheva, 2013
Cited by 41 (13 self)
Abstract not found

Penalized classification using Fisher's linear discriminant

by Daniela M. Witten, Robert Tibshirani - Journal of the Royal Statistical Society, Series B, 2011
"... Summary. We consider the supervised classification setting, in which the data consist of p features measured on n observations, each of which belongs to one of K classes. Linear discriminant analysis (LDA) is a classical method for this problem. However, in the high dimensional setting where p n, LD ..."
Abstract - Cited by 34 (1 self) - Add to MetaCart
Summary. We consider the supervised classification setting, in which the data consist of p features measured on n observations, each of which belongs to one of K classes. Linear discriminant analysis (LDA) is a classical method for this problem. However, in the high dimensional setting where p ≫ n, LDA is not appropriate for two reasons. First, the standard estimate for the within-class covariance matrix is singular, and so the usual discriminant rule cannot be applied. Second, when p is large, it is difficult to interpret the classification rule that is obtained from LDA, since it involves all p features. We propose penalized LDA, which is a general approach for penalizing the discriminant vectors in Fisher’s discriminant problem in a way that leads to greater interpretability. The discriminant problem is not convex, so we use a minorization–maximization approach to optimize it efficiently when convex penalties are applied to the discriminant vectors. In particular, we consider the use of L1 and fused lasso penalties. Our proposal is equivalent to recasting Fisher’s discriminant problem as a biconvex problem. We evaluate the performances of the resulting methods on a simulation study, and on three gene expression data sets. We also survey past methods for extending LDA to the high dimensional setting and explore their relationships with our proposal.
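
In notation assumed here rather than taken verbatim from the paper, the first penalized discriminant vector solves roughly

    \max_{\beta}\;\; \beta^{\top}\hat{\Sigma}_b\,\beta \;-\; \lambda\sum_{j=1}^{p}|\beta_j|
    \quad\text{subject to}\quad \beta^{\top}\tilde{\Sigma}_w\,\beta \le 1,

where \hat{\Sigma}_b is the between-class covariance estimate and \tilde{\Sigma}_w is a positive-definite (for example diagonal) estimate of the within-class covariance; the ℓ1 penalty (or a fused-lasso penalty in its place) yields sparse discriminant vectors, and with \lambda = 0 the problem reduces to Fisher's generalized eigenproblem.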

Citation Context

...iterion (12) and Algorithm 1 for optimizing it are closely related to the penalized principal components algorithms considered by a number of authors (see e.g. Zou et al., 2006; Shen and Huang, 2008; Witten et al., 2009). This connection stems from the fact that Fisher’s discriminant problem is simply a generalized eigenproblem. The R language software package penalizedLDA implementing penal...

Analysis of population structure: a unifying framework and novel methods based on sparse factor analysis - PLoS Genetics 2010; 6: e1001117

by B E Engelhardt, M Stephens
Cited by 34 (4 self)
We consider the statistical analysis of population structure using genetic data. We show how the two most widely used approaches to modeling population structure, admixture-based models and principal components analysis (PCA), can be viewed within a single unifying framework of matrix factorization. Specifically, they can both be interpreted as approximating an observed genotype matrix by a product of two lower-rank matrices, but with different constraints or prior distributions on these lower-rank matrices. This opens the door to a large range of possible approaches to analyzing population structure, by considering other constraints or priors. In this paper, we introduce one such novel approach, based on sparse factor analysis (SFA). We investigate the effects of the different types of constraint in several real and simulated data sets. We find that SFA produces similar results to admixture-based models when the samples are descended from a few well-differentiated ancestral populations and can recapitulate the results of PCA when the population structure is more “continuous,” as in isolation-by-distance models.
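
In symbols (notation assumed here, not copied from the paper), the unifying view is

    G \;\approx\; \Lambda F,

where G is the n x p genotype matrix, \Lambda is n x K, and F is K x p. Admixture-based models constrain the rows of \Lambda to be nonnegative and sum to one, PCA imposes orthogonality on the factors, and the SFA approach described in this abstract instead places a sparsity-inducing prior on one of the two factors.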

Citation Context

... we use a novel approach described below. Other possible methods for matrix factorization that may be appropriate for this problem include nonnegative matrix factorization [23], and sparse PCA (e.g., [24]). We summarize results from these methods in our Discussion. Sparse factor analysis. We now briefly describe our novel approach to SFA; see Methods for further details. The SFA model assumes Gi,j ∼ N(...

Truncated Power Method for Sparse Eigenvalue Problems

by Xiao-tong Yuan, Tong Zhang, Hui Zou
"... This paper considers the sparse eigenvalue problem, which is to extract dominant (largest) sparse eigenvectors with at most k non-zero components. We propose a simple yet effective solution called truncated power method that can approximately solve the underlying nonconvex optimization problem. A st ..."
Abstract - Cited by 32 (1 self) - Add to MetaCart
This paper considers the sparse eigenvalue problem, which is to extract dominant (largest) sparse eigenvectors with at most k non-zero components. We propose a simple yet effective solution called truncated power method that can approximately solve the underlying nonconvex optimization problem. A strong sparse recovery result is proved for the truncated power method, and this theory is our key motivation for developing the new algorithm. The proposed method is tested on applications such as sparse principal component analysis and the densest k-subgraph problem. Extensive experiments on several synthetic and real-world data sets demonstrate the competitive empirical performance of our method.
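
The iteration this abstract describes is simple enough to sketch directly. Below is a minimal, assumed implementation (not the authors' code) of truncated power iteration: multiply by the matrix, zero out all but the k largest-magnitude entries, and renormalize.

    # Hypothetical sketch of truncated power iteration for the sparse eigenvalue
    # problem: max x^T A x subject to ||x||_2 = 1 and ||x||_0 <= k.
    import numpy as np

    def truncated_power_method(A, k, n_iter=200, x0=None, seed=0):
        """A: symmetric (p, p) matrix; k: sparsity level. Returns a unit vector with
        at most k non-zero components approximating the dominant sparse eigenvector."""
        p = A.shape[0]
        rng = np.random.default_rng(seed)
        x = rng.standard_normal(p) if x0 is None else x0.astype(float)
        x /= np.linalg.norm(x)
        for _ in range(n_iter):
            y = A @ x                                # power step
            if np.allclose(y, 0):
                break
            idx = np.argsort(np.abs(y))[:-k]         # indices of the p - k smallest entries
            y[idx] = 0.0                             # keep only the k largest in magnitude
            x = y / np.linalg.norm(y)                # renormalize
        return x

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        v = np.zeros(50); v[:5] = 1 / np.sqrt(5)     # planted sparse direction
        A = 5.0 * np.outer(v, v) + 0.1 * rng.standard_normal((50, 50))
        A = (A + A.T) / 2
        x = truncated_power_method(A, k=5)
        print(np.nonzero(x)[0])                      # indices of the selected components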