Results 1 - 5 of 5
Generalized Low Rank Models, 2014
Abstract — Cited by 1 (1 self)
Principal components analysis (PCA) is a well-known technique for approximating a data set represented by a matrix by a low rank matrix. Here, we extend the idea of PCA to handle arbitrary data sets consisting of numerical, Boolean, categorical, ordinal, and other data types. This framework encompasses many well-known techniques in data analysis, such as nonnegative matrix factorization, matrix completion, sparse and robust PCA, k-means, k-SVD, and maximum margin matrix factorization. The method handles heterogeneous data sets, and leads to coherent schemes for compressing, denoising, and imputing missing entries across all data types simultaneously. It also admits a number of interesting interpretations of the low rank factors, which allow clustering of examples or of features. We propose several parallel algorithms for fitting generalized low rank models, and describe implementations and numerical results. This manuscript is a draft. Comments sent to udell@stanford.edu are welcome.
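As a hedged illustration of the idea in this abstract: the simplest special case of a generalized low rank model is quadratic loss on a fully observed numeric matrix, which reduces to ordinary PCA / low-rank approximation and can be fit by alternating least squares. This is a minimal sketch under that assumption, not the paper's implementation; all names are illustrative.

```python
import numpy as np

def glrm_quadratic(A, r, n_iters=50, seed=0):
    """Fit A ~= X @ Y with rank r by alternating least squares.
    Quadratic-loss, no-regularization special case of a generalized
    low rank model, i.e. plain low-rank approximation (PCA)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Y = rng.standard_normal((r, n))
    for _ in range(n_iters):
        # Each subproblem is an ordinary least-squares solve, done in closed form.
        X = np.linalg.lstsq(Y.T, A.T, rcond=None)[0].T  # fix Y, solve for X (m x r)
        Y = np.linalg.lstsq(X, A, rcond=None)[0]        # fix X, solve for Y (r x n)
    return X, Y

# Usage: recover a noiseless rank-2 matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 2)) @ rng.standard_normal((2, 30))
X, Y = glrm_quadratic(A, r=2)
err = np.linalg.norm(A - X @ Y) / np.linalg.norm(A)  # near zero for exact rank-2 A
```

The GLRM framework generalizes this template by swapping the quadratic loss for a per-entry loss matched to each column's data type and adding regularizers on X and Y.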
Scalable Methods for Nonnegative Matrix Factorizations of Near-separable Tall-and-skinny Matrices
Abstract — Cited by 1 (0 self)
Numerous algorithms are used for nonnegative matrix factorization under the assumption that the matrix is nearly separable. In this paper, we show how to make these algorithms scalable for data matrices that have many more rows than columns, so-called "tall-and-skinny" matrices. One key component of these improved methods is an orthogonal matrix transformation that preserves the separability of the NMF problem. Our final methods need to read the data matrix only once and are suitable for streaming, multicore, and MapReduce architectures. We demonstrate the efficacy of these algorithms on terabyte-sized matrices from scientific computing and bioinformatics.

1 Nonnegative matrix factorizations at scale

A nonnegative matrix factorization (NMF) for an m × n matrix X with real-valued, nonnegative entries is

X = WH, (1)

where W is m × r, H is r × n, r < min(m, n), and both factors have nonnegative entries. While
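For context on the factorization X = WH in equation (1), here is a minimal sketch of a generic NMF baseline: the classic Lee–Seung multiplicative updates under Frobenius loss. This is not the paper's near-separable or tall-and-skinny algorithm; it is the standard iterative method those algorithms improve upon, and the names are illustrative.

```python
import numpy as np

def nmf_multiplicative(X, r, n_iters=200, eps=1e-9, seed=0):
    """Approximate X ~= W @ H with nonnegative factors via the classic
    Lee-Seung multiplicative updates (Frobenius loss). A generic NMF
    baseline, not the scalable near-separable method from the paper."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iters):
        # Elementwise updates keep all entries nonnegative and do not
        # increase the Frobenius reconstruction error.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Usage: factor a small nonnegative matrix that is exactly rank 2.
rng = np.random.default_rng(1)
X = rng.random((50, 2)) @ rng.random((2, 20))
W, H = nmf_multiplicative(X, r=2)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

Note that this baseline must sweep the full matrix every iteration; the paper's contribution is restructuring the computation so the data is read only once, which matters when X is tall-and-skinny and terabyte-sized.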
Scalable Methods for Nonnegative Matrix Factorizations of Near-separable Tall-and-skinny Matrices
Abstract
• NMF Problem: X ∈ R^{m×n}_+ is a matrix with nonnegative entries, and we want to compute a nonnegative matrix factorization (NMF) X = WH, where W ∈ R^{m×r}_+ and H ∈ R^{r×n}_+. When r < m, this problem is NP-hard.
• A separable matrix is one that admits a nonnegative factorization where
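The definition of a separable matrix is truncated in the snippet above; in the near-separable NMF literature it means some r columns of X itself can serve as W. Under that assumption, the successive projection algorithm (SPA) is one standard way to identify those columns. A minimal sketch, not the paper's implementation; names are illustrative.

```python
import numpy as np

def spa(X, r):
    """Successive projection algorithm: greedily pick r columns of X that
    generate the rest under exact separability. Returns selected indices."""
    R = X.astype(float).copy()
    cols = []
    for _ in range(r):
        j = int(np.argmax(np.sum(R * R, axis=0)))   # column of largest norm
        cols.append(j)
        u = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(u, u @ R)                      # project that direction out
    return cols

# Usage: build a separable X = W @ H whose first 3 columns are W itself,
# with the remaining columns convex combinations of them.
rng = np.random.default_rng(0)
W = rng.random((40, 3))
H_rand = rng.random((3, 12))
H_rand /= H_rand.sum(axis=0)                         # columns lie in the simplex
H = np.hstack([np.eye(3), H_rand])
X = W @ H
picked = sorted(spa(X, 3))                           # recovers columns 0, 1, 2
```

Under exact separability with simplicial weights, SPA provably recovers the generating columns; the paper's orthogonal-transformation trick is about making this kind of column selection feasible when X has far more rows than columns.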