Results 1-4 of 4
Fast and guaranteed tensor decomposition via sketching. In NIPS, 2015.
Abstract

Cited by 2 (1 self)
Tensor CANDECOMP/PARAFAC (CP) decomposition has wide applications in statistical learning of latent variable models and in data mining. In this paper, we propose fast and randomized tensor CP decomposition algorithms based on sketching. We build on the idea of count sketches, but introduce many novel ideas which are unique to tensors. We develop novel methods for randomized computation of tensor contractions via FFTs, without explicitly forming the tensors. Such tensor contractions are encountered in decomposition methods such as tensor power iterations and alternating least squares. We also design novel colliding hashes for symmetric tensors to further save time in computing the sketches. We then combine these sketching ideas with existing whitening and tensor power iterative techniques to obtain the fastest algorithm on both sparse and dense tensors. The quality of approximation under our method does not depend on properties such as sparsity, uniformity of elements, etc. We apply the method for topic modeling and obtain competitive results.
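The count-sketch primitive this abstract builds on can be illustrated with a minimal example: a plain count sketch of vectors, not the paper's tensor-specific colliding hashes or FFT contractions. All names and sizes below are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1000, 256                  # input dimension, sketch size (illustrative)
h = rng.integers(0, m, n)         # random hash bucket per coordinate
s = rng.choice([-1.0, 1.0], n)    # random sign per coordinate

def count_sketch(x):
    # Signed accumulation of coordinates into m buckets.
    out = np.zeros(m)
    np.add.at(out, h, s * x)
    return out

x = rng.standard_normal(n)
y = rng.standard_normal(n)
# With shared (h, s), the sketched inner product is an unbiased
# estimate of the true inner product x @ y.
est = count_sketch(x) @ count_sketch(y)
```

The sketch is linear in its input, which is what lets such methods sketch tensor contractions without forming the tensor explicitly; the variance of the estimate shrinks as the sketch size m grows.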
SPLATT: Efficient and Parallel Sparse Tensor-Matrix Multiplication, 2015.
Abstract

Cited by 1 (1 self)
Multidimensional arrays, or tensors, are increasingly found in fields such as signal processing and recommender systems. Real-world tensors can be enormous in size and often very sparse. There is a need for efficient, high-performance tools capable of processing the massive sparse tensors of today and the future. This paper introduces SPLATT, a C library with shared-memory parallelism for three-mode tensors. SPLATT contains algorithmic improvements over competing state-of-the-art tools for sparse tensor factorization. SPLATT has a fast, parallel method of multiplying a matricized tensor by a Khatri-Rao product, which is a key kernel in tensor factorization methods. SPLATT uses a novel data structure that exploits the sparsity patterns of tensors. This data structure has a small memory footprint similar to competing methods and allows for the computational improvements featured in our work. We also present a method of finding cache-friendly reorderings and utilizing them with a novel form of cache tiling. To our knowledge, this is the first work to investigate reordering and cache tiling in this context. SPLATT averages almost 30× speedup compared to our baseline when using 16 threads and reaches over 80× speedup on NELL-2.
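The kernel SPLATT accelerates, multiplying a matricized tensor by a Khatri-Rao product (MTTKRP), can be written down in a few lines for a COO-format sparse three-mode tensor. This naive form is the baseline such libraries improve on; the function name and data here are ours, not SPLATT's API.

```python
import numpy as np

def mttkrp_coo(inds, vals, B, C, dim_i, rank):
    # Mode-1 MTTKRP: M[i, :] = sum over nonzeros (i, j, k) of
    # val * (B[j, :] * C[k, :]), i.e. one row update per nonzero.
    M = np.zeros((dim_i, rank))
    for (i, j, k), v in zip(inds, vals):
        M[i] += v * B[j] * C[k]
    return M

rng = np.random.default_rng(1)
I, J, K, R = 4, 5, 6, 3
inds = [(0, 1, 2), (0, 4, 5), (3, 2, 0)]   # nonzero coordinates
vals = [1.0, 2.0, -1.5]
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))
M = mttkrp_coo(inds, vals, B, C, I, R)
```

Each iteration of a CP alternating-least-squares solve calls this kernel once per mode, which is why its cost dominates and why compressed layouts and parallelism pay off.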
Sublinear Time Orthogonal Tensor Decomposition
Abstract
A recent work (Wang et al., NIPS 2015) gives the fastest known algorithms for orthogonal tensor decomposition with provable guarantees. Their algorithm is based on computing sketches of the input tensor, which requires reading the entire input. We show that in a number of cases one can achieve the same theoretical guarantees in sublinear time, i.e., even without reading most of the input tensor. Instead of using sketches to estimate inner products in tensor decomposition algorithms, we use importance sampling. To achieve sublinear time, we need to know the norms of tensor slices, and we show how to obtain these in a number of important cases. For symmetric tensors T = Σ_{i=1}^{k} λ_i u_i^{⊗p} with λ_i > 0 for all i, we estimate such norms in sublinear time whenever p is even. For the important case of p = 3 and small values of k, we can also estimate such norms. For asymmetric tensors sublinear time is not possible in general, but we show that if the tensor slice norms are just slightly below ‖T‖_F then sublinear time is again possible. One of the main strengths of our work is empirical: in a number of cases our algorithm is orders of magnitude faster than existing methods with the same accuracy.
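The importance-sampling primitive described above, estimating an inner product from a few sampled coordinates rather than a full pass, can be sketched as follows. Sampling proportional to u_i² is one standard choice; the paper's slice-norm machinery is not reproduced here, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def sampled_inner(u, v, n_samples):
    # Sample indices i with probability p_i = u_i^2 / ||u||^2 and
    # average u_i * v_i / p_i, an unbiased estimator of u @ v.
    p = u**2 / np.sum(u**2)
    idx = rng.choice(len(u), size=n_samples, p=p)
    return np.mean(u[idx] * v[idx] / p[idx])

u = rng.standard_normal(500)
v = rng.standard_normal(500)
est = sampled_inner(u, v, 20000)
```

The estimator reads only the sampled coordinates of v, which is the source of the sublinear running time once the sampling distribution (the slice norms) is known or estimable.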
Tensor-Matrix Products with a Compressed Sparse Tensor, 2015.
Abstract
The Canonical Polyadic Decomposition (CPD) of tensors is a powerful tool for analyzing multiway data and is used extensively to analyze very large and extremely sparse datasets. The bottleneck of computing the CPD is multiplying a sparse tensor by several dense matrices. Algorithms for tensor-matrix products fall into two classes. The first class saves floating point operations by storing a compressed tensor for each dimension of the data. These methods are fast but suffer high memory costs. The second class uses a single uncompressed tensor at the cost of additional floating point operations. In this work, we bridge the gap between the two approaches and introduce the compressed sparse fiber (CSF), a data structure for sparse tensors, along with a novel parallel algorithm for tensor-matrix multiplication. CSF offers similar operation reductions as existing compressed methods while using only a single tensor structure. We validate our contributions with experiments comparing against state-of-the-art methods on a diverse set of datasets. Our work uses 58% less memory than the state-of-the-art while achieving 81% of the parallel performance on 16 threads.
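The fiber-compression idea behind CSF can be illustrated with a toy nested-dictionary tree (the actual CSF uses flat, CSR-like pointer arrays): grouping nonzeros by index prefix lets the outer-mode multiply happen once per fiber instead of once per nonzero. All names here are ours, not the paper's.

```python
import numpy as np

def coo_to_tree(inds, vals):
    # Group nonzeros of a three-mode tensor by index prefix, so each
    # distinct i and (i, j) prefix is stored once.
    tree = {}
    for (i, j, k), v in zip(inds, vals):
        tree.setdefault(i, {}).setdefault(j, {})[k] = v
    return tree

def mttkrp_tree(tree, B, C, dim_i, rank):
    # M[i, :] = sum_{j,k} T[i,j,k] * B[j, :] * C[k, :], computed so that
    # the multiply by B[j] happens once per (i, j) fiber, not per nonzero.
    M = np.zeros((dim_i, rank))
    for i, fibers in tree.items():
        for j, entries in fibers.items():
            acc = np.zeros(rank)
            for k, v in entries.items():
                acc += v * C[k]      # inner-mode accumulation
            M[i] += acc * B[j]       # one fiber-level multiply
    return M

rng = np.random.default_rng(3)
I, J, K, R = 4, 5, 6, 2
inds = [(0, 1, 2), (0, 1, 4), (3, 2, 0)]   # two nonzeros share fiber (0, 1)
vals = [1.0, 2.0, -1.5]
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))
M = mttkrp_tree(coo_to_tree(inds, vals), B, C, I, R)
```

The saving grows with the number of nonzeros per fiber, which is why a single compressed structure can approach the operation counts of per-dimension compressed layouts at a fraction of their memory.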