Results 11–20 of 113
Nonnegative Matrix Factorization: A Comprehensive Review
IEEE Trans. Knowledge and Data Eng., 2013
Abstract

Cited by 16 (2 self)
Nonnegative Matrix Factorization (NMF), a relatively novel paradigm for dimensionality reduction, has been in the ascendant since its inception. It incorporates the nonnegativity constraint and thus obtains a parts-based representation, correspondingly enhancing the interpretability of the problem. This survey paper mainly focuses on the theoretical research into NMF over the last 5 years, where the principles, basic models, properties, and algorithms of NMF, along with its various modifications, extensions, and generalizations, are summarized systematically. The existing NMF algorithms are divided into four categories: Basic NMF (BNMF), …
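The basic NMF model surveyed above factors a nonnegative matrix X into nonnegative W and H with X ≈ WH. As a minimal sketch (not taken from the survey itself), the classic Lee–Seung multiplicative updates illustrate the idea; the rank k, iteration count, and toy matrix below are illustrative choices:

```python
import numpy as np

def nmf_multiplicative(X, k, n_iter=300, seed=0):
    """Basic NMF sketch: find nonnegative W (m x k), H (k x n) with X ~ W @ H
    using Lee-Seung multiplicative updates for the Frobenius-norm objective."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + 1e-3
    H = rng.random((k, n)) + 1e-3
    eps = 1e-12  # avoids division by zero
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)  # H stays elementwise nonnegative
        W *= (X @ H.T) / (W @ H @ H.T + eps)  # W stays elementwise nonnegative
    return W, H

# Toy matrix with exact nonnegative rank 2: the factorization recovers it closely.
X = np.array([[1.0, 2.0, 0.0],
              [2.0, 4.0, 0.0],
              [0.0, 0.0, 3.0]])
W, H = nmf_multiplicative(X, k=2)
residual = np.linalg.norm(X - W @ H)
```

Because the updates only ever multiply by nonnegative ratios, nonnegativity of W and H is preserved automatically, which is what yields the parts-based representation.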
Collaborative Filtering Using Orthogonal Nonnegative Matrix Tri-Factorization
Abstract

Cited by 16 (1 self)
Collaborative filtering aims at predicting a test user’s ratings for new items by integrating other like-minded users’ rating information. Traditional collaborative filtering methods usually suffer from two fundamental problems: sparsity and scalability. In this paper, we propose a novel framework for collaborative filtering by applying Orthogonal Nonnegative Matrix Tri-Factorization (ONMTF), which (1) alleviates the sparsity problem via matrix factorization; (2) solves the scalability problem by simultaneously clustering rows and columns of the user-item matrix. Experimental results on benchmark data sets are presented to show that our algorithm is indeed more tolerant of both the sparsity and scalability problems, and meanwhile achieves good performance.
PAC-Bayesian Analysis of Co-clustering and Beyond
Abstract

Cited by 15 (7 self)
We derive PAC-Bayesian generalization bounds for supervised and unsupervised learning models based on clustering, such as co-clustering, matrix tri-factorization, graphical models, graph clustering, and pairwise clustering. We begin with the analysis of co-clustering, which is a widely used approach to the analysis of data matrices. We distinguish between two tasks in matrix data analysis: discriminative prediction of the missing entries in data matrices and estimation of the joint probability distribution of row and column variables in co-occurrence matrices. We derive PAC-Bayesian generalization bounds for the expected out-of-sample performance of co-clustering-based solutions for these two tasks. The analysis yields regularization terms that were absent in the previous formulations of co-clustering. The bounds suggest that the expected performance of co-clustering is governed by a trade-off between its empirical performance and the mutual information preserved by the cluster variables on row and column IDs. We derive an iterative projection algorithm for finding a local optimum of this trade-off for discriminative prediction tasks. This algorithm achieved state-of-the-art performance on the MovieLens collaborative filtering task. Our co-clustering model can also be seen as matrix tri-factorization, and the results provide generalization bounds and regularization …
Sparse and unique nonnegative matrix factorization through data preprocessing
 Journal of Machine Learning Research
Abstract

Cited by 14 (6 self)
Nonnegative matrix factorization (NMF) has become a very popular technique in machine learning because it automatically extracts meaningful features through a sparse and part-based representation. However, NMF has the drawback of being highly ill-posed, that is, there typically exist many different but equivalent factorizations. In this paper, we introduce a completely new way of obtaining more well-posed NMF problems whose solutions are sparser. Our technique is based on preprocessing the nonnegative input data matrix, and relies on the theory of M-matrices and the geometric interpretation of NMF. This approach provably leads to optimal and sparse solutions under the separability assumption of Donoho and Stodden (2003), and, for rank-three matrices, makes the number of exact factorizations finite. We illustrate the effectiveness of our technique on several image data sets.
CRD: Fast Co-clustering on Large Datasets Utilizing Sampling-Based Matrix Decomposition
Abstract

Cited by 13 (1 self)
The problem of simultaneously clustering columns and rows (co-clustering) arises in important applications, such as text data mining, microarray analysis, and recommendation system analysis. Compared with classical clustering algorithms, co-clustering algorithms have been shown to be more effective in discovering hidden clustering structures in the data matrix. The complexity of previous co-clustering algorithms is usually O(m × n), where m and n are the numbers of rows and columns in the data matrix, respectively. This limits their applicability to data matrices involving a large number of columns and rows. Moreover, some huge datasets cannot be entirely held in main memory during co-clustering, which violates an assumption made by the previous algorithms. In this paper, we propose CRD, a general framework for fast co-clustering of large datasets. By utilizing recently developed sampling-based matrix decomposition methods, CRD achieves an execution time linear in m and n. Also, CRD does not require the whole data matrix to reside in main memory. We conducted extensive experiments on both real and synthetic data. Compared with previous co-clustering algorithms, CRD achieves competitive accuracy but with much less computational cost.
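The kind of sampling-based decomposition CRD builds on can be illustrated by a CUR-style approximation: sample a few actual columns and rows of X and glue them with a small linking matrix. The uniform sampling and pseudoinverse link below are a simplified sketch (the linking step here even touches all of X), not CRD’s actual scheme:

```python
import numpy as np

def cur_sketch(X, c, r, seed=0):
    """CUR-style approximation X ~ C @ U @ R built from c sampled columns and
    r sampled rows (uniform sampling; practical methods weight the sampling)."""
    rng = np.random.default_rng(seed)
    cols = rng.choice(X.shape[1], size=c, replace=False)
    rows = rng.choice(X.shape[0], size=r, replace=False)
    C = X[:, cols]            # m x c: actual columns of X
    R = X[rows, :]            # r x n: actual rows of X
    U = np.linalg.pinv(C) @ X @ np.linalg.pinv(R)  # c x r linking matrix
    return C, U, R

# A rank-2 matrix is reconstructed essentially exactly from 3 columns and 3 rows.
rng = np.random.default_rng(1)
X = rng.random((20, 2)) @ rng.random((2, 15))
C, U, R = cur_sketch(X, c=3, r=3)
err = np.linalg.norm(X - C @ U @ R)
```

Because C and R are actual columns and rows of the data, the factors stay interpretable in the original feature space, which is what makes such decompositions attractive for co-clustering.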
Knowledge transformation from word space to document space
In Proc. of SIGIR ’08, 2008
Abstract

Cited by 13 (5 self)
In most IR clustering problems, we directly cluster the documents, working in the document space and using cosine similarity between documents as the similarity measure. In many real-world applications, however, we usually have knowledge on the word side and wish to transform this knowledge to the document (concept) side. In this paper, we provide a mechanism for this knowledge transformation. To the best of our knowledge, this is the first model for this type of knowledge transformation. The model uses a nonnegative matrix factorization X = FSG^T, where X is the word-document semantic matrix, F is the posterior probability of a word belonging to a word cluster and represents knowledge in the word space, G is the posterior probability of a document belonging to a document cluster and represents knowledge in the document space, and S is a scaled matrix factor which provides a condensed view of X. We show how knowledge on words can improve document clustering, i.e., knowledge in the word space is transformed into the document space. We perform extensive experiments to validate our approach.
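The X = FSG^T model can be made concrete with a hypothetical toy: 4 words, 3 documents, 2 word clusters, and 2 document clusters. All the numbers below are illustrative choices, not from the paper:

```python
import numpy as np

# F: word -> word-cluster memberships (each row sums to 1)
F = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0]])
# G: document -> document-cluster memberships (each row sums to 1)
G = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
# S: association strengths between word clusters and document clusters,
# a condensed 2 x 2 view of the full word-document matrix
S = np.array([[3.0, 0.0],
              [0.0, 2.0]])
# Reconstructed 4 x 3 word-document semantic matrix
X = F @ S @ G.T
```

Fixing F from word-side knowledge while fitting S and G is the sense in which knowledge is transformed from the word space to the document space.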
Binary Matrix Factorization with Applications
Abstract

Cited by 13 (1 self)
An interesting problem in Nonnegative Matrix Factorization (NMF) is to factorize a matrix X that belongs to some specific class, for example, a binary matrix. In this paper, we extend standard NMF to Binary Matrix Factorization (BMF for short): given a binary matrix X, we want to factorize X into two binary matrices W, H (thus conserving the most important integer property of the objective matrix X) satisfying X ≈ WH. Two algorithms are studied and compared. These methods rely on a fundamental boundedness property of NMF which we propose and prove. This new property also provides a natural normalization scheme that eliminates the bias of factor matrices. Experiments on both synthetic and real-world datasets are conducted to show the competency and effectiveness of BMF.
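The BMF objective — binary W and H with X ≈ WH — can be shown at toy scale by exhaustive search. The paper's two algorithms are iterative; this brute force is only an illustration of the objective and is feasible only for tiny matrices:

```python
import numpy as np
from itertools import product

def bmf_bruteforce(X, k):
    """Find binary W (m x k) and H (k x n) minimizing ||X - W @ H||_F^2
    by exhaustive search over all 2^(mk) * 2^(kn) binary factor pairs."""
    m, n = X.shape
    best_err, best = np.inf, None
    for w_bits in product([0, 1], repeat=m * k):
        W = np.array(w_bits).reshape(m, k)
        for h_bits in product([0, 1], repeat=k * n):
            H = np.array(h_bits).reshape(k, n)
            err = np.sum((X - W @ H) ** 2)
            if err < best_err:
                best_err, best = err, (W, H)
    return best, best_err

# A binary matrix with an exact rank-2 binary factorization: error reaches 0.
X = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1]])
(W, H), err = bmf_bruteforce(X, k=2)
```

The exponential cost of the search is exactly why the integrality constraint makes BMF harder than standard NMF and motivates the paper's relaxation-based algorithms.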
Dual transfer learning
In Proceedings of the 12th SIAM International Conference on Data Mining (SDM), 2012
Abstract

Cited by 11 (3 self)
Transfer learning aims to leverage knowledge in the source domain to facilitate the learning tasks in the target domain. It has attracted extensive research interest recently due to its effectiveness in a wide range of applications. The general idea of the existing methods is to utilize the common latent structure shared across domains as the bridge for knowledge transfer. These methods usually model the common latent structure using either the marginal distribution or the conditional distribution. However, without exploring the duality between these two distributions, such single-bridge methods may not achieve optimal capability of knowledge transfer. In this paper, we propose a novel approach, Dual Transfer Learning (DTL), which simultaneously learns the marginal and conditional distributions, and exploits the duality between them in a principled way. The key idea behind DTL is that learning one distribution can help to learn the other. This duality property leads to mutual reinforcement when adapting both distributions across domains to transfer knowledge. The proposed method is formulated as an optimization problem based on joint nonnegative matrix tri-factorizations (NMTF). The two distributions are learned from the decomposed latent factors that exhibit the duality property. An efficient alternating minimization algorithm is developed to solve the optimization problem with a convergence guarantee. Extensive experimental results demonstrate that DTL is more effective than alternative transfer learning methods.
Sparse latent semantic analysis
In Workshop on Neural Information Processing Systems, 2010
Abstract

Cited by 11 (1 self)
Latent semantic analysis (LSA), one of the most popular unsupervised dimension reduction tools, has a wide range of applications in text mining and information retrieval. The key idea of LSA is to learn a projection matrix that maps the high-dimensional vector space representations of documents to a lower-dimensional latent space, the so-called latent topic space. In this paper, we propose a new model called Sparse LSA, which produces a sparse projection matrix via ℓ1 regularization. Compared to traditional LSA, Sparse LSA selects only a small number of relevant words for each topic and hence provides a compact representation of topic-word relationships. Moreover, Sparse LSA is computationally very efficient, with much less memory usage for storing the projection matrix. Furthermore, we propose two important extensions of Sparse LSA: group-structured Sparse LSA and nonnegative Sparse LSA. We conduct experiments on several benchmark datasets and compare Sparse LSA and its extensions with several widely used methods, e.g., LSA, Sparse Coding, and LDA. Empirical results suggest that Sparse LSA achieves performance gains similar to LSA, but is more efficient in projection computation and storage, and also explains the topic-word relationships well.
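The mechanism by which ℓ1 regularization makes a projection matrix sparse is soft-thresholding: the proximal step of the ℓ1 penalty shrinks every entry toward zero and zeroes out small ones exactly. A minimal sketch of that step (not the paper's full optimization procedure):

```python
import numpy as np

def soft_threshold(A, lam):
    """Proximal operator of lam * ||A||_1: shrink each entry toward zero by lam
    and set entries with magnitude below lam exactly to zero."""
    return np.sign(A) * np.maximum(np.abs(A) - lam, 0.0)

# Entries smaller than lam become exactly zero; larger ones shrink by lam.
A = np.array([[0.50, -0.05],
              [0.20, -0.40]])
A_sparse = soft_threshold(A, lam=0.1)
```

Applied to a topic-word projection matrix, this is what leaves only a small number of relevant words per topic with nonzero weight.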
Recommendation via query-centered random walk on k-partite graph
In ICDM, 2007
Abstract

Cited by 10 (2 self)
This paper presents a recommendation algorithm that performs a query-dependent random walk on a k-partite graph constructed from the various features relevant to the recommendation task. Given the massive size of a k-partite graph, executing the query-centered random walk in an online fashion is computationally infeasible. To overcome this challenge, we propose to apply multi-way clustering on the k-partite graph to reduce the dimensionality of the problem. A random walk is then performed on the subgraph induced by the clusters. Experimental results on three real data sets demonstrate both the effectiveness and efficiency of the proposed algorithm.
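A query-centered random walk can be sketched as a random walk with restart (personalized PageRank): the walker follows edges but teleports back to the query node with probability alpha, so scores concentrate near the query. The restart probability and the tiny graph below are illustrative; this shows the general idea, not the paper's clustered algorithm:

```python
import numpy as np

def random_walk_with_restart(A, query, alpha=0.15, n_iter=200):
    """Personalized-PageRank-style scores: distribution of a walk on adjacency
    matrix A that restarts at `query` with probability alpha each step."""
    P = A / A.sum(axis=0, keepdims=True)  # column-stochastic transition matrix
    restart = np.zeros(A.shape[0])
    restart[query] = 1.0
    p = restart.copy()
    for _ in range(n_iter):
        p = (1 - alpha) * (P @ p) + alpha * restart
    return p

# Path graph 0 - 1 - 2 - 3: proximity scores decay with distance from the query.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
scores = random_walk_with_restart(A, query=0)
```

Ranking items by these scores yields recommendations biased toward the querying user's neighborhood in the graph; clustering the graph first, as the paper proposes, shrinks the matrix the walk iterates over.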