Results 1–10 of 44
Semi-Supervised Clustering via Matrix Factorization
, 2008
Abstract

Cited by 30 (4 self)
Recent years have witnessed a surge of interest in semi-supervised clustering methods, which aim to cluster a data set under the guidance of supervisory information. This supervisory information usually takes the form of pairwise constraints that indicate the similarity or dissimilarity between two points. In this paper, we propose a novel matrix-factorization-based approach for semi-supervised clustering. In addition, we extend our algorithm to co-cluster data sets of different types under constraints. Finally, experiments on UCI data sets and real-world Bulletin Board Systems (BBS) data sets show the superiority of the proposed method.
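The constrained factorization described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's exact algorithm: the soft must-link pull after each multiplicative step, the `alpha` parameter, and the function name are all assumptions made for the sketch.

```python
import numpy as np

def semisupervised_nmf(X, k, must_link, n_iter=200, alpha=0.2, seed=0):
    """Sketch: X (m x n) ~ W @ H, where column j of H encodes the cluster
    membership of data point j.  Must-link pairs are enforced softly by
    pulling the corresponding columns of H toward each other after each
    multiplicative step (a stand-in for the paper's constraint term)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + 0.1
    H = rng.random((k, n)) + 0.1
    for _ in range(n_iter):
        # Lee-Seung multiplicative updates for ||X - WH||_F^2
        H *= (W.T @ X) / (W.T @ W @ H + 1e-12)
        W *= (X @ H.T) / (W @ H @ H.T + 1e-12)
        for i, j in must_link:  # soft must-link: shrink the gap
            mean = 0.5 * (H[:, i] + H[:, j])
            H[:, i] = (1 - alpha) * H[:, i] + alpha * mean
            H[:, j] = (1 - alpha) * H[:, j] + alpha * mean
    return W, H, H.argmax(axis=0)
```

With `k` set to the number of clusters, `H.argmax(axis=0)` gives hard cluster assignments that respect the must-link constraints.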
Unveiling Core Network-Wide Communication Patterns through Application Traffic Activity Graph Decomposition
Abstract

Cited by 26 (7 self)
As Internet communications and applications become more complex, operating, managing and securing networks have become increasingly challenging tasks. There is an urgent demand for more sophisticated techniques for understanding and analyzing the behavioral characteristics of network traffic. In this paper, we study network traffic behavior using traffic activity graphs (TAGs), which capture the interactions among hosts engaging in certain types of communication and their collective behavior. TAGs derived from real network traffic are large and sparse, yet seemingly complex and richly connected, and therefore difficult to visualize and comprehend. To analyze and characterize these TAGs, we propose a novel statistical traffic graph decomposition technique based on orthogonal nonnegative matrix tri-factorization (tNMF) to decompose and extract the core host interaction patterns and other structural properties. Using real network traffic traces, we demonstrate that our tNMF-based graph decomposition technique produces meaningful and interpretable results. It enables us to characterize and quantify the key structural properties of large, sparse TAGs associated with various applications, and to study their formation and evolution.
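The tri-factorization at the heart of this approach decomposes an adjacency-like matrix X into row-group factors U, column-group factors V, and a small core matrix S of group-to-group interaction strengths. The sketch below is the plain (non-orthogonal) tri-factorization with standard multiplicative updates; the paper's tNMF additionally imposes orthogonality on U and V, which is omitted here.

```python
import numpy as np

def nmtf(X, k1, k2, n_iter=300, seed=0):
    """Plain nonnegative matrix tri-factorization X ~ U @ S @ V.T.
    U clusters rows, V clusters columns, and S gives the condensed
    interaction pattern between row groups and column groups."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k1)) + 0.1
    S = rng.random((k1, k2)) + 0.1
    V = rng.random((n, k2)) + 0.1
    eps = 1e-12
    for _ in range(n_iter):
        # multiplicative updates for ||X - U S V^T||_F^2
        U *= (X @ V @ S.T) / (U @ S @ V.T @ V @ S.T + eps)
        V *= (X.T @ U @ S) / (V @ S.T @ U.T @ U @ S + eps)
        S *= (U.T @ X @ V) / (U.T @ U @ S @ V.T @ V + eps)
    return U, S, V
```

Applied to a bipartite TAG (e.g. clients vs. servers), large entries of S identify the dominant communication patterns between host groups.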
Linear and Nonlinear Projective Nonnegative Matrix Factorization
Abstract

Cited by 19 (2 self)
Abstract—A variant of nonnegative matrix factorization (NMF) proposed earlier, called Projective Nonnegative Matrix Factorization (PNMF), is analyzed here. The method approximately factorizes a projection matrix, minimizing the reconstruction error, into a positive low-rank matrix and its transpose. The dissimilarity between the original data matrix and its approximation can be measured by the Frobenius matrix norm or the modified Kullback-Leibler divergence. Both measures are minimized by multiplicative update rules, whose convergence is proven for the first time. Adding an orthonormality constraint to the basic objective is shown to lead to an even more efficient update rule, which also extends readily to nonlinear cases. The formulation of the PNMF objective is shown to be connected to a variety of existing nonnegative matrix factorization methods and clustering approaches. In addition, the derivation using Lagrange multipliers reveals the relation between reconstruction and sparseness. For kernel principal component analysis with the binary constraint, useful in graph partitioning problems, the nonlinear kernel PNMF provides a good approximation that outperforms an existing discretization approach. An empirical study on three real-world databases shows that PNMF achieves the best, or close to the best, clustering performance. The proposed algorithm runs more efficiently than the compared nonnegative matrix factorization methods, especially for high-dimensional data. Moreover, unlike basic NMF, the trained projection matrix can be readily applied to newly arriving samples and demonstrates good generalization.
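The linear Frobenius-norm case can be sketched as follows: PNMF seeks a single nonnegative factor W such that X ≈ W Wᵀ X, so only W needs to be stored and new samples x can be projected as W Wᵀ x. The multiplicative update below follows the Frobenius-norm PNMF literature; the spectral-norm rescaling for numerical stability and all parameter names are assumptions of this sketch.

```python
import numpy as np

def pnmf(X, k, n_iter=500, seed=0):
    """Projective NMF sketch: nonnegative W (m x k) minimizing
    ||X - W W^T X||_F^2.  Only X X^T enters the update."""
    rng = np.random.default_rng(seed)
    m, _ = X.shape
    W = rng.random((m, k)) + 0.1
    P = X @ X.T
    eps = 1e-12
    for _ in range(n_iter):
        # multiplicative update for the Frobenius PNMF objective
        W *= (2 * P @ W) / (W @ W.T @ P @ W + P @ W @ W.T @ W + eps)
        W /= np.linalg.norm(W, 2)  # rescale for stability; near the
        # optimum W is close to orthonormal, so this is nearly a no-op
    return W
```

Because the learned W acts as a projection, `W @ W.T @ x_new` generalizes to unseen samples, which basic NMF cannot do directly.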
Nonnegative Matrix Factorization: A Comprehensive Review
 IEEE Trans. Knowledge and Data Engineering
, 2013
Abstract

Cited by 16 (2 self)
Nonnegative Matrix Factorization (NMF), a relatively novel paradigm for dimensionality reduction, has been in the ascendant since its inception. It incorporates a nonnegativity constraint and thus obtains a parts-based representation, correspondingly enhancing interpretability. This survey paper mainly focuses on the theoretical research into NMF over the last five years, systematically summarizing the principles, basic models, properties, and algorithms of NMF along with its various modifications, extensions, and generalizations. The existing NMF algorithms are divided into four categories: Basic NMF (BNMF),
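The basic model surveyed here, min ‖X − WH‖²_F with W, H ≥ 0, is most often solved with the Lee-Seung multiplicative updates, which guarantee the objective never increases. A minimal sketch (function and variable names are this sketch's own):

```python
import numpy as np

def nmf(X, k, n_iter=200, seed=0):
    """Basic NMF via Lee-Seung multiplicative updates:
    min ||X - W H||_F^2  subject to  W >= 0, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + 0.1
    H = rng.random((k, n)) + 0.1
    eps = 1e-12
    errs = []
    for _ in range(n_iter):
        # each update multiplies by a nonnegative ratio, so W, H stay >= 0
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errs.append(np.linalg.norm(X - W @ H))
    return W, H, errs
```

The nonnegativity of every factor entry is what yields the parts-based, interpretable representation the abstract refers to.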
Knowledge transformation from word space to document space
 In Proc. of SIGIR’ 08
, 2008
Abstract

Cited by 13 (5 self)
In most IR clustering problems, we directly cluster the documents, working in the document space and using cosine similarity between documents as the similarity measure. In many real-world applications, however, we often have knowledge on the word side and wish to transform this knowledge to the document (concept) side. In this paper, we provide a mechanism for this knowledge transformation. To the best of our knowledge, this is the first model for this type of knowledge transformation. The model uses a nonnegative matrix factorization X = FSG^T, where X is the word-document semantic matrix, F is the posterior probability of a word belonging to a word cluster and represents knowledge in the word space, G is the posterior probability of a document belonging to a document cluster and represents knowledge in the document space, and S is a scaled matrix factor which provides a condensed view of X. We show how knowledge on words can improve document clustering, i.e., how knowledge in the word space is transformed into the document space. We perform extensive experiments to validate our approach.
Binary Matrix Factorization with Applications
Abstract

Cited by 13 (1 self)
An interesting problem in Nonnegative Matrix Factorization (NMF) is to factorize a matrix X that belongs to a specific class, for example a binary matrix. In this paper, we extend standard NMF to Binary Matrix Factorization (BMF for short): given a binary matrix X, we want to factorize X into two binary matrices W, H (thus conserving the most important integer property of the objective matrix X) satisfying X ≈ WH. Two algorithms are studied and compared. These methods rely on a fundamental boundedness property of NMF which we propose and prove. This new property also provides a natural normalization scheme that eliminates the bias of factor matrices. Experiments on both synthetic and real-world datasets are conducted to show the competency and effectiveness of BMF.
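A simple baseline illustrates the problem: run plain NMF, balance the scale between W and H (echoing the boundedness-based normalization idea mentioned above), and round to {0, 1}. This rounding heuristic is not either of the paper's two algorithms; it only shows what a binary factorization looks like.

```python
import numpy as np

def bmf_round(X, k, n_iter=300, seed=0):
    """Naive BMF baseline: NMF, then rescale each rank-1 factor pair to a
    comparable [0, 1]-like range, then threshold both factors at 0.5."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + 0.1
    H = rng.random((k, n)) + 0.1
    eps = 1e-12
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    # balance scales: column j of W and row j of H get equal maxima,
    # leaving the product W @ H unchanged
    d = np.sqrt(W.max(axis=0) * H.max(axis=1))
    W = W * (d / (W.max(axis=0) + eps))
    H = H * (d / (H.max(axis=1) + eps))[:, None]
    return (W >= 0.5).astype(int), (H >= 0.5).astype(int)
```

On clean block-structured binary data the rounded factors reconstruct X well; the paper's algorithms are designed to do much better on noisy inputs.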
Comparative Document Summarization via Discriminative Sentence Selection
Abstract

Cited by 11 (4 self)
Given a collection of document groups, a natural question is what the differences among these groups are. In this paper, we study the novel problem of summarizing the differences between document groups. A discriminative sentence selection method is proposed to extract the most discriminative sentences, which represent the specific characteristics of each document group. Experiments on real-world data sets demonstrate the effectiveness of the proposed method.
On Trivial Solution and Scale Transfer Problems in Graph Regularized NMF
Abstract

Cited by 9 (1 self)
Combining graph regularization with nonnegative matrix (tri-)factorization (NMF) has shown great performance improvement over traditional nonnegative matrix (tri-)factorization models, owing to its ability to utilize the geometric structure of the documents and words. In this paper, we show that these models are not well-defined and suffer from trivial-solution and scale-transfer problems. To solve these common problems, we propose two models for graph-regularized nonnegative matrix (tri-)factorization, which can be applied to document clustering and co-clustering respectively. In the proposed models, a Normalized-Cut-like constraint is imposed on the cluster assignment matrix to make the optimization problem well-defined. We derive a multiplicative updating algorithm for the proposed models and prove its convergence. Clustering and co-clustering experiments on benchmark text data sets demonstrate that the proposed models outperform the original models as well as many other state-of-the-art clustering methods.
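For reference, the prior graph-regularized NMF model that this paper analyzes minimizes ‖X − UVᵀ‖²_F + λ Tr(Vᵀ L V) with L = D − A the graph Laplacian, via standard multiplicative updates. The sketch below implements that baseline model (without the paper's Normalized-Cut-like fix); names and parameters are this sketch's own.

```python
import numpy as np

def gnmf(X, A, k, lam=0.5, n_iter=300, seed=0):
    """Graph-regularized NMF baseline: X (m x n) ~ U @ V.T, with the
    regularizer lam * Tr(V^T L V) pulling graph-neighboring points'
    representations (rows of V) together.  A is the n x n adjacency."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    D = np.diag(A.sum(axis=1))
    U = rng.random((m, k)) + 0.1
    V = rng.random((n, k)) + 0.1
    eps = 1e-12
    for _ in range(n_iter):
        U *= (X @ V) / (U @ V.T @ V + eps)
        V *= (X.T @ U + lam * A @ V) / (V @ U.T @ U + lam * D @ V + eps)
    return U, V
```

Taking `V.argmax(axis=1)` yields cluster labels; the paper's contribution is a constraint that rules out trivial and scale-transferred minimizers of this same objective.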
Laplacian regularized Gaussian mixture model for data clustering
 IEEE Transactions on Knowledge and Data Engineering
, 2010
Abstract

Cited by 8 (2 self)
Abstract—Gaussian Mixture Models (GMMs) are among the most statistically mature methods for clustering. Each cluster is represented by a Gaussian distribution, and clustering thereby reduces to estimating the parameters of the Gaussian mixture, usually by the Expectation-Maximization algorithm. In this paper, we consider the case where the probability distribution that generates the data is supported on a submanifold of the ambient space. It is natural to assume that if two points are close in the intrinsic geometry of the probability distribution, then their conditional probability distributions are similar. Specifically, we introduce a regularized probabilistic model based on manifold structure for data clustering, called the Laplacian regularized Gaussian Mixture Model (LapGMM). The data manifold is modeled by a nearest-neighbor graph, and the graph structure is incorporated into the maximum-likelihood objective function. As a result, the obtained conditional probability distribution varies smoothly along the geodesics of the data manifold. Experimental results on real data sets demonstrate the effectiveness of the proposed approach. Index Terms—Gaussian mixture model, clustering, graph Laplacian, manifold structure.
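The manifold modeling step can be sketched concretely: build a symmetric k-nearest-neighbor graph, form its Laplacian L = D − A, and penalize Tr(Rᵀ L R) so that soft assignments R vary smoothly over the graph. The full EM machinery of LapGMM is omitted; function names here are the sketch's own.

```python
import numpy as np

def knn_laplacian(X, k=2):
    """Symmetric k-NN graph Laplacian L = D - A for points in the rows
    of X; this is the graph structure added to the likelihood objective."""
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    A = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist[i])[1:k + 1]  # skip self (distance 0)
        A[i, nbrs] = 1.0
    A = np.maximum(A, A.T)  # symmetrize the neighbor relation
    return np.diag(A.sum(axis=1)) - A

def smoothness(L, R):
    """Tr(R^T L R) = 0.5 * sum_ij A_ij ||r_i - r_j||^2: small when
    graph-neighboring points receive similar soft assignments."""
    return np.trace(R.T @ L @ R)
```

In LapGMM this penalty is added (weighted) to the log-likelihood, biasing EM toward assignments that respect the manifold.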
Fast Nonnegative Matrix Factorization Algorithms Using Projected Gradient Approaches for Large-Scale Problems
, 2008
Abstract

Cited by 7 (0 self)
Recently, considerable growth of interest in projected gradient (PG) methods has been observed, due to their high efficiency in solving large-scale convex minimization problems subject to linear constraints. Since the minimization problems underlying nonnegative matrix factorization (NMF) of large matrices match this class of minimization problems well, we investigate and test some recent PG methods in the context of their applicability to NMF. In particular, the paper focuses on the following modified methods: projected Landweber, Barzilai-Borwein gradient projection, projected sequential subspace optimization (PSESOP), interior-point Newton (IPN), and sequential coordinate-wise. The proposed and implemented NMF PG algorithms are compared with respect to their performance in terms of signal-to-interference ratio (SIR) and elapsed time, using a simple benchmark of mixed, partially dependent nonnegative signals.
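The common skeleton behind all of these methods is alternating projected gradient descent: take a gradient step on one factor while the other is fixed, then project onto the nonnegative orthant. The sketch below uses the simplest safe step size (the inverse Lipschitz constant of each quadratic subproblem); the paper's methods replace this step-size rule with more sophisticated schemes such as Barzilai-Borwein.

```python
import numpy as np

def pg_nmf(X, k, n_iter=500, seed=0):
    """Alternating projected-gradient NMF for min ||X - W H||_F^2,
    W, H >= 0.  Projection onto the feasible set is just max(0, .)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    eps = 1e-12
    for _ in range(n_iter):
        # H-step: gradient W^T(WH - X), step 1 / ||W^T W||_2 (Lipschitz)
        G = W.T @ (W @ H - X)
        H = np.maximum(0.0, H - G / (np.linalg.norm(W.T @ W, 2) + eps))
        # W-step: gradient (WH - X)H^T, step 1 / ||H H^T||_2
        G = (W @ H - X) @ H.T
        W = np.maximum(0.0, W - G / (np.linalg.norm(H @ H.T, 2) + eps))
    return W, H
```

Unlike multiplicative updates, projected gradient steps can set entries exactly to zero and, with better step-size rules, converge considerably faster on large problems.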