Results 1–10 of 35
Fast Coordinate Descent Methods with Variable Selection for Nonnegative Matrix Factorization
2011
Cited by 22 (3 self)
Nonnegative Matrix Factorization (NMF) is an effective dimension reduction method for nonnegative dyadic data, and has proven to be useful in many areas, such as text mining, bioinformatics and image processing. NMF is usually formulated as a constrained nonconvex optimization problem, and many algorithms have been developed for solving it. Recently, a coordinate descent method, called FastHals [3], has been proposed to solve least squares NMF and is regarded as one of the state-of-the-art techniques for the problem. In this paper, we first show that FastHals has an inefficiency in that it uses a cyclic coordinate descent scheme and thus performs unneeded descent steps on unimportant variables. We then present a variable selection scheme that uses the gradient of the objective function to arrive at a new coordinate descent method. Our new method is considerably faster in practice and we show that it has theoretical convergence guarantees. Moreover, when the solution is sparse, as is often the case in real applications, our new method benefits by selecting important variables to update more often, thus resulting in higher speed. As an example, on the text dataset RCV1, our method is 7 times faster than FastHals, and more than 15 times faster when the sparsity is increased by adding an L1 penalty. We also develop new coordinate descent methods for the case when error in NMF is measured by KL-divergence, by applying the Newton method to solve the one-variable subproblems. Experiments indicate that our algorithm for minimizing the KL-divergence is faster than the Lee & Seung multiplicative rule by a factor of 10 on the CBCL image dataset.
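The cyclic baseline the paper improves on can be sketched in a few lines. Below is a minimal HALS-style coordinate descent for least-squares NMF, a hedged reconstruction of the FastHals scheme rather than the authors' variable-selection method; the function name and parameters are illustrative.

```python
import numpy as np

def hals_nmf(V, rank, n_iter=100, seed=0):
    """Minimal cyclic HALS (FastHals-style) NMF sketch: V ~= W @ H, all factors nonnegative."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(n_iter):
        # Cyclic pass over rows of H: each row update solves its one-block
        # subproblem exactly, then is clipped to the nonnegative orthant.
        WtV, WtW = W.T @ V, W.T @ W
        for k in range(rank):
            grad = WtV[k] - WtW[k] @ H          # negative gradient of the row subproblem
            H[k] = np.maximum(0.0, H[k] + grad / max(WtW[k, k], 1e-12))
        # Symmetric cyclic pass over columns of W.
        VHt, HHt = V @ H.T, H @ H.T
        for k in range(rank):
            grad = VHt[:, k] - W @ HHt[:, k]
            W[:, k] = np.maximum(0.0, W[:, k] + grad / max(HHt[k, k], 1e-12))
    return W, H
```

The paper's contribution replaces the fixed cyclic order with a gradient-based selection of which coordinates to update, which this sketch deliberately omits.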
Random projections for the nonnegative least-squares problem
Linear Algebra and its Applications
2009
Accelerated multiplicative updates and hierarchical ALS algorithms for nonnegative matrix factorization
2011
Efficient Document Clustering via Online Nonnegative Matrix Factorizations
Cited by 10 (1 self)
In recent years, Nonnegative Matrix Factorization (NMF) has received considerable interest from the data mining and information retrieval fields. NMF has been successfully applied in document clustering, image representation, and other domains. This study proposes an online NMF (ONMF) algorithm to efficiently handle very large-scale and/or streaming datasets. Unlike conventional NMF solutions, which require the entire data matrix to reside in memory, our ONMF algorithm proceeds with one data point or one chunk of data points at a time. Experiments with one-pass and multi-pass ONMF on real datasets are presented.
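A hedged sketch of the one-pass idea: instead of factorizing the whole matrix, visit each chunk once and fold it into two small sufficient-statistics matrices, so memory scales with the chunk size rather than the dataset size. The function and update rules below are an illustrative simplification, not the paper's exact ONMF algorithm.

```python
import numpy as np

def online_nmf(chunks, rank, inner_iters=50, seed=0):
    """One-pass online NMF sketch. Accumulates A = sum(H_t H_t^T) and
    B = sum(V_t H_t^T) over chunks, so the full data matrix is never held
    in memory at once."""
    rng = np.random.default_rng(seed)
    W = A = B = None
    for V_t in chunks:                       # V_t: (m, chunk_size), nonnegative
        m, b = V_t.shape
        if W is None:
            W = np.abs(rng.random((m, rank)))
            A = np.zeros((rank, rank))
            B = np.zeros((m, rank))
        # Solve the chunk's coefficients H_t with W held fixed
        # (multiplicative updates; small epsilon guards division by zero).
        H_t = np.abs(rng.random((rank, b)))
        for _ in range(inner_iters):
            H_t *= (W.T @ V_t) / (W.T @ W @ H_t + 1e-12)
        # Fold the chunk into the sufficient statistics, then refresh W.
        A += H_t @ H_t.T
        B += V_t @ H_t.T
        for _ in range(inner_iters):
            W *= B / (W @ A + 1e-12)
    return W
```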
Non-Negative Residual Matrix Factorization with Application to Graph Anomaly Detection
Cited by 8 (1 self)
Given an IP source-destination traffic network, how do we spot misbehaving IP sources (e.g., a port scanner)? How do we find strange users in a user-movie rating graph? Moreover, how can we present the results intuitively, so that they are easier for data analysts to interpret? We propose NrMF, a non-negative residual matrix factorization framework, to address such challenges. We present an optimization formulation as well as an effective algorithm to solve it. Our method can naturally capture abnormal behaviors on graphs. In addition, the proposed algorithm is linear with respect to the size of the graph and is therefore suitable for large graphs. The experimental results on several data sets validate its effectiveness as well as its efficiency.
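As a rough illustration of the residual idea (not the authors' NrMF algorithm, which additionally constrains the sign of the residual itself), anomaly scores can be read off the residual of an ordinary low-rank factorization of the adjacency matrix:

```python
import numpy as np

def residual_anomaly_scores(A, rank=2, n_iter=300, seed=0):
    """Illustrative sketch only: factor adjacency A ~= W @ H with
    multiplicative updates, then score each source node by the norm of its
    residual row -- the part of its behavior the low-rank model cannot explain."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = np.abs(rng.random((m, rank)))
    H = np.abs(rng.random((rank, n)))
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + 1e-12)
        W *= (A @ H.T) / (W @ H @ H.T + 1e-12)
    R = A - W @ H                      # residual matrix
    return np.linalg.norm(R, axis=1)   # one score per source node
```

Nodes whose rows are poorly explained by the dominant low-rank structure (e.g., a source connecting to everything) tend to receive large scores.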
Convex non-negative matrix factorization in the wild
in Proceedings of the 9th IEEE International Conference on Data Mining (ICDM-09)
Cited by 7 (4 self)
Nonnegative matrix factorization (NMF) has recently received a lot of attention in data mining, information retrieval, and computer vision. It factorizes a nonnegative input matrix V into two nonnegative matrix factors V = WH such that W describes "clusters" of the datasets. When analyzing genotypes, social networks, or images, it can be beneficial to ensure that W contains meaningful "cluster centroids", i.e., to restrict W to be convex combinations of data points. But how can we run this convex NMF in the wild, i.e., given millions of data points? Triggered by the simple observation that each data point is a convex combination of vertices of the data convex hull, we propose to restrict W further to vertices of the convex hull. The benefits of this convex-hull NMF approach are twofold. First, the expected size of the convex hull of, for example, n random Gaussian points in the plane is Ω(√(log n)), i.e., the candidate set typically grows much more slowly than the data set. Second, distance-preserving low-dimensional embeddings allow one to compute candidate vertices efficiently. Our extensive experimental evaluation shows that convex-hull NMF compares favorably to convex NMF for large data sets, both in terms of speed and reconstruction quality. Moreover, we show that our method can easily be applied to large-scale, real-world data sets, in our case consisting of 1.6 million images and 150 million votes on World of Warcraft® guilds, respectively. Keywords: data mining; matrix decomposition; data handling; non-negative matrix factorization; archetypal analysis; social network analysis
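The candidate-set construction rests on a classical computational-geometry primitive: after a distance-preserving 2D embedding, only convex-hull vertices need be considered as columns of W. A self-contained sketch of that primitive (Andrew's monotone chain; the function name is illustrative):

```python
def hull_vertices_2d(points):
    """Andrew's monotone chain convex hull; returns indices of hull vertices
    in counterclockwise order. In a convex-hull NMF pipeline, these indices
    would form the candidate set for W after a 2D embedding of the data."""
    pts = sorted(range(len(points)), key=lambda i: (points[i][0], points[i][1]))

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive means a left turn.
        return ((points[a][0] - points[o][0]) * (points[b][1] - points[o][1])
                - (points[a][1] - points[o][1]) * (points[b][0] - points[o][0]))

    hull = []
    for seq in (pts, reversed(pts)):       # lower chain, then upper chain
        start = len(hull)
        for i in seq:
            while len(hull) - start >= 2 and cross(hull[-2], hull[-1], i) <= 0:
                hull.pop()                 # drop points that create a right turn
            hull.append(i)
        hull.pop()                         # each chain's endpoint repeats the other's start
    return hull
```

For n random Gaussian points in the plane the returned index set is expected to be of size Ω(√(log n)), which is what makes the candidate set so much smaller than the data.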
Fast active-set-type algorithms for L1-regularized linear regression
In Proc. AISTATS
2010
Cited by 6 (0 self)
In this paper, we investigate new active-set-type methods for l1-regularized linear regression that overcome some difficulties of existing active set methods. By showing a relationship between l1-regularized linear regression and the linear complementarity problem with bounds, we present a fast active-set-type method, called block principal pivoting. This method accelerates computation by allowing exchanges of several variables among working sets. We further provide an improvement of this method, discuss its properties, and also explain a connection to the structure learning of Gaussian graphical models. Experimental comparisons on synthetic and real data sets show that the proposed method is significantly faster than existing active set methods and competitive against recently developed iterative methods.
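The bridge from l1 regression to bound-constrained problems can be seen with the standard variable-splitting trick x = u - v with u, v >= 0, which turns the l1 penalty into a linear term over a bound-constrained quadratic program. The sketch below solves that reformulation by plain projected gradient; only the reduction reflects the paper's setting, while the solver and names are illustrative, not the block pivoting method itself.

```python
import numpy as np

def lasso_via_splitting(A, b, lam, n_iter=5000):
    """Solve min 0.5*||Ax - b||^2 + lam*||x||_1 via the split x = u - v,
    u, v >= 0, using projected gradient descent on the (u, v) QP."""
    n = A.shape[1]
    u, v = np.zeros(n), np.zeros(n)
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the smooth part
    step = 0.5 / L                     # halved: the split Hessian doubles the curvature
    for _ in range(n_iter):
        g = A.T @ (A @ (u - v) - b)    # gradient w.r.t. x
        u = np.maximum(0.0, u - step * (g + lam))
        v = np.maximum(0.0, v - step * (-g + lam))
    return u - v
```

For an orthonormal design this recovers the familiar soft-thresholding solution, which makes it a convenient correctness check.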
Fast Nonnegative Tensor Factorization with an Active-Set-Like Method
Cited by 6 (0 self)
We introduce an efficient algorithm for computing a low-rank nonnegative CANDECOMP/PARAFAC (NNCP) decomposition. In text mining, signal processing, and computer vision, among other areas, imposing nonnegativity constraints on the low-rank factors of matrices and tensors has been shown to be an effective technique that provides physically meaningful interpretations. A principled methodology for computing NNCP is alternating nonnegative least squares, in which nonnegativity-constrained least squares (NNLS) problems are solved in each iteration. In this chapter, we propose to solve the NNLS problems using the block principal pivoting method, which overcomes some difficulties of the classical active-set method for NNLS problems with a large number of variables. We introduce techniques to accelerate the block principal pivoting method for multiple right-hand sides, which is typical in NNCP computation. Computational experiments show the state-of-the-art performance of the proposed method.
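For a single right-hand side, the block principal pivoting idea can be sketched as follows. This is a hedged reconstruction from the standard description of the method (not the chapter's batched, multiple-right-hand-side variant); the bookkeeping for the backup rule follows the usual presentation, and all names are illustrative.

```python
import numpy as np

def nnls_bpp(A, b, max_iter=200):
    """Block principal pivoting sketch for min ||Ax - b||^2, x >= 0.
    Indices are partitioned into F (x free, dual y = 0) and its complement
    (x = 0, y free); all KKT-violating indices are exchanged at once, with a
    single-index fallback when the violation count stops decreasing."""
    n = A.shape[1]
    AtA, Atb = A.T @ A, A.T @ b
    F = np.zeros(n, dtype=bool)
    x, y = np.zeros(n), -Atb            # KKT pair at x = 0
    p, ninf = 3, n + 1                  # backup-rule bookkeeping
    for _ in range(max_iter):
        viol = (F & (x < -1e-12)) | (~F & (y < -1e-12))
        k = int(viol.sum())
        if k == 0:
            return np.maximum(x, 0.0)   # all KKT conditions satisfied
        if k < ninf:
            ninf, p = k, 3              # progress: allow full exchanges again
        elif p > 0:
            p -= 1                      # tolerate a few non-improving exchanges
        else:
            keep = np.flatnonzero(viol)[-1]
            viol = np.zeros(n, dtype=bool)
            viol[keep] = True           # backup rule: flip only the largest index
        F = F ^ viol                    # exchange the violating variables
        x = np.zeros(n)
        if F.any():
            x[F] = np.linalg.solve(AtA[np.ix_(F, F)], Atb[F])
        y = AtA @ x - Atb               # y is ~0 on F by construction
    return np.maximum(x, 0.0)
```

Unlike a classical active-set method, which moves one variable at a time, the full exchange typically reaches the optimal partition in very few iterations.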
Group Sparsity in Nonnegative Matrix Factorization
2012
Cited by 5 (0 self)
A recent challenge in data analysis for science and engineering is that data are often represented in a structured way. In particular, many data mining tasks have to deal with group-structured prior information, where features or data items are organized into groups. In this paper, we develop group sparsity regularization methods for nonnegative matrix factorization (NMF). NMF is an effective data mining tool that has been widely adopted in text mining, bioinformatics, and clustering, but a principled approach to incorporating group information into NMF has been lacking in the literature. Motivated by the observation that features or data items within a group are expected to share the same sparsity pattern in their latent factor representation, we propose mixed-norm regularization to promote group sparsity in the factor matrices of NMF. Group sparsity improves the interpretation of latent factors. Efficient convex optimization methods for dealing with the mixed-norm term are presented, along with computational comparisons between them. Application examples of the proposed method in factor recovery, semi-supervised clustering, and multilingual text analysis are demonstrated.
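A building block commonly used for mixed-norm (group lasso-type) regularization is the group soft-thresholding proximal operator, which shrinks each group's coefficients jointly and therefore zeroes out whole groups at once. A minimal sketch, offered as one plausible ingredient rather than the paper's full algorithm:

```python
import numpy as np

def group_soft_threshold(x, groups, lam):
    """Proximal operator of the l_{1,2} mixed norm (group lasso penalty):
    each group's coefficient block is scaled toward zero as a unit, so the
    entries of a group share one sparsity pattern."""
    x = np.asarray(x, dtype=float).copy()
    for g in groups:                   # g: list of indices forming one group
        norm = np.linalg.norm(x[g])
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        x[g] *= scale                  # shrink the whole block jointly
    return x
```

Groups whose norm falls below the threshold lam are set to zero entirely, which is exactly the shared-sparsity behavior the abstract motivates.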