
## Algorithms for Non-negative Matrix Factorization (2001)


Venue: NIPS

Citations: 1214 (5 self)

### Citations

11690 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context: ...deed the case as shown in the next section. 6 Proofs of convergence. To prove Theorems 1 and 2, we will make use of an auxiliary function similar to that used in the Expectation-Maximization algorithm [15, 16]. Definition 1: $G(h, h')$ is an auxiliary function for $F(h)$ if the conditions $G(h, h') \ge F(h)$, $G(h, h) = F(h)$ (10) are satisfied. The auxiliary function is a useful concept because of the followi...
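The auxiliary-function idea in this snippet can be illustrated with a toy example; the functions below are hypothetical stand-ins (not the paper's actual F or G), chosen only to satisfy the two conditions of Definition 1:

```python
# Toy illustration of Definition 1 (hypothetical F and G, not the paper's):
# G(h, h0) = h^2 + (h - h0)^2 dominates F(h) = h^2 and touches it at
# h = h0, so repeatedly minimizing G over h never increases F.

def F(h):
    return h ** 2

def G(h, h0):
    return h ** 2 + (h - h0) ** 2

h = 4.0
for _ in range(10):
    h_next = h / 2.0                  # argmin over h of G(h, h_prev)
    assert G(h_next, h) >= F(h_next)  # condition: G(h, h') >= F(h)
    assert G(h, h) == F(h)            # condition: G(h, h) = F(h)
    assert F(h_next) <= F(h)          # consequence: F is non-increasing
    h = h_next
```

This is exactly the mechanism the proofs use: each update minimizes the auxiliary function, which by the two conditions forces the true objective downhill.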

3770 | Eigenfaces for Recognition
- Turk, Pentland
- 1991
Citation Context: ...epresentational properties. Principal components analysis enforces only a weak orthogonality constraint, resulting in a very distributed representation that uses cancellations to generate variability [1, 2]. On the other hand, vector quantization uses a hard winner-take-all constraint that results in clustering the data into mutually exclusive prototypes [3]. We have previously shown that nonnegativity ...

3316 | Principal Component Analysis
- Jolliffe
- 2002
Citation Context: ...epresentational properties. Principal components analysis enforces only a weak orthogonality constraint, resulting in a very distributed representation that uses cancellations to generate variability [1, 2]. On the other hand, vector quantization uses a hard winner-take-all constraint that results in clustering the data into mutually exclusive prototypes [3]. We have previously shown that nonnegativity ...

2944 | Numerical recipes in C: the art of scientific computing
- Press, Teukolsky, et al.
- 1992
Citation Context: ...rse, other types of matrix factorizations have been extensively studied in numerical linear algebra, but the nonnegativity constraint makes much of this previous work inapplicable to the present case [8]. Here we discuss two algorithms for NMF based on iterative updates of W and H. Because these algorithms are easy to implement and their convergence properties are guaranteed, we have found them very...
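The iterative updates referred to in this snippet are Lee and Seung's multiplicative rules. As a rough NumPy sketch for the Euclidean cost $||V - WH||^2$ (the `eps` guard against division by zero is my addition, not part of the paper's formulation):

```python
import numpy as np

def nmf_euclidean(V, r, n_iter=200, eps=1e-9, seed=0):
    """Multiplicative updates for V ~ W H under the cost ||V - WH||^2."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + eps   # nonnegative random initialization
    H = rng.random((r, m)) + eps
    for _ in range(n_iter):
        # Each factor is multiplied elementwise by a nonnegative ratio,
        # so W and H can never become negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Usage: factor a small nonnegative matrix; the cost is non-increasing
# under these updates, which is what Theorem 1 of the paper guarantees.
V = np.random.default_rng(1).random((6, 5))
W, H = nmf_euclidean(V, r=3)
```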

2099 | Vector Quantization and Signal Compression
- Gersho, Gray
- 1992
Citation Context: ...uses cancellations to generate variability [1, 2]. On the other hand, vector quantization uses a hard winner-take-all constraint that results in clustering the data into mutually exclusive prototypes [3]. We have previously shown that nonnegativity is a useful constraint for matrix factorization that can learn a parts representation of the data [4, 5]. The nonnegative basis vectors that are learned a...

1639 | Learning the parts of objects by non-negative matrix factorization
- Lee, Seung
- 1999
Citation Context: ...ustering the data into mutually exclusive prototypes [3]. We have previously shown that nonnegativity is a useful constraint for matrix factorization that can learn a parts representation of the data [4, 5]. The nonnegative basis vectors that are learned are used in distributed, yet still sparse combinations to generate expressiveness in the reconstructions [6, 7]. In this submission, we analyze in deta...

480 | What is the goal of sensory coding?
- Field
- 1994
Citation Context: ...earn a parts representation of the data [4, 5]. The nonnegative basis vectors that are learned are used in distributed, yet still sparse combinations to generate expressiveness in the reconstructions [6, 7]. In this submission, we analyze in detail two numerical algorithms for learning the optimal nonnegative factors from data. 2 Non-negative matrix factorization. We formally consider algorithms for solv...

438 | Maximum likelihood reconstruction for emission tomography
- Shepp, Vardi
- 1982
Citation Context: ...eralize to different cost functions. Algorithms similar to ours where only one of the factors is adapted have previously been used for the deconvolution of emission tomography and astronomical images [9, 10, 11, 12]. At each iteration of our algorithms, the new value of W or H is found by multiplying the current value by some factor that depends on the quality of the approximation in Eq. (1). We prove that the q...

373 | Bayesian-based iterative method of image restoration
- Richardson
- 1972
Citation Context: ...eralize to different cost functions. Algorithms similar to ours where only one of the factors is adapted have previously been used for the deconvolution of emission tomography and astronomical images [9, 10, 11, 12]. At each iteration of our algorithms, the new value of W or H is found by multiplying the current value by some factor that depends on the quality of the approximation in Eq. (1). We prove that the q...

340 | An iterative technique for the rectification of observed distributions
- Lucy
- 1974
Citation Context: ...eralize to different cost functions. Algorithms similar to ours where only one of the factors is adapted have previously been used for the deconvolution of emission tomography and astronomical images [9, 10, 11, 12]. At each iteration of our algorithms, the new value of W or H is found by multiplying the current value by some factor that depends on the quality of the approximation in Eq. (1). We prove that the q...

218 | Least squares formulation of robust non-negative factor analysis
- Paatero
- 1997
Citation Context: ...on. Such a cost function can be constructed using some measure of distance between two non-negative matrices A and B. One useful measure is simply the square of the Euclidean distance between A and B [13], $||A - B||^2 = \sum_{ij} (A_{ij} - B_{ij})^2$ (2). This is lower bounded by zero, and clearly vanishes if and only if A = B. Another useful measure is $D(A||B) = \sum_{ij} \left( A_{ij} \log \frac{A_{ij}}{B_{ij}} - A_{ij} + B_{ij} \right)$ (3). Like the ...
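The two measures in this snippet, Eqs. (2) and (3), translate directly into code; this is a minimal sketch with function names of my own choosing:

```python
import numpy as np

def euclidean_cost(A, B):
    # Eq. (2): ||A - B||^2 = sum_ij (A_ij - B_ij)^2
    return np.sum((A - B) ** 2)

def divergence(A, B):
    # Eq. (3): D(A||B) = sum_ij (A_ij log(A_ij / B_ij) - A_ij + B_ij)
    return np.sum(A * np.log(A / B) - A + B)

# Both are lower bounded by zero and vanish if and only if A = B.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
print(euclidean_cost(A, A))  # 0.0
```

Note that Eq. (3) reduces to the Kullback-Leibler divergence when the entries of A and B each sum to one, which is why the paper calls it a divergence rather than a distance.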

146 | Additive versus exponentiated gradient updates for learning linear functions
- Kivinen, Warmuth
- 1994
Citation Context: ...truction is necessarily a fixed point of the update rules. 5 Multiplicative versus additive update rules. It is useful to contrast these multiplicative updates with those arising from gradient descent [14]. In particular, a simple additive update for H that reduces the squared distance can be written as $H_{a\mu} \leftarrow H_{a\mu} + \eta_{a\mu} \left[ (W^T V)_{a\mu} - (W^T W H)_{a\mu} \right]$ (6). If $\eta_{a\mu}$ are all set equal to some small positive ...
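The point this snippet builds toward, that the additive update of Eq. (6) with a particular data-dependent step size collapses into the multiplicative rule, can be checked numerically; the matrices here are arbitrary test data:

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.random((4, 3))
W = rng.random((4, 2))
H = rng.random((2, 3))

# Additive update of Eq. (6) with the step size eta = H / (W^T W H):
eta = H / (W.T @ W @ H)
H_additive = H + eta * ((W.T @ V) - (W.T @ W @ H))

# Multiplicative update: H <- H * (W^T V) / (W^T W H).
H_multiplicative = H * (W.T @ V) / (W.T @ W @ H)

# Algebraically identical:
#   H + (H / (W^T W H)) * (W^T V - W^T W H) = H * (W^T V) / (W^T W H)
assert np.allclose(H_additive, H_multiplicative)
```

The multiplicative form has the practical advantage that it preserves nonnegativity automatically: a ratio of nonnegative quantities scales H without ever pushing an entry below zero.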

138 | A unified approach to statistical tomography using coordinate descent optimization
- Bouman, Sauer
- 1996

93 | Aggregate and mixed-order Markov models for statistical language processing
- Saul, Pereira
- 1997
Citation Context: ...deed the case as shown in the next section. 6 Proofs of convergence. To prove Theorems 1 and 2, we will make use of an auxiliary function similar to that used in the Expectation-Maximization algorithm [15, 16]. Definition 1: $G(h, h')$ is an auxiliary function for $F(h)$ if the conditions $G(h, h') \ge F(h)$, $G(h, h) = F(h)$ (10) are satisfied. The auxiliary function is a useful concept because of the followi...

39 | Unsupervised learning by convex and conic coding
- Lee, Seung
- 1997
Citation Context: ...ustering the data into mutually exclusive prototypes [3]. We have previously shown that nonnegativity is a useful constraint for matrix factorization that can learn a parts representation of the data [4, 5]. The nonnegative basis vectors that are learned are used in distributed, yet still sparse combinations to generate expressiveness in the reconstructions [6, 7]. In this submission, we analyze in deta...

26 | Sparse coding in the primate cortex. The Handbook of Brain Theory and Neural Networks
- Foldiak, Young
- 1995
Citation Context: ...earn a parts representation of the data [4, 5]. The nonnegative basis vectors that are learned are used in distributed, yet still sparse combinations to generate expressiveness in the reconstructions [6, 7]. In this submission, we analyze in detail two numerical algorithms for learning the optimal nonnegative factors from data. 2 Non-negative matrix factorization. We formally consider algorithms for solv...