## Non-negative matrix factorization with sparseness constraints (2004)

by Patrik O. Hoyer

Venue: Journal of Machine Learning Research

Citations: 498 (0 self)

### Citations

1686 | Learning the parts of objects by non-negative matrix factorization. Nature - Lee, Seung - 1999

1305 | Emergence of simple-cell receptive field properties by learning a sparse code for natural images - Olshausen, Field - 1996
Citation Context: ...ed by modern image processing techniques. Here, we tested the result of using additional sparseness constraints. Figure 5 shows the basis vectors obtained by putting a sparseness constraint on the coefficients (Sh = 0.85) but leaving the sparseness of the basis vectors unconstrained. In this case, NMF learns oriented features that represent edges and lines. Such oriented features are widely regarded as the best type of low-level features for representing natural images, and similar features are also used by the early visual system of the biological brain (Field, 1987; Simoncelli et al., 1992; Olshausen and Field, 1996; Bell and Sejnowski, 1997). This example illustrates that sparseness-constrained NMF does not simply ‘sparsify’ the result of standard, unconstrained NMF, but rather can find qualitatively different parts-based representations that are more compatible with the sparseness assumptions. 4.3 Convergence of Algorithm Implementing the Projection Step To verify the performance of our projection method we performed extensive tests, varying the number of dimensions, the desired degree of sparseness, and the sparseness of the original vector. The desired and the initial degrees of sparseness were set t...
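The sparseness value Sh = 0.85 quoted in this excerpt refers to the paper's normalized measure based on the ratio of the L1 and L2 norms, which is 1 for a vector with a single non-zero entry and 0 for a vector whose entries are all equal. The paper's accompanying package is MATLAB; the sketch below is an illustrative NumPy translation, assuming the standard definition sparseness(x) = (√n − ‖x‖₁/‖x‖₂)/(√n − 1):

```python
import numpy as np

def sparseness(x):
    """Hoyer-style sparseness: 1.0 for a one-hot vector,
    0.0 for a vector with all entries equal (in magnitude)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    l1 = np.abs(x).sum()
    l2 = np.sqrt((x ** 2).sum())
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1)

print(sparseness([1, 0, 0, 0]))  # maximally sparse -> 1.0
print(sparseness([1, 1, 1, 1]))  # maximally dense  -> 0.0
```

Constraining a basis or coefficient vector to a given Sh then amounts to projecting it onto the set of non-negative vectors with the corresponding L1 norm for its L2 norm, which is what the paper's projection step does.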

1243 | Algorithms for non-negative matrix factorization - Seung, Lee
Citation Context: ...es, it can be difficult to interpret the results of PCA and ICA (Paatero and Tapper, 1994; Parra et al., 2000). Second, non-negativity has been argued for based on the intuition that parts are generally combined additively (and not subtracted) to form a whole; hence, these constraints might be useful for learning parts-based representations (Lee and Seung, 1999). Given a data matrix V, the optimal choice of matrices W and H is defined to be those non-negative matrices that minimize the reconstruction error between V and WH. Various error functions have been proposed (Paatero and Tapper, 1994; Lee and Seung, 2001), perhaps the most widely ... Figure 1: NMF applied to various image data sets. (a) Basis images given by NMF applied to face image data from the CBCL database (http://cbcl.mit.edu/cbcl/software-datasets/FaceData2.html), following Lee and Seung (1999). In this case NMF produces a parts-based representation of the data. (b) Basis images derived from the ORL face image database (http://www.uk.research.att.com/facedatabase.html), following Li et al. (2001). Here, the NMF representation is global rather than parts-based. (c) Basis vectors from NMF applied to ...

829 | Relations Between the Statistics of Natural Images and the Response Properties of Cortical Cells - Field - 1987

617 | The “independent components” of natural scenes are edge filters. Vision Research - Bell, Sejnowski - 1997

560 | Shiftable multi-scale transforms - Simoncelli, Freeman, et al. - 1992

527 | Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. - Paatero, Tapper - 1994
Citation Context: ...omplete MATLAB code both for standard NMF and for our extension. Our hope is that this will further the application of these methods to solving novel data-analysis problems. Keywords: non-negative matrix factorization, sparseness, data-adaptive representations 1. Introduction A fundamental problem in many data-analysis tasks is to find a suitable representation of the data. A useful representation typically makes latent structure in the data explicit, and often reduces the dimensionality of the data so that further computational methods can be applied. Non-negative matrix factorization (NMF) (Paatero and Tapper, 1994; Lee and Seung, 1999) is a recent method for finding such a representation. Given a non-negative data matrix V, NMF finds an approximate factorization V ≈ WH into non-negative factors W and H. The non-negativity constraints make the representation purely additive (allowing no subtractions), in contrast to many other linear representations such as principal component analysis (PCA) and independent component analysis (ICA) (Hyvarinen et al., 2001). One of the most useful properties of NMF is that it usually produces a sparse representation of the data. Such a representation encodes much of the...
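The factorization V ≈ WH described here can be computed with the multiplicative update rules of Lee and Seung (2001) cited elsewhere on this page. The paper distributes MATLAB code; the NumPy sketch below is an independent illustration of those standard updates, not the paper's package (matrix sizes, iteration count, and the small eps guard are arbitrary choices):

```python
import numpy as np

def nmf(V, r, n_iter=500, eps=1e-9, seed=0):
    """Basic NMF, V ~ W @ H, via Lee-Seung multiplicative updates for
    the squared-error objective. W and H stay non-negative because each
    update multiplies the current value by a non-negative ratio."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update coefficients
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update basis vectors
    return W, H

# Exactly rank-2 non-negative data should be reconstructed closely:
rng = np.random.default_rng(42)
V = rng.random((6, 2)) @ rng.random((2, 8))
W, H = nmf(V, r=2)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # small relative error
```

These two update lines are the "most basic version of NMF" the paper refers to; the sparseness-constrained extension replaces them with update steps followed by a projection onto the desired sparseness level.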

490 | What is the goal of sensory coding - Field - 1994
Citation Context: ...e data matrix V, NMF finds an approximate factorization V ≈ WH into non-negative factors W and H. The non-negativity constraints make the representation purely additive (allowing no subtractions), in contrast to many other linear representations such as principal component analysis (PCA) and independent component analysis (ICA) (Hyvarinen et al., 2001). One of the most useful properties of NMF is that it usually produces a sparse representation of the data. Such a representation encodes much of the data using few ‘active’ components, which makes the encoding easy to interpret. Sparse coding (Field, 1994) has also, on theoretical grounds, been shown to be a useful middle ground between completely distributed representations, on the one hand, and unary representations (grandmother cells) on the other (Foldiak and Young, 1995; Thorpe, 1995). However, because the sparseness given by NMF is somewhat of a side-effect rather than a goal, one cannot in any way control the degree to which the representation is sparse. In many applications, more direct control over the properties of the representation is needed. In this paper, we extend NMF to include the option to control sparseness explicitly. We s...

200 | When does non-negative matrix factorization give a correct decomposition into parts - Donoho, Stodden - 2003
Citation Context: ...ote zero.) Note that NMF represents this natural image data using circularly symmetric features. ...used is the squared error (Euclidean distance) function E(W, H) = ‖V − WH‖² = Σᵢⱼ (Vᵢⱼ − (WH)ᵢⱼ)². Although the minimization problem is convex in W and H separately, it is not convex in both simultaneously. Paatero and Tapper (1994) gave a gradient algorithm for this optimization, whereas Lee and Seung (2001) devised a multiplicative algorithm that is somewhat simpler to implement and also showed good performance. Although some theoretical work on the properties of the NMF representation exists (Donoho and Stodden, 2004), much of the appeal of NMF comes from its empirical success in learning meaningful features from a diverse collection of real-life data sets. Lee and Seung (1999) showed that, when the data set consisted of a collection of face images, the representation consisted of basis vectors encoding for the mouth, nose, eyes, etc.; the intuitive features of face images. In Figure 1a we have reproduced that basic result using the same data set. Additionally, they showed that meaningful topics can be learned when text documents are used as data. Subsequently, NMF has been successfully applied t...
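The excerpt's observation that the squared error is convex in W and H separately, but not jointly, can be spot-checked numerically. A small sketch (matrix sizes are arbitrary) verifying midpoint convexity of E(W, H) = ‖V − WH‖² along a random segment in H for fixed W:

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.random((5, 7))   # data matrix (sizes are arbitrary)
W = rng.random((5, 3))   # fixed basis matrix

def E(W, H, V=V):
    """Squared (Frobenius) reconstruction error."""
    return np.linalg.norm(V - W @ H) ** 2

# For fixed W, E is a convex quadratic in H, so the error at the midpoint
# of any segment H0 -> H1 is at most the average of the endpoint errors.
H0, H1 = rng.random((3, 7)), rng.random((3, 7))
lhs = E(W, 0.5 * (H0 + H1))
rhs = 0.5 * (E(W, H0) + E(W, H1))
print(lhs <= rhs + 1e-12)  # True
```

A check on one segment is only a sanity check, not a proof, but it illustrates why alternating minimization over W and H is tractable even though the joint problem is non-convex.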

200 | Learning spatially localized parts-based representations. - Li, Hou, et al. - 2001
Citation Context: ...owever, because the sparseness given by NMF is somewhat of a side-effect rather than a goal, one cannot in any way control the degree to which the representation is sparse. In many applications, more direct control over the properties of the representation is needed. In this paper, we extend NMF to include the option to control sparseness explicitly. We show that this allows us to discover parts-based representations that are qualitatively better than those given by basic NMF. We also discuss the relationship between our method and other recent extensions of NMF (Li et al., 2001; Hoyer, 2002; Liu et al., 2003). Additionally, this contribution includes a complete MATLAB package for performing NMF and its various extensions. Although the most basic version of NMF requires only two lines of code and certainly does not warrant distributing a separate software package, its several extensions involve more complicated operations; the absence of ready-made code has probably hindered their widespread use so far. We hope that our software package will alleviate the problem. This paper is structured as follows. In Section 2 we describe non-negative matrix factorization, and dis...

177 | Metagenes and molecular pattern discovery using matrix factorization - Brunet, Tamayo, et al. - 2004
Citation Context: ...success in learning meaningful features from a diverse collection of real-life data sets. Lee and Seung (1999) showed that, when the data set consisted of a collection of face images, the representation consisted of basis vectors encoding for the mouth, nose, eyes, etc.; the intuitive features of face images. In Figure 1a we have reproduced that basic result using the same data set. Additionally, they showed that meaningful topics can be learned when text documents are used as data. Subsequently, NMF has been successfully applied to a variety of data sets (Buchsbaum and Bloch, 2002; Brunet et al., 2004; Jung and Kim, 2004; Kim and Tidor, 2003). Despite this success, there also exist data sets for which NMF does not give an intuitive decomposition into parts that would correspond to our idea of the ‘building blocks’ of the data. Li et al. (2001) showed that when NMF was applied to a different facial image database, the representation was global rather than local, qualitatively different from that reported by Lee and Seung (1999). Again, we have rerun that experiment and confirm those results, see Figure 1b. The difference was mainly attributed to how well the images were hand-aligned (Li et ...

166 | Non-negative sparse coding. - Hoyer - 2002

96 | Algorithms for non-negative independent component analysis. - Plumbley - 2003
Citation Context: ...zation similar to ours, but with two important differences. First, the signs of the components are in general not restricted; in fact, symmetry is often assumed, implying an approximately equal number of positive and negative elements. Second, the sources are not forced to any desired degree of sparseness (as in our method) but rather sparseness is incorporated into the objective function to be optimized. The sparseness goal can be put on either W or H, or both (Stone et al., 2002). Recently, some authors have considered estimating the ICA model in the case of one-sided, non-negative sources (Plumbley, 2003; Oja and Plumbley, 2004). In these methods, non-negativity is not specified as a constraint but rather as an objective; hence, complete non-negativity of the representation is seldom achieved for real-life data sets. Nevertheless, one can show that if the linear ICA model holds, with non-negative components, these methods can identify the model. 6. Conclusions Non-negative matrix factorization (NMF) has proven itself a useful tool in the analysis of a diverse range of data. One of its most useful properties is that the resulting decompositions are often intuitive and easy to interpret because...

61 | Sparse coding in the primate cortex. - Foldiak, Young - 1995

47 | Unmixing hyperspectral data. - Parra, Spence, et al. - 2000

45 | Subsystem identification through dimensionality reduction of large-scale gene expression data. - Kim, Tidor - 2003

43 | Spatiotemporal independent component analysis of event-related fMRI data using skewed probability density functions. - Stone, Porrill, et al. - 2002

27 | Color categories revealed by non-negative matrix factorization of Munsell color spectra. - Buchsbaum, Bloch - 2002

25 | Non-negative matrix factorization for visual coding. - Liu, Zheng, et al. - 2003

25 | Localized versus distributed representations. - Thorpe - 1995

24 | Modeling receptive fields with non-negative sparse coding. - Hoyer - 2003
Citation Context: ... Figure 1: NMF applied to various image data sets. (a) Basis images given by NMF applied to face image data from the CBCL database (http://cbcl.mit.edu/cbcl/software-datasets/FaceData2.html), following Lee and Seung (1999). In this case NMF produces a parts-based representation of the data. (b) Basis images derived from the ORL face image database (http://www.uk.research.att.com/facedatabase.html), following Li et al. (2001). Here, the NMF representation is global rather than parts-based. (c) Basis vectors from NMF applied to ON/OFF-contrast filtered natural image data (Hoyer, 2003). Top: Weights for the ON-channel. Each patch represents the part of one basis vector wi corresponding to the ON-channel. (White pixels denote zero weight, darker pixels are positive weights.) Middle: Corresponding weights for the OFF-channel. Bottom: Weights for ON minus weights for OFF. (Here, gray pixels denote zero.) Note that NMF represents this natural image data using circularly symmetric features. ...used is the squared error (Euclidean distance) function E(W, H) = ‖V − WH‖² = Σᵢⱼ (Vᵢⱼ − (WH)ᵢⱼ)². Although the minimization problem is convex in W and H separately, it is not convex in bot...

13 | Blind separation of positive sources by globally convergent gradient search. - Oja, Plumbley - 2004

11 | Independent Component Analysis. Wiley Interscience - Hyvarinen, Karhunen, et al. - 2001

2 | Automatic text extraction for content-based image indexing. - Jung, Kim - 2004