
## Adaptive dimension reduction using discriminant analysis and k-means clustering (2007)

Venue: ICML

Citations: 55 (7 self)

### Citations

3374 | The elements of statistical learning
- Hastie, Tibshirani, et al.
- 2001

Citation Context: ... subspace obtained in linear discriminant analysis (LDA) is perhaps the best subspace in which to do data clustering, because in the LDA subspace, clusters are well separated. LDA is a very well developed theory (Hastie et al., 2001) and is attracting renewed interest (De la Torre & Kanade, 2006; Ye & Xiong, 2006; Park & Howland, 2004) with the growth of matrix-based approaches in machine learning. In (De la Torre & Kanade, 2006), ...
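The claim that clusters are well separated in the LDA subspace can be illustrated with a minimal sketch using scikit-learn's `LinearDiscriminantAnalysis`; the toy data and dimensions here are arbitrary choices for illustration, not from the paper:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Toy data: K = 3 Gaussian clusters in 10 dimensions (illustrative only).
X, y = make_blobs(n_samples=300, centers=3, n_features=10, random_state=0)

# Given class labels, LDA yields at most K - 1 = 2 discriminative directions.
lda = LinearDiscriminantAnalysis(n_components=2)
Z = lda.fit_transform(X, y)

print(Z.shape)  # (300, 2)
```

Note that LDA is supervised: in a clustering setting the labels are unknown, which is exactly the gap that alternating schemes such as LDA-Km try to close.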

3316 | Principal Component Analysis
- Jolliffe
- 2002

Citation Context: ...ions (Beyer et al., 1999). There are many approaches to address this problem. The simplest approach is dimension reduction techniques, including principal component analysis (PCA) (Duda et al., 2000; Jolliffe, 2002) and random projections (Dasgupta, 2000). In these methods, dimension reduction is carried out as a preprocessing step and is decoupled from the clustering process: once the subspace dimensions are s...
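The decoupled pipeline described in this context can be sketched in two fixed steps (using scikit-learn; the sizes and cluster counts below are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=200, centers=4, n_features=50, random_state=0)

# Step 1: fix the subspace once, as a preprocessing step.
Z = PCA(n_components=5, random_state=0).fit_transform(X)

# Step 2: cluster in that fixed subspace; the projection is never revisited.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Z)

print(Z.shape, np.unique(labels).size)
```

The point of the adaptive approaches surveyed in this paper is precisely to replace this one-shot projection with one that is updated as the clustering evolves.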

1213 | Algorithms for non-negative matrix factorization
- Lee, Seung

Citation Context: ...rimental results. On the text datasets, we compare our subspace clustering algorithm with the following algorithms: (i) standard K-means algorithm; (ii) Non-negative Matrix Factorization (NMF) method (Lee & Seung, 2001); (iii) Tri-Factorization Method (Ding et al., 2006); and (iv) Adaptive Subspace Clustering (ASI). The results are shown in Figure 1. Note that the Tri-Factorization method is based on the decomposition ...

581 | Biclustering of expression data
- Cheng, Church
- 2000

Citation Context: ...d or well defined in some way. This is different from co-clustering (simultaneously clustering the features and data points) (Dhillon, 2001; Zha et al., 2001; Banerjee et al., 2004) and biclustering (Cheng & Church, 2000), which essentially finds blocks in a rectangular data matrix. If we restrict the subspace to linear combinations of the original features, the subspace obtained in linear discriminant analysis (LDA) is...

458 | Co-clustering documents and words using bipartite spectral graph partitioning
- Dhillon
- 2001

Citation Context: ...eduction attempts to find the subspace where clusters are most well-separated or well defined in some way. This is different from co-clustering (simultaneously clustering the features and data points) (Dhillon, 2001; Zha et al., 2001; Banerjee et al., 2004) and biclustering (Cheng & Church, 2000), which essentially finds blocks in a rectangular data matrix. If we restrict the subspace to linear combinations of ...

397 | When is "nearest neighbor" meaningful
- Beyer, Goldstein, et al.
- 1999

Citation Context: ...se of dimensionality. It has been shown that in a high-dimensional space, the distance between every pair of points is almost the same for a wide variety of data distributions and distance functions (Beyer et al., 1999). There are many approaches to address this problem. The simplest approach is dimension reduction techniques, including principal component analysis (PCA) (Duda et al., 2000; Jolliffe, 2002) and rand...
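The distance-concentration effect quoted here is easy to observe numerically; a small NumPy sketch (uniform data, sizes chosen arbitrarily) compares the relative spread of pairwise distances in low and high dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_spread(dim, n=200):
    """(max - min) / min over all pairwise Euclidean distances
    for n points drawn uniformly from the unit hypercube."""
    X = rng.random((n, dim))
    sq = (X ** 2).sum(axis=1)
    # Squared distances via the expansion ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    d = np.sqrt(d2[np.triu_indices(n, k=1)])
    return (d.max() - d.min()) / d.min()

low_dim, high_dim = relative_spread(2), relative_spread(1000)
print(low_dim, high_dim)  # the spread collapses as the dimension grows
```

In 2 dimensions the nearest pair is far closer than the farthest pair, so the spread is large; in 1000 dimensions all pairwise distances cluster tightly around their mean, which is what makes nearest-neighbor distinctions (and naive distance-based clustering) unreliable.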

286 | Bow: A toolkit for statistical language modeling, text retrieval, classification and clustering. http://www.cs.cmu.edu/~mccallum/bow
- McCallum
- 1996

Citation Context: ... The documents are represented as term vectors using the vector space model. These document datasets are preprocessed (removing the stop words and unnecessary tags and headers) using the rainbow package (McCallum, 1996). – CSTR is the dataset of the abstracts of technical reports (TRs) published in the Department of Computer Science at a research university between 1991 and 2002. The dataset contains 476 abstracts, ...

227 | Subspace Clustering for High Dimensional Data: A Review
- Parsons, Haque, et al.

Citation Context: ...n approach (Ding et al., 2002; Li et al., 2004) where the subspace is adaptively adjusted and integrated with the clustering process. A different approach is called subspace clustering (see a survey (Parsons et al., 2004)), where the focus is on selecting a small number of original dimensions (features) in some unsupervised way so that clusters become more obvious in this subspace. Focusing on the original features (d...

194 | Spectral Relaxation for k-Means Clustering
- Zha, He, et al.
- 2001

Citation Context: ...eparated, K-means clustering is a good model of the data distribution; PCA is the right subspace for clustering due to the equivalence between the relaxed K-means clustering and PCA (Ding & He, 2004; Zha et al., 2002). LDA-Km deals with data distributions which deviate from this situation. 3.1. Extension to Nonlinear Case Using Kernels: The basic idea of LDA is to transform data into a new space/subspace where...

153 | On the equivalence of nonnegative matrix factorization and spectral clustering
- Ding, He, et al.
- 2005

Citation Context: ...Y = U^T X, C, H are nonnegative. Then Y ≈ CH^T is a nonnegative matrix factorization (NMF), which is obtained by the optimization min_{C,H} ||Y − CH^T||^2, s.t. C ≥ 0, H ≥ 0, H^T H = I (12). By a theorem (Ding et al., 2005), the NMF of Eq. (12) is equivalent to a relaxed K-means clustering (Ding & He, 2004; Zha et al., 2002): C = (c_1, ..., c_K) contains the cluster centroids, and H is the cluster indicator. In fac...
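The NMF-as-clustering reading quoted above can be sketched with scikit-learn's `NMF`, which solves plain NMF without the H^T H = I constraint of Eq. (12); taking the dominant component per sample as its cluster is the heuristic the equivalence suggests. The toy data below is an illustrative assumption:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import NMF

# Nonnegative toy data with 2 clusters (shifted so all entries are >= 0).
X, _ = make_blobs(n_samples=100, centers=2, n_features=6, random_state=0)
X = X - X.min()

# sklearn factors X ≈ W H with W >= 0, H >= 0; W plays the role of the
# cluster-indicator factor, H the role of the centroid factor.
model = NMF(n_components=2, init="nndsvd", max_iter=500, random_state=0)
W = model.fit_transform(X)

# Heuristic cluster assignment: dominant component per sample.
labels = W.argmax(axis=1)
print(W.shape, labels.shape)
```

This is only a sketch of the connection; the constrained problem in Eq. (12), with orthogonal H, is what the cited theorem relates exactly to relaxed K-means.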

120 | Experiments with random projection
- Dasgupta
- 2000

Citation Context: ...y approaches to address this problem. The simplest approach is dimension reduction techniques, including principal component analysis (PCA) (Duda et al., 2000; Jolliffe, 2002) and random projections (Dasgupta, 2000). In these methods, dimension reduction is carried out as a preprocessing step and is decoupled from the clustering process: once the subspace dimensions are selected, they stay fixed during the clus...

114 | Orthogonal nonnegative matrix tri-factorizations for clustering
- Ding, Li, et al.

Citation Context: ...ur subspace clustering algorithm with the following algorithms: (i) standard K-means algorithm; (ii) Non-negative Matrix Factorization (NMF) method (Lee & Seung, 2001); (iii) Tri-Factorization Method (Ding et al., 2006); and (iv) Adaptive Subspace Clustering (ASI). The results are shown in Figure 1. Note that the Tri-Factorization method is based on the decomposition X ≈ FSG^T, where the orthogonality constraints F^T F = I, G^T G ...

82 | WebACE: A web agent for document categorization and exploration
- Han, Boley, et al.
- 1998

Citation Context: ...ng the 10 most frequent categories among the 135 topics. – The WebACE dataset contains 2340 documents consisting of news articles from 20 different topics, collected in October 1997 in the WebACE project (Han et al., 1998). – The WebKB dataset contains webpages gathered from university computer science departments. There are about 8280 documents and they are divided into 7 categories: student, faculty, staff, course, ...

67 | Adaptive dimension reduction for clustering high dimensional data
- Ding, He, et al.
- 2002

Citation Context: ...s of the 24th International Conference on Machine Learning, Corvallis, OR, 2007. Copyright 2007 by the author(s)/owner(s). An extension of this approach is the adaptive dimension reduction approach (Ding et al., 2002; Li et al., 2004), where the subspace is adaptively adjusted and integrated with the clustering process. A different approach is called subspace clustering (see a survey (Parsons et al., 2004)), where ...

58 | Generalizing discriminant analysis using the generalized singular value decomposition
- Howland, Park

Citation Context: ...tering, because in the LDA subspace, clusters are well separated. LDA is a very well developed theory (Hastie et al., 2001) and is attracting renewed interest (De la Torre & Kanade, 2006; Ye & Xiong, 2006; Park & Howland, 2004) with the growth of matrix-based approaches in machine learning. In (De la Torre & Kanade, 2006), a matrix factorization is proposed that, after one matrix factor is eliminated, the two remaining mat...

38 | Discriminative Cluster Analysis
- Torre, Kanade
- 2006

36 | Document clustering via adaptive subspace iteration
- Li, Ma, et al.
- 2004

Citation Context: ...rnational Conference on Machine Learning, Corvallis, OR, 2007. Copyright 2007 by the author(s)/owner(s). An extension of this approach is the adaptive dimension reduction approach (Ding et al., 2002; Li et al., 2004), where the subspace is adaptively adjusted and integrated with the clustering process. A different approach is called subspace clustering (see a survey (Parsons et al., 2004)), where the focus is on s...

19 | K-means clustering and principal component analysis
- Ding, He

Citation Context: ...ussians or well separated, K-means clustering is a good model of the data distribution; PCA is the right subspace for clustering due to the equivalence between the relaxed K-means clustering and PCA (Ding & He, 2004; Zha et al., 2002). LDA-Km deals with data distributions which deviate from this situation. 3.1. Extension to Nonlinear Case Using Kernels: The basic idea of LDA is to transform data into a new sp...
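The contexts above describe the rationale behind LDA-Km: alternate K-means (to obtain tentative labels) with LDA (to re-derive the subspace from those labels). A hypothetical sketch of that alternation using scikit-learn, not the authors' implementation; data, cluster count, and iteration count are illustrative:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

K = 3
X, _ = make_blobs(n_samples=300, centers=K, n_features=20, random_state=1)

Z = X  # start by clustering in the full space
for _ in range(5):
    # K-means gives tentative labels in the current subspace ...
    labels = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(Z)
    # ... and LDA re-estimates the (K - 1)-dimensional subspace from them.
    Z = LinearDiscriminantAnalysis(n_components=K - 1).fit_transform(X, labels)

print(Z.shape)  # (300, 2)
```

Each pass re-derives the discriminative subspace from the latest cluster assignment, so the projection and the clustering co-evolve rather than being fixed in a preprocessing step.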

10 | Null space versus orthogonal linear discriminant analysis
- Ye, Xiong
- 2006

Citation Context: ...ce to do data clustering, because in the LDA subspace, clusters are well separated. LDA is a very well developed theory (Hastie et al., 2001) and is attracting renewed interest (De la Torre & Kanade, 2006; Ye & Xiong, 2006; Park & Howland, 2004) with the growth of matrix-based approaches in machine learning. In (De la Torre & Kanade, 2006), a matrix factorization is proposed that, after one matrix factor is eliminated, ...

9 | IFD: iterative feature and data clustering
- Li, Ma
- 2004

Citation Context: ...the initial study, U, H are restricted to {0, 1}, and C is always set to C = argmin_C ||U^T X − CH^T||^2 = U^T X H (H^T H)^{-1}. H, U are solved by an Iterative Feature and Data (IFD) clustering algorithm (Li & Ma, 2004). The ASI factorization is interesting for several reasons. First, assume Y = U^T X, C, H are nonnegative. Then Y ≈ CH^T is a nonnegative matrix factorization (NMF), which is obtained by the optimiza...
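The closed-form update quoted above, C = U^T X H (H^T H)^{-1}, is just the least-squares minimizer of ||U^T X − CH^T||^2 over C. A small numerical check (the matrix sizes are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 5, 40, 3                      # reduced dim, samples, clusters
Y = rng.normal(size=(d, n))             # stands in for U^T X
H = np.zeros((n, k))                    # 0/1 cluster-indicator matrix
H[np.arange(n), rng.integers(0, k, n)] = 1.0

# Closed-form minimizer of ||Y - C H^T||^2 over C:
C = Y @ H @ np.linalg.inv(H.T @ H)

# Same answer from a generic least-squares solve of H C^T = Y^T:
C_ls = np.linalg.lstsq(H, Y.T, rcond=None)[0].T
print(np.allclose(C, C_ls))  # True
```

Because H is a 0/1 indicator matrix, H^T H is the diagonal matrix of cluster sizes, so the formula simply averages the columns of Y within each cluster — i.e., C's columns are the cluster centroids in the reduced space.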