## A unified view on clustering binary data

Venue: Machine Learning

Citations: 12 (3 self)

### Citations

12192 | Elements of Information Theory
- Cover, Thomas
- 1991

Citation Context: ...nships among entropy, mixture models as well as minimum description length have been experimentally studied and evaluated in the machine learning literature (Mitchell, 1997; Celeux & Soromenho, 1996; Cover & Thomas, 1991). Here we conduct experiments to compare the following criteria: entropy, dissimilarity coefficient, and the matrix perspective. To minimize the entropy criterion defined in Equation 1, we use the op...

3540 | Fast algorithms for mining association rules
- Agrawal, Srikant
- 1994

Citation Context: ...cument clustering. For market basket data, each data transaction can be represented as a binary vector where each element indicates whether the corresponding item/product was purchased (Agrawal & Srikant, 1994). For document clustering, each document can be represented as a binary vector where each element indicates whether a given word/term was present (Li et al., 2004a; Li, 2005). Generally, cluste...
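The binary encoding described in this context (items purchased, or terms present) can be sketched as follows; the function and variable names are illustrative, not from the paper.

```python
# Encode market-basket transactions as {0,1}-vectors over a fixed item
# vocabulary, as in the representation described above. Illustrative sketch.

def to_binary_matrix(transactions, vocab):
    """Each row is a binary vector: 1 iff the item appears in the transaction."""
    index = {item: j for j, item in enumerate(vocab)}
    matrix = []
    for t in transactions:
        row = [0] * len(vocab)
        for item in t:
            row[index[item]] = 1
        matrix.append(row)
    return matrix

baskets = [{"milk", "bread"}, {"bread", "beer"}, {"milk"}]
vocab = ["milk", "bread", "beer"]
X = to_binary_matrix(baskets, vocab)
# X == [[1, 1, 0], [0, 1, 1], [1, 0, 0]]
```

The same function covers the document case if each "transaction" is the set of terms occurring in a document.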

2760 | Algorithms for Clustering Data
- Jain, Dubes
- 1988

Citation Context: ... in Equation 1, we use the optimization procedure introduced in (Li et al., 2004b). For minimizing the dissimilarity coefficient criterion defined in Equation 4, we use the popular K-means algorithm (Jain & Dubes, 1988). For the matrix perspective, we use the clustering method described in (Li & Ma, 2004). 4.2 Datasets: We perform our experiments on document datasets. In our experiments, documents are represented us...

2210 | A k-means clustering algorithm
- Hartigan, Wong
- 1979

Citation Context: ... points in a multi-dimensional space into classes (called clusters) so that (i) the points belonging to the same class are similar and (ii) the points belonging to different classes are dissimilar (Hartigan, 1975; Kaufman & Rousseeuw, 1990). In this paper, we focus our attention on binary datasets. Binary data have been occupying a special place in the domain of data analysis. Typical applications for binary ...

2163 | Finding Groups in Data: An Introduction to Cluster Analysis
- Kaufman, Rousseeuw
- 1990

Citation Context: ...ti-dimensional space into classes (called clusters) so that (i) the points belonging to the same class are similar and (ii) the points belonging to different classes are dissimilar (Hartigan, 1975; Kaufman & Rousseeuw, 1990). In this paper, we focus our attention on binary datasets. Binary data have been occupying a special place in the domain of data analysis. Typical applications for binary data clustering include mar...

1693 | Finite Mixture Models
- McLachlan, Peel
- 2000

Citation Context: ... can be formally derived using the likelihood principle based on Bernoulli mixture models. The basic idea of the mixture model is that the observed data are generated by several different latent classes (McLachlan & Peel, 2000). In our setting, the observed data, characterized by the {0, 1}^p valued data vectors, can be viewed as a mixture of multivariate Bernoulli distributions. In general, there will be many data points:...
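The multivariate Bernoulli mixture view quoted above can be illustrated with a small sketch; the mixture weights and parameter values below are made up for the example.

```python
import math

# Log-likelihood of a binary vector x under one multivariate Bernoulli
# component a: log p(x | a) = sum_j x_j log a_j + (1 - x_j) log(1 - a_j).
# Sketch of the mixture-model view in the snippet; the numbers are invented.

def bernoulli_log_prob(x, a):
    return sum(xj * math.log(aj) + (1 - xj) * math.log(1 - aj)
               for xj, aj in zip(x, a))

def mixture_log_likelihood(x, weights, components):
    """log p(x) for the mixture: log sum_i w_i p(x | a_i)."""
    return math.log(sum(w * math.exp(bernoulli_log_prob(x, a))
                        for w, a in zip(weights, components)))

x = [1, 0, 1]
comps = [[0.9, 0.1, 0.8], [0.2, 0.7, 0.3]]
ll = mixture_log_likelihood(x, [0.5, 0.5], comps)
```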

1524 | Modeling by shortest data description
- Rissanen
- 1978

Citation Context: ...rator of their dissimilarity coefficients. 3.5 Minimum Description Length: Minimum description length (MDL) aims at searching for a model that provides the most compact encoding for data transmission (Rissanen, 1978) and is conceptually similar to minimum message length (MML) (Oliver & Baxter, 1994; Baxter & Oliver, 1994) and stochastic complexity minimization (Rissanen, 1989). The MDL approach can be viewed in ...
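The two-part coding idea behind MDL can be made concrete with a toy sketch: total description length is bits for the model plus bits for the data given the model, where each code length is a negative log2 probability. The bit counts below are invented for illustration and are not the paper's coding scheme.

```python
import math

# Two-part MDL score: L(M) + L(D | M), with code lengths taken as
# -log2(probability). Toy illustration of the principle in the snippet.

def code_length_bits(p):
    return -math.log2(p)

def two_part_mdl(model_bits, data_probs):
    """Total description length in bits: model cost + data cost."""
    return model_bits + sum(code_length_bits(p) for p in data_probs)

# A richer model may fit each point better (shorter data code) yet still
# lose overall because the model itself costs more bits to transmit.
simple = two_part_mdl(model_bits=4, data_probs=[0.5, 0.5, 0.5, 0.5])   # 4 + 4 = 8
complex_ = two_part_mdl(model_bits=16, data_probs=[0.9] * 4)
```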

430 | ROCK: a robust clustering algorithm for categorical attributes
- Guha, Rastogi, et al.
- 1999

Citation Context: ...tly coupled. Various methods/criteria have been proposed over the years from various perspectives and with various focuses (Barbara et al., 2002; Gibson et al., 1998; Huang, 1998; Ganti et al., 1999; Guha et al., 2000; Gyllenberg et al., 1997; Li et al., 2004b). However, few attempts have been made to establish the connections between them while highlighting their differences. In this paper, we aim to provide a un...

398 | Concept decompositions for large sparse text data using clustering
- Dhillon, Modha
- 2001

Citation Context: ... them aim at minimizing the expected entropy over the partition. Note that ai can be viewed as a “center” for the cluster Ci. 3.4 A Matrix Perspective: Recently, a number of authors (Ando & Lee, 2001; Dhillon & Modha, 2001; Li et al., 2004a; Soete & Carroll, 1994; Xu & Gong, 2004; Xu et al., 2003; Zha et al., 2001; Dhillon et al., 2003) have suggested clustering methods based on matrix computations and have dem...

357 | Stochastic Complexity in Statistical Inquiry
- Rissanen
- 1989

Citation Context: ...pact encoding for data transmission (Rissanen, 1978) and is conceptually similar to minimum message length (MML) (Oliver & Baxter, 1994; Baxter & Oliver, 1994) and stochastic complexity minimization (Rissanen, 1989). The MDL approach can be viewed in the Bayesian perspective (Mumford, 1996; Mitchell, 1997): the code lengths and the code structure in the coding model are equivalent to the negative log probabilit...

343 | Information-theoretic co-clustering
- Dhillon, Mallela, et al.
- 2003

Citation Context: .... 3.4 A Matrix Perspective: Recently, a number of authors (Ando & Lee, 2001; Dhillon & Modha, 2001; Li et al., 2004a; Soete & Carroll, 1994; Xu & Gong, 2004; Xu et al., 2003; Zha et al., 2001; Dhillon et al., 2003) have suggested clustering methods based on matrix computations and have demonstrated good performance on various datasets. These methods are attractive as they utilize many existing numerical algori...

319 | Document clustering based on non-negative matrix factorization
- Xu, Liu, et al.
- 2003

Citation Context: ...ed as a “center” for the cluster Ci. 3.4 A Matrix Perspective: Recently, a number of authors (Ando & Lee, 2001; Dhillon & Modha, 2001; Li et al., 2004a; Soete & Carroll, 1994; Xu & Gong, 2004; Xu et al., 2003; Zha et al., 2001; Dhillon et al., 2003) have suggested clustering methods based on matrix computations and have demonstrated good performance on various datasets. These methods are attractive as the...

286 | Bow: A toolkit for statistical language modeling, text retrieval, classification and clustering (http://www.cs.cmu.edu/~mccallum/bow)
- McCallum
- 1996

Citation Context: ...anization of the posted article are ignored. In all our experiments, we first select the top 200 words by mutual information with class labels. The feature selection is done with the rainbow package (McCallum, 1996). All the datasets we use are standard labelled corpora, and we can use the labels of the dataset as objective knowledge to evaluate clustering. Since the goal of the experiments is to empirically ...
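The feature-selection step described in this context (keeping the top words by mutual information with the class labels) can be sketched as follows; this is an illustrative reimplementation, not the rainbow package itself.

```python
import math
from collections import Counter

# Rank binary features by mutual information with the class label,
# mirroring the feature-selection step described above. Sketch only.

def mutual_information(feature_col, labels):
    n = len(labels)
    joint = Counter(zip(feature_col, labels))
    f_marg = Counter(feature_col)
    l_marg = Counter(labels)
    mi = 0.0
    for (f, l), c in joint.items():
        p_fl = c / n
        # p(f,l) * log( p(f,l) / (p(f) * p(l)) )
        mi += p_fl * math.log(p_fl * n * n / (f_marg[f] * l_marg[l]))
    return mi

def top_k_features(X, labels, k):
    cols = list(zip(*X))  # transpose: one tuple per feature column
    order = sorted(range(len(cols)),
                   key=lambda j: mutual_information(cols[j], labels),
                   reverse=True)
    return order[:k]

# Feature 0 tracks the label exactly; feature 1 is independent of it.
X = [[1, 0], [1, 1], [0, 0], [0, 1]]
labels = [1, 1, 0, 0]
best = top_k_features(X, labels, 1)  # -> [0]
```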

253 | Extensions to the k-means algorithm for clustering large data sets with categorical values (Data Mining and Knowledge Discovery)
- Huang
- 1998

Citation Context: ...lem, the four components are tightly coupled. Various methods/criteria have been proposed over the years from various perspectives and with various focuses (Barbara et al., 2002; Gibson et al., 1998; Huang, 1998; Ganti et al., 1999; Guha et al., 2000; Gyllenberg et al., 1997; Li et al., 2004b). However, few attempts have been made to establish the connections between them while highlighting their differences...

193 | Spectral relaxation for k-means clustering
- Zha, Ding, et al.
- 2001

Citation Context: ...for the cluster Ci. 3.4 A Matrix Perspective: Recently, a number of authors (Ando & Lee, 2001; Dhillon & Modha, 2001; Li et al., 2004a; Soete & Carroll, 1994; Xu & Gong, 2004; Xu et al., 2003; Zha et al., 2001; Dhillon et al., 2003) have suggested clustering methods based on matrix computations and have demonstrated good performance on various datasets. These methods are attractive as they utilize many exi...

176 | Clustering categorical data: an approach based on dynamic systems
- Gibson, Kleinberg, et al.
- 1998

Citation Context: ... data clustering problem, the four components are tightly coupled. Various methods/criteria have been proposed over the years from various perspectives and with various focuses (Barbara et al., 2002; Gibson et al., 1998; Huang, 1998; Ganti et al., 1999; Guha et al., 2000; Gyllenberg et al., 1997; Li et al., 2004b). However, few attempts have been made to establish the connections between them while highlighting thei...

141 | Quantification method of classification processes: concept of structural alpha-entropy
- Havrda, Charvat
- 1967

Citation Context: ...where p_k^j is the probability that the j-th attribute is 1 in cluster k. Havrda and Charvat (Havrda & Charvat, 1967) proposed a generalized entropy of degree s for a discrete probability distribution P = (p1, p2, ..., pn): H^s(P) = (2^(1−s) − 1)^(−1) (Σ_{i=1}^n p_i^s − 1), s > 0, s ≠ 1, with lim_{s→1} H^s(P) = − ...
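Assuming the standard form of the Havrda-Charvat entropy of degree s, a quick numerical check shows that with this normalization it approaches the base-2 Shannon entropy as s approaches 1:

```python
import math

# Havrda-Charvat generalized entropy of degree s, and a numerical check of
# its s -> 1 limit against the (base-2) Shannon entropy. Assumes the
# standard normalization (2^(1-s) - 1)^(-1); sketch, not the paper's code.

def havrda_charvat(P, s):
    assert s > 0 and s != 1
    return (2 ** (1 - s) - 1) ** -1 * (sum(p ** s for p in P) - 1)

def shannon_bits(P):
    return -sum(p * math.log2(p) for p in P if p > 0)

P = [0.5, 0.25, 0.25]
h_near_limit = havrda_charvat(P, 1.000001)  # close to shannon_bits(P) = 1.5
```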

114 | Mathematical Taxonomy
- Jardine, Sibson
- 1971

Citation Context: ... (Baulieu, 1997). Dissimilarity measures can be transformed into a similarity function by simple transformations such as adding 1 and inverting, dividing by 2 and subtracting from 1, etc. (Jardine & Sibson, 1971). If the joint absence of the attribute is ignored, i.e., setting δ = 0, then the binary dissimilarity measure can be generally written as D(a, b, c, d) = (b + c)/(αa + b + c), α > 0. Table 2 listed several commo...

110 | An entropy criterion for assessing the number of clusters in a mixture model
- Celeux, Soromenho
- 1996

Citation Context: ...e. 4.1 Methods: The relationships among entropy, mixture models as well as minimum description length have been experimentally studied and evaluated in the machine learning literature (Mitchell, 1997; Celeux & Soromenho, 1996; Cover & Thomas, 1991). Here we conduct experiments to compare the following criteria: entropy, dissimilarity coefficient, and the matrix perspective. To minimize the entropy criterion defined in Equ...

105 | CACTUS: Clustering Categorical Data Using Summaries
- Ganti, Gehrke, et al.
- 1999

Citation Context: ... components are tightly coupled. Various methods/criteria have been proposed over the years from various perspectives and with various focuses (Barbara et al., 2002; Gibson et al., 1998; Huang, 1998; Ganti et al., 1999; Guha et al., 2000; Gyllenberg et al., 1997; Li et al., 2004b). However, few attempts have been made to establish the connections between them while highlighting their differences. In this paper, we ...

89 | COOLCAT: an entropy-based algorithm for categorical clustering
- Barbara, Li, et al.
- 2002

Citation Context: ...procedure. For a given data clustering problem, the four components are tightly coupled. Various methods/criteria have been proposed over the years from various perspectives and with various focuses (Barbara et al., 2002; Gibson et al., 1998; Huang, 1998; Ganti et al., 1999; Guha et al., 2000; Gyllenberg et al., 1997; Li et al., 2004b). However, few attempts have been made to establish the connections between them wh...

67 | Clustering using Monte Carlo cross-validation
- Smyth
- 1996

Citation Context: ...ing the entropy criterion: to look at various techniques used in model-based approaches such as likelihood ratio tests, penalty methods, Bayesian methods, cross-validation (Biernacki & Govaert, 1997; Smyth, 1996). Moreover, the connections motivate us to explore the integration of various clustering methods. Acknowledgment: The author is grateful to Dr. Shenghuo Zhu, Dr. Sheng Ma and Dr. Mitsunori Ogihara ...

51 | Pattern theory: a unifying perspective
- Mumford
- 1996

Citation Context: ...lar to minimum message length (MML) (Oliver & Baxter, 1994; Baxter & Oliver, 1994) and stochastic complexity minimization (Rissanen, 1989). The MDL approach can be viewed in the Bayesian perspective (Mumford, 1996; Mitchell, 1997): the code lengths and the code structure in the coding model are equivalent to the negative log probabilities and probability structure assumptions in the Bayesian approach. As descr...

50 | Clustering criteria and multivariate normal mixtures
- Symons
- 1981

Citation Context: ...hich indicate the origin/generation of the points: u_t^i is equal to 1 or 0 according as x_t comes from the cluster Ci or not. These vectors are the missing variables. The classification likelihood (Symons, 1981) is then: CL(a, u) = Σ_{t=1}^n Σ_{i=1}^K u_t^i log p(x_t | a_i) = Σ_{t=1}^n Σ_{i=1}^K u_t^i log Π_{j=1}^p (a_i^j)^{x_tj} (1 − a_i^j)^{1 − x_tj} (7). Note that CL(a, u) = L(a) − LP(a, u), where LP(a, u) = − Σ_{t=1}^n Σ_{i=1}^K ...
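The classification likelihood for hard cluster assignments quoted in this context can be computed directly for Bernoulli clusters; the data, assignments, and parameters below are made up for the example.

```python
import math

# Classification likelihood for hard assignments U (U[t][i] = 1 iff x_t is
# in cluster C_i) under per-cluster Bernoulli parameters A. Illustrative
# sketch of the quantity in the snippet; the numbers are invented.

def classification_likelihood(X, U, A):
    total = 0.0
    for x, u in zip(X, U):
        for i, a in enumerate(A):
            if u[i]:  # only the assigned cluster contributes
                total += sum(xj * math.log(aj) + (1 - xj) * math.log(1 - aj)
                             for xj, aj in zip(x, a))
    return total

X = [[1, 0], [0, 1]]
U = [[1, 0], [0, 1]]          # x_0 -> cluster 0, x_1 -> cluster 1
A = [[0.9, 0.1], [0.1, 0.9]]  # cluster "centers" a_i
cl = classification_likelihood(X, U, A)
```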

48 | A general model for clustering binary data
- Li
- 2005

Citation Context: ...ed (Agrawal & Srikant, 1994). For document clustering, each document can be represented as a binary vector where each element indicates whether a given word/term was present or not (Li et al., 2004a; Li, 2005). Generally clustering problems are determined by four basic components: a) the (physical) representation of the given data set; b) the distance/dissimilarity measures between data points; c) the cri...

37 | Document clustering by concept factorization
- Xu, Gong
- 2004

Citation Context: ...at ai can be viewed as a “center” for the cluster Ci. 3.4 A Matrix Perspective: Recently, a number of authors (Ando & Lee, 2001; Dhillon & Modha, 2001; Li et al., 2004a; Soete & Carroll, 1994; Xu & Gong, 2004; Xu et al., 2003; Zha et al., 2001; Dhillon et al., 2003) have suggested clustering methods based on matrix computations and have demonstrated good performance on various datasets. These methods are ...

36 | Document clustering via adaptive subspace iteration
- Li, Ma, et al.
- 2004

Citation Context: ...roduct was purchased (Agrawal & Srikant, 1994). For document clustering, each document can be represented as a binary vector where each element indicates whether a given word/term was present or not (Li et al., 2004a; Li, 2005). Generally clustering problems are determined by four basic components: a) the (physical) representation of the given data set; b) the distance/dissimilarity measures between data points;...

33 | Entropy-based criterion in categorical clustering
- Li, Ma, et al.
- 2004

Citation Context: ...roduct was purchased (Agrawal & Srikant, 1994). For document clustering, each document can be represented as a binary vector where each element indicates whether a given word/term was present or not (Li et al., 2004a; Li, 2005). Generally clustering problems are determined by four basic components: a) the (physical) representation of the given data set; b) the distance/dissimilarity measures between data points;...

29 | Clustering criteria for discrete data and latent class models
- Celeux, Govaert
- 1991

Citation Context: ...hat the entropy-based clustering criteria can be formally derived in the formal framework of probabilistic clustering models. 3.1.1 Entropy Criterion: The classical clustering criterion (Bock, 1989; Celeux & Govaert, 1991) is to find the partition C such that ...
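One way to read the entropy criterion in this context is as the expected, cluster-size-weighted entropy over the partition; the sketch below uses per-attribute Bernoulli entropies and may differ from the paper's exact normalization.

```python
import math

# Expected entropy of a partition of binary data: for each cluster, sum the
# entropies of the per-attribute Bernoulli rates, weighted by cluster size.
# Sketch of the criterion's idea; normalization is an assumption here.

def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def expected_entropy(clusters):
    """clusters: list of clusters, each a list of {0,1}-vectors."""
    n = sum(len(c) for c in clusters)
    total = 0.0
    for c in clusters:
        for j in range(len(c[0])):
            p = sum(x[j] for x in c) / len(c)
            total += (len(c) / n) * binary_entropy(p)
    return total

# A pure partition scores lower (better) than a mixed one.
pure = [[[1, 0], [1, 0]], [[0, 1], [0, 1]]]
mixed = [[[1, 0], [0, 1]], [[1, 0], [0, 1]]]
```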

26 | MML and Bayesianism: similarities and differences
- Oliver, Baxter
- 1995

Citation Context: ...imum Description length (MDL) aims at searching for a model that provides the most compact encoding for data transmission (Rissanen, 1978) and is conceptually similar to minimum message length (MML) (Oliver & Baxter, 1994; Baxter & Oliver, 1994) and stochastic complexity minimization (Rissanen, 1989). The MDL approach can be viewed in the Bayesian perspective (Mumford, 1996; Mitchell, 1997): the code lengths and the c...

25 | Iterative residual rescaling: An analysis and generalization of LSI
- Ando, Lee
- 2001

Citation Context: ...on 1 since both of them aim at minimizing the expected entropy over the partition. Note that ai can be viewed as a “center” for the cluster Ci. 3.4 A Matrix Perspective: Recently, a number of authors (Ando & Lee, 2001; Dhillon & Modha, 2001; Li et al., 2004a; Soete & Carroll, 1994; Xu & Gong, 2004; Xu et al., 2003; Zha et al., 2001; Dhillon et al., 2003) have suggested clustering methods based on matrix co...

21 | Using the classification likelihood to choose the number of clusters
- Biernacki, Govaert
- 1997

Citation Context: ... number of clusters when using the entropy criterion: to look at various techniques used in model-based approaches such as likelihood ratio tests, penalty methods, Bayesian methods, cross-validation (Biernacki & Govaert, 1997; Smyth, 1996). Moreover, the connections motivate us to explore the integration of various clustering methods. Acknowledgment: The author is grateful to Dr. Shenghuo Zhu, Dr. Sheng Ma and Dr. Mitsu...

20 | Classification of binary vectors by stochastic complexity
- Gyllenberg, Koski, et al.
- 1997

Citation Context: ...s methods/criteria have been proposed over the years from various perspectives and with various focuses (Barbara et al., 2002; Gibson et al., 1998; Huang, 1998; Ganti et al., 1999; Guha et al., 2000; Gyllenberg et al., 1997; Li et al., 2004b). However, few attempts have been made to establish the connections between them while highlighting their differences. In this paper, we aim to provide a unified view of binary data...

17 | Maximum certainty data partitioning
- Roberts, Everson, et al.
- 2000

Citation Context: ...nerated by a number of classes. We first model the unconditional probability density function and then seek a number of partitions whose combination yields the density function (Roberts et al., 1999; Roberts et al., 2000). The K-L measure then tries to measure the difference between the unconditional density and the partitional density. (We adopt the convention that 0 log 0 = 0 when necessary.) Given two distributions ...

16 | Probabilistic aspects in cluster analysis (Conceptual and Numerical Analysis of Data)
- Bock
- 1989

Citation Context: ... Entropy Criterion: As measures of the uncertainty present in random variables, entropy-type criteria for the heterogeneity of object clusters have been used since the early days of cluster analysis (Bock, 1989). In this section, we first study the entropy-based criteria in categorical clustering. In particular, we will show that the entropy-based clustering criteria can be formally derived in the formal fr...

12 | Two variant axiom systems for presence/absence based dissimilarity coefficients
- Baulieu
- 1997

Citation Context: ...Let a set X of n data points and a set A of p binary attributes be given. Given two data points x1 and x2, there are four fundamental quantities that can be used to define similarity between the two (Baulieu, 1997): a = card(x1j = x2j = 1), b = card(x1j = 1 & x2j = 0), c = card(x1j = 0 & x2j = 1), d = card(x1j = x2j = 0), where j = 1, ..., p and card represents cardinality. The presence/absence based di...
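The four quantities a, b, c, d defined in this context, together with a dissimilarity of the general presence/absence form D = (b + c)/(αa + b + c) quoted elsewhere on this page, can be computed as follows; with α = 1 this is the familiar Jaccard-style mismatch ratio that ignores joint absences.

```python
# Compute the four fundamental pair counts a, b, c, d for two binary vectors,
# and a dissimilarity of the general form (b + c) / (alpha*a + b + c), which
# ignores d (joint absences). Sketch based on the definitions quoted above.

def abcd(x1, x2):
    a = sum(1 for u, v in zip(x1, x2) if u == 1 and v == 1)  # joint presence
    b = sum(1 for u, v in zip(x1, x2) if u == 1 and v == 0)
    c = sum(1 for u, v in zip(x1, x2) if u == 0 and v == 1)
    d = sum(1 for u, v in zip(x1, x2) if u == 0 and v == 0)  # joint absence
    return a, b, c, d

def dissimilarity(x1, x2, alpha=1.0):
    a, b, c, _ = abcd(x1, x2)  # d is ignored (delta = 0)
    return (b + c) / (alpha * a + b + c)

x1 = [1, 1, 0, 0, 1]
x2 = [1, 0, 1, 0, 0]
# abcd(x1, x2) == (1, 2, 1, 1); dissimilarity(x1, x2) == 0.75
```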

11 | Efficient multi-way text categorization via generalized discriminant analysis
- Li, Zhu, et al.
- 2003

Citation Context: ...ical reports published in the Department of Computer Science at the University of Rochester between 1991 and 2002. The TRs are available at http://www.cs.rochester.edu/trs. The dataset was first used in (Li et al., 2003) for text categorization. It contained 476 abstracts, which were divided into four research areas: Natural Language Processing (NLP), Robotics/Vision, Systems, and Theory. • WebKB: The WebKB ...

9 | Minimum entropy data partitioning
- Roberts, Everson, et al.
- 1999

Citation Context: ...observed dataset is generated by a number of classes. We first model the unconditional probability density function and then seek a number of partitions whose combination yields the density function (Roberts et al., 1999; Roberts et al., 2000). The K-L measure then tries to measure the difference between the unconditional density and the partitional density. (We adopt the convention that 0 log 0 = 0 when necessary.) Gi...

5 | IFD: iterative feature and data clustering
- Li, Ma
- 2004

Citation Context: ...y existing numerical algorithms in matrix computations. In our following discussions, we use the cluster model for binary data clustering based on a matrix perspective presented in (Li et al., 2004a; Li & Ma, 2004). In the cluster model, the problem of clustering is formulated as matrix approximations and the clustering objective is minimizing the approximation error between the original data matrix and the re...

5 | K-means clustering in a low-dimensional Euclidean space
- Soete, Carroll
- 1994