Results 1 – 5 of 5
Bayesian Hierarchical Cross-Clustering
Abstract

Cited by 3 (1 self)
Most clustering algorithms assume that all dimensions of the data can be described by a single structure. Cross-clustering (or multi-view clustering) allows multiple structures, each applying to a subset of the dimensions. We present a novel approach to cross-clustering, based on approximating the solution to a Cross Dirichlet Process mixture (CDPM) model [Shafto et al., 2006, Mansinghka et al., 2009]. Our bottom-up, deterministic approach results in a hierarchical clustering of dimensions, and at each node, a hierarchical clustering of data points. We also present a randomized approximation, based on a truncated hierarchy, that scales linearly in the number of levels. Results on synthetic and real-world data sets demonstrate that the cross-clustering-based algorithms perform as well as or better than the clustering-based algorithms, that our deterministic approaches perform as well as the MCMC-based CDPM, and that the randomized approximation provides a remarkable speedup relative to the full deterministic approximation with minimal cost in predictive error.
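The CDPM prior the abstract refers to can be illustrated with a minimal sketch: a Chinese Restaurant Process partitions the dimensions into views, and within each view an independent CRP partitions the data points. This is an assumed simplification for illustration, not the paper's hierarchical approximation algorithm.

```python
import numpy as np

def crp_partition(n, alpha, rng):
    """Sample a partition of n items from a Chinese Restaurant Process."""
    assignments = [0]
    counts = [1]  # items seated at each existing cluster
    for _ in range(1, n):
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = int(rng.choice(len(probs), p=probs))
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments

def cross_clustering_prior(n_rows, n_cols, alpha, rng):
    """CDPM-style prior: partition columns (dimensions) into views,
    then independently partition the rows within each view."""
    view_of_col = crp_partition(n_cols, alpha, rng)
    n_views = max(view_of_col) + 1
    row_clusters = {v: crp_partition(n_rows, alpha, rng) for v in range(n_views)}
    return view_of_col, row_clusters

rng = np.random.default_rng(0)
views, rows = cross_clustering_prior(n_rows=10, n_cols=6, alpha=1.0, rng=rng)
```

Each view carries its own row partition, which is exactly what lets different subsets of dimensions express different cluster structures.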
A Nonparametric Bayesian Model for Multiple Clustering with Overlapping Feature Views
Journal of Machine Learning Research, 2012
Abstract

Cited by 2 (2 self)
Most clustering algorithms produce a single clustering solution. This is inadequate for many data sets that are multifaceted and can be grouped and interpreted in many different ways. Moreover, for high-dimensional data, different features may be relevant or irrelevant to each clustering solution, suggesting the need for feature selection in clustering. Features relevant to one clustering interpretation may be different from the ones relevant for an alternative interpretation or view of the data. In this paper, we introduce a probabilistic nonparametric Bayesian model that can discover multiple clustering solutions from data and the feature subsets that are relevant for the clusters in each view. In our model, the features in different views may be shared and therefore the sets of relevant features are allowed to overlap. We model feature relevance to each view using an Indian Buffet Process and the cluster membership in each view using a Chinese Restaurant Process. We provide an inference approach to learn the latent parameters corresponding to this multiple partitioning problem. Our model not only learns the features and clusters in each view but also automatically learns the number of clusters, number of views and number of features in each view.
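The Indian Buffet Process mentioned in the abstract yields a binary feature-to-view matrix in which views may share features. A minimal sampling sketch of that prior (the per-view CRP over data points, as in the paper, would be layered on top; this sketch covers only the feature-relevance part):

```python
import numpy as np

def sample_ibp(n_features, alpha, rng):
    """Sample a binary feature-relevance matrix from an Indian Buffet Process.
    Rows = features (customers), columns = views (dishes); Z[i, k] = 1 means
    feature i is relevant to view k. Views may overlap in their features."""
    Z = []
    counts = []  # how many earlier features already use each view
    for i in range(n_features):
        # join each existing view with probability m_k / (i + 1)
        row = [1 if rng.random() < m / (i + 1) else 0 for m in counts]
        # open a Poisson(alpha / (i + 1)) number of brand-new views
        new = int(rng.poisson(alpha / (i + 1)))
        row += [1] * new
        counts = [m + r for m, r in zip(counts, row)] + [1] * new
        Z = [z + [0] * new for z in Z]  # pad earlier rows for the new views
        Z.append(row)
    return np.array(Z)

rng = np.random.default_rng(1)
Z = sample_ibp(n_features=8, alpha=2.0, rng=rng)
```

Because a feature can switch on in several columns of `Z`, the sets of relevant features for different views can overlap, which is the property the model exploits.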
Factorial Clustering with an Application to Plant Distribution Data
Abstract

Cited by 1 (0 self)
We propose a latent variable approach for multiple clustering of categorical data. We use logistic regression models for the conditional distribution of observable features given the latent cluster variables. This model supports an interpretation of the different clusterings as representing distinct, independent factors that determine the distribution of the observed features. We apply the model to the analysis of plant distribution data, where multiple clusterings are of interest in order to identify the major underlying factors that determine the vegetation in a geographical region.
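The generative structure described here — independent latent cluster variables feeding logistic regressions over observed features — can be sketched as follows. The binary features and the specific parameterization are illustrative assumptions; the paper handles general categorical data.

```python
import numpy as np

def sample_factorial_model(n, n_latent, n_features, rng):
    """Generative sketch of factorial clustering: independent binary latent
    cluster variables; each observed (here binary) feature is drawn from a
    logistic regression on all latent variables, so every latent factor
    contributes its own partition of the data."""
    Z = rng.integers(0, 2, size=(n, n_latent))      # independent latent factors
    W = rng.normal(size=(n_latent, n_features))     # regression weights
    b = rng.normal(size=n_features)                 # per-feature bias
    p = 1.0 / (1.0 + np.exp(-(Z @ W + b)))          # logistic link
    X = (rng.random((n, n_features)) < p).astype(int)
    return Z, X

rng = np.random.default_rng(2)
Z, X = sample_factorial_model(n=50, n_latent=2, n_features=6, rng=rng)
```

Inference would invert this process, recovering `Z` (the multiple clusterings) and the regression parameters from observed `X`.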
Iterative Discovery of Multiple Alternative Clustering Views
Abstract
Complex data can be grouped and interpreted in many different ways. Most existing clustering algorithms, however, only find one clustering solution, and provide little guidance to data analysts who may not be satisfied with that single clustering and may wish to explore alternatives. We introduce a novel approach that provides several clustering solutions to the user for the purposes of exploratory data analysis. Our approach additionally captures the notion that alternative clusterings may reside in different subspaces (or views). We present an algorithm that simultaneously finds these subspaces and the corresponding clusterings. The algorithm is based on an optimization procedure that incorporates terms for cluster quality and novelty relative to previously discovered clustering solutions. We present a range of experiments that compare our approach to alternatives and explore the connections between simultaneous and iterative modes of discovery of multiple clusterings.
Index Terms: Kernel methods, non-redundant clustering, alternative clustering, multiple clustering, dimensionality reduction
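The iterative mode of discovery mentioned in the abstract can be illustrated with a deliberately simplified stand-in: re-cluster the data in the subspace orthogonal to the directions that separate the previously found clusters. This is not the paper's kernel-based optimization — only a sketch of the "novelty relative to a previous solution" idea.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means, enough for the sketch below."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def alternative_clustering(X, k, prev_labels, seed=0):
    """Find an alternative clustering by projecting out the mean-difference
    directions of the previous clusters, then re-clustering the residual."""
    mu = X.mean(axis=0)
    dirs = [X[prev_labels == c].mean(axis=0) - mu for c in np.unique(prev_labels)]
    U, s, _ = np.linalg.svd(np.array(dirs).T, full_matrices=False)
    Q = U[:, s > 1e-9]                        # rank-aware orthonormal basis
    P = np.eye(X.shape[1]) - Q @ Q.T          # projector onto the complement
    return kmeans(X @ P, k, seed=seed)

rng = np.random.default_rng(3)
X = np.hstack([rng.choice([0.0, 10.0], size=(40, 1)),   # first structure
               rng.choice([0.0, 4.0], size=(40, 1)),    # second structure
               rng.normal(scale=0.1, size=(40, 2))])
prev = (X[:, 0] > 5).astype(int)              # a first clustering along feature 0
alt = alternative_clustering(X, 2, prev, seed=1)
```

With the feature-0 direction projected away, a second run of k-means is free to pick up the remaining structure — the iterative counterpart of the paper's simultaneous formulation.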
Fast Multidimensional Clustering of Categorical Data
Abstract
Early research work on clustering usually assumed that there was one true clustering of data. However, complex data are typically multifaceted and can be meaningfully clustered in many different ways. There is a growing interest in methods that produce multiple partitions of data. One such method is based on latent tree models (LTMs). This method has a number of advantages over alternative methods, but is computationally inefficient. We propose a fast algorithm for learning LTMs and show that the algorithm can produce rich and meaningful clustering results in moderately large data sets.
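How a latent tree model yields multiple partitions can be shown with a tiny generative sketch: two latent binary variables, each the parent of its own block of observed features, so each latent variable induces its own clustering. The structure and probabilities below are assumed for illustration, not taken from the paper.

```python
import numpy as np

def sample_latent_tree(n, rng):
    """Sketch of a small latent tree model: two latent binary variables,
    each emitting a disjoint block of three noisy binary features.
    Each latent variable defines an independent partition of the data."""
    y1 = rng.integers(0, 2, n)   # latent variable 1 -> features 0..2
    y2 = rng.integers(0, 2, n)   # latent variable 2 -> features 3..5

    def emit(y, p_on=0.9, p_off=0.1):
        p = np.where(y[:, None] == 1, p_on, p_off)   # emission probabilities
        return (rng.random((n, 3)) < p).astype(int)

    X = np.hstack([emit(y1), emit(y2)])
    return (y1, y2), X

rng = np.random.default_rng(4)
(y1, y2), X = sample_latent_tree(50, rng)
```

Learning an LTM reverses this: it recovers the latent variables (and hence the multiple partitions) from the observed feature matrix, which is the computationally expensive step the paper's fast algorithm targets.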