Results 1–10 of 13
A simple example of Dirichlet process mixture inconsistency for the number of components
 In Advances in Neural Information Processing Systems
Abstract

Cited by 14 (3 self)
For data assumed to come from a finite mixture with an unknown number of components, it has become common to use Dirichlet process mixtures (DPMs) not only for density estimation, but also for inferences about the number of components. The typical approach is to use the posterior distribution on the number of clusters — that is, the posterior on the number of components represented in the observed data. However, it turns out that this posterior is not consistent — it does not concentrate at the true number of components. In this note, we give an elementary proof of this inconsistency in what is perhaps the simplest possible setting: a DPM with normal components of unit variance, applied to data from a “mixture” with one standard normal component. Further, we show that this example exhibits severe inconsistency: instead of going to 1, the posterior probability that there is one cluster converges (in probability) to 0.
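The inconsistency result above concerns the posterior, but even the DP prior already downweights a single cluster: under the induced Chinese restaurant process with concentration α, the prior probability that all n observations share one cluster is the product of (i − 1)/(α + i − 1) over i = 2, …, n, which vanishes as n grows. A minimal sketch (the choice α = 1 is illustrative, not a value from the paper):

```python
# Prior probability, under a CRP with concentration alpha, that all n
# customers sit at a single table: prod_{i=2}^{n} (i-1) / (alpha + i - 1).
def crp_one_cluster_prob(n, alpha):
    p = 1.0
    for i in range(2, n + 1):
        p *= (i - 1) / (alpha + i - 1)
    return p

# With alpha = 1 the product telescopes to 1/n.
for n in (10, 100, 1000):
    print(n, crp_one_cluster_prob(n, alpha=1.0))
```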
The Translation-invariant Wishart-Dirichlet Process for Clustering Distance Data
Abstract

Cited by 6 (1 self)
We present a probabilistic model for clustering of objects represented via pairwise dissimilarities. We propose that even if an underlying vectorial representation exists, it is better to work directly with the dissimilarity matrix, hence avoiding unnecessary bias and variance caused by embeddings. By using a Dirichlet process prior we are not obliged to fix the number of clusters in advance. Furthermore, our clustering model is permutation-, scale- and translation-invariant, and it is called the Translation-invariant Wishart-Dirichlet (TIWD) process. A highly efficient MCMC sampling algorithm is presented. Experiments show that the TIWD process exhibits several advantages over competing approaches.
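One way to see the appeal of working directly with dissimilarities: pairwise squared Euclidean distances are unchanged by translating the underlying vectors, so no centering or embedding choice needs to be made. A quick numerical check on synthetic data (not data from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))   # hypothetical vectorial representation
t = rng.normal(size=3)        # an arbitrary translation

def sq_dists(Y):
    # Matrix of pairwise squared Euclidean dissimilarities.
    return ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)

D = sq_dists(X)
# Translating every point by t leaves the dissimilarity matrix unchanged.
assert np.allclose(D, sq_dists(X + t))
print(D.shape)
```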
Mixture models with a prior on the number of components. arXiv:1502.06241, 2015
Abstract

Cited by 1 (1 self)
A natural Bayesian approach for mixture models with an unknown number of components is to take the usual finite mixture model with symmetric Dirichlet weights, and put a prior on the number of components — that is, to use a mixture of finite mixtures (MFM). The most commonly-used method of inference for MFMs is reversible jump Markov chain Monte Carlo, but it can be nontrivial to design good reversible jump moves, especially in high-dimensional spaces. Meanwhile, there are samplers for Dirichlet process mixture (DPM) models that are relatively simple and are easily adapted to new applications. It turns out that, in fact, many of the essential properties of DPMs are also exhibited by MFMs — an exchangeable partition distribution, restaurant process, random measure representation, and stick-breaking representation — and crucially, the MFM analogues are simple enough that they can be used much like the corresponding DPM properties. Consequently, many of the powerful methods developed for inference in DPMs can be directly applied to MFMs as well; this simplifies the implementation of MFMs and can substantially improve mixing. We illustrate with real and simulated data, including high-dimensional gene expression data used to discriminate cancer subtypes.
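For concreteness, the DPM restaurant process that the MFM analogue mirrors can be sampled in a few lines: customer i joins an existing table with probability proportional to its size, or opens a new table with probability proportional to α. (In the MFM analogue described in the abstract, the existing-table weight is modified and the new-table weight becomes a ratio of precomputable coefficients rather than a constant; the sketch below shows only the DPM version, with illustrative parameter values.)

```python
import random

def crp_partition(n, alpha, seed=0):
    """Draw a partition of {0, ..., n-1} from the Chinese restaurant process:
    customer i joins table t with prob |t| / (alpha + i), or a new table
    with prob alpha / (alpha + i)."""
    rng = random.Random(seed)
    tables = []
    for i in range(n):
        weights = [len(t) for t in tables] + [alpha]
        r = rng.random() * (i + alpha)
        acc = 0.0
        for j, w in enumerate(weights):
            acc += w
            if r < acc:
                break
        if j == len(tables):
            tables.append([i])   # open a new table
        else:
            tables[j].append(i)  # join an existing table
    return tables

print(crp_partition(10, alpha=1.0))
```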
CDP Mixture Models for Data Clustering
Abstract

Cited by 1 (0 self)
In Dirichlet process (DP) mixture models, the number of components is implicitly determined by the sampling parameters of the Dirichlet process. However, this kind of model usually produces many small mixture components when modeling real-world data, especially high-dimensional data. In this paper, we propose a new class of Dirichlet process mixture models with constrained principles, named constrained Dirichlet process (CDP) mixture models. Based on general DP mixture models, we add a resampling step to obtain latent parameters. In this way, CDP mixture models can suppress noise and generate compact patterns of the data. Experimental results on data clustering show the strong performance of the CDP mixture models.
Testing for the existence of clusters
Abstract

Cited by 1 (0 self)
Detecting and determining the clusters present in a sample has long been an important concern among researchers from different fields. In particular, assessing whether the clusters are statistically significant is a question that has been asked by a number of experimenters. Recently, this question arose again in a study in maize genetics, where determining the significance of clusters is crucial as a primary step in the identification of a genome-wide collection of mutants that may affect the kernel composition. Although several efforts have been made in this direction, not much has been done with the aim of developing an actual hypothesis test to assess the significance of clusters. In this paper, we propose a new methodology that allows the examination of the hypothesis test H0: κ = 1 vs. H1: κ = k, where κ denotes the number of clusters present in a certain population. Our procedure, based on Bayesian tools, permits us to obtain closed-form expressions for the posterior probabilities corresponding to the null hypothesis. From here, we calibrate our results by estimating the frequentist null distribution of the posterior probabilities in order to obtain the p-values associated with the observed posterior probabilities. In most cases, actual evaluation of the posterior probabilities is computationally intensive, and several algorithms have been discussed in the literature. Here, we propose a simple estimation procedure, based on MCMC techniques, that permits an efficient and easily implementable evaluation of the test. Finally, we present simulation studies that support our conclusions, and we apply our method to the analysis of NIR spectroscopy data coming from the genetic study that motivated this work.
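The paper's closed-form posterior probabilities are not reproduced here, but the basic object — the posterior probability of H0: κ = 1 against H1: κ = k under equal prior odds — can be sketched for k = 2 with unit-variance normal components, a N(0, τ²) prior on each mean, and a uniform Dirichlet prior on the weights (all illustrative assumptions, not the paper's model). The H0 marginal likelihood is closed-form; the H1 marginal is estimated by plain Monte Carlo over the prior:

```python
import numpy as np

def log_marginal_h0(x, tau2=10.0):
    # x_i ~ N(mu, 1) with mu ~ N(0, tau2): integrate mu out in closed form.
    n, s, ssq = len(x), x.sum(), (x ** 2).sum()
    v = 1.0 / (n + 1.0 / tau2)  # posterior variance of mu
    return (-0.5 * n * np.log(2 * np.pi) - 0.5 * ssq
            + 0.5 * v * s ** 2 + 0.5 * np.log(v) - 0.5 * np.log(tau2))

def log_marginal_h1(x, k=2, tau2=10.0, draws=20000, seed=0):
    # Plain Monte Carlo: average the mixture likelihood over prior draws
    # of the component means and Dirichlet(1, ..., 1) weights.
    rng = np.random.default_rng(seed)
    mus = rng.normal(0.0, np.sqrt(tau2), size=(draws, k))
    w = rng.dirichlet(np.ones(k), size=draws)
    comp = np.exp(-0.5 * (x[None, :, None] - mus[:, None, :]) ** 2) / np.sqrt(2 * np.pi)
    ll = np.log((w[:, None, :] * comp).sum(axis=-1)).sum(axis=-1)
    m = ll.max()
    return m + np.log(np.exp(ll - m).mean())  # log-sum-exp for stability

def posterior_prob_h0(x, **kw):
    # Posterior probability of H0 under equal prior odds on H0 and H1.
    l0, l1 = log_marginal_h0(x), log_marginal_h1(x, **kw)
    return 1.0 / (1.0 + np.exp(l1 - l0))

rng = np.random.default_rng(1)
x = rng.normal(size=50)  # one-cluster data
print(posterior_prob_h0(x))
```

As in the paper's approach, the frequentist null distribution of this posterior probability would then be estimated by simulation to obtain a p-value; that calibration step is omitted here.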
Random permutations and partition models, 2010
Abstract

Cited by 1 (0 self)
Set partitions. For n ≥ 1, a partition B of the finite set [n] = {1, ..., n} is
• a collection B = {b1, ...} of disjoint nonempty subsets, called blocks, whose union is [n];
• an equivalence relation or Boolean function B: [n] × [n] → {0, 1} that is reflexive, symmetric and transitive;
• a symmetric Boolean matrix such that B_ij = 1 if i, j belong to the same block.
These equivalent representations are not distinguished in the notation, so B is a set of subsets, a matrix, a Boolean function, or a subset of [n] × [n], as the context demands. In practice, a partition is sometimes written in an abbreviated form, such as B = 2|13 for a partition of [3]. In this notation, the five partitions of [3] are 123, 12|3, 13|2, 23|1 and 1|2|3. The blocks are unordered, so 2|13 is the same partition as 13|2 and 2|31. A partition B is a subpartition of B* if each block of B is a subset of some block of B* or, equivalently, if B_ij = 1 implies B*_ij = 1. This relationship is a partial order denoted by B ≤ B*, which can be interpreted as B ⊂ B* if each partition is regarded as a subset of [n]². The partition lattice E_n is the set of partitions of [n] with this partial order. To each pair of partitions B, B′ there corresponds a greatest lower bound B ∧ B′, which is the set intersection or Hadamard componentwise matrix product. The least upper bound B ∨ B′ is the least element that is greater than both, namely the transitive completion of B ∪ B′. The least element of E_n is the partition 0_n with n singleton blocks, and the greatest element is the single-block partition denoted by 1_n. A permutation σ: [n] → [n] induces an action B ↦ B^σ by composition such that B^σ(i, j) = B(σ(i), σ(j)). In matrix notation, B^σ = σBσ^{-1}, so the action by conjugation permutes both the rows and columns of B in the same way. The block sizes are preserved and are maximally invariant under conjugation. In this way, the 15 partitions of [4] may be grouped into five orbits or equivalence classes, one for each multiset of block sizes: 4, 3+1, 2+2, 2+1+1 and 1+1+1+1.
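The enumeration and orbit count above can be made concrete in a short sketch (helper names are my own; the 15 partitions of [4] are grouped by block-size multiset, and the meet B ∧ B′ is computed as the nonempty pairwise intersections of blocks):

```python
def set_partitions(elements):
    """Enumerate all partitions of `elements` into unordered nonempty blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for p in set_partitions(rest):
        for i in range(len(p)):                    # add `first` to an existing block
            yield p[:i] + [p[i] | {first}] + p[i + 1:]
        yield [{first}] + p                        # or open a new singleton block

def canon(p):
    # Order-free representation: a frozenset of frozenset blocks.
    return frozenset(frozenset(b) for b in p)

def meet(p, q):
    """Greatest lower bound B ∧ B': nonempty pairwise intersections of blocks."""
    return canon(b & c for b in p for c in q if b & c)

P4 = {canon(p) for p in set_partitions([1, 2, 3, 4])}
print(len(P4))  # 15 partitions of [4]

# Orbits under conjugation are indexed by block-size multisets.
orbits = {tuple(sorted(map(len, p), reverse=True)) for p in P4}
print(sorted(orbits))  # [(1, 1, 1, 1), (2, 1, 1), (2, 2), (3, 1), (4,)]
```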
On the Performance of High Dimensional Data Clustering and Classification Algorithms
Abstract
There is often a need to perform machine learning tasks on voluminous amounts of data. These tasks have application in fields such as pattern recognition, data mining, bioinformatics, and recommendation systems. Here we evaluate the performance of 4 clustering algorithms and 2 classification algorithms supported by Mahout within two different cloud runtimes, Hadoop and Granules. Our benchmarks use the same Mahout backend code, ensuring a fair comparison. The differences between these implementations stem from how the Hadoop and Granules runtimes (1) support and manage the lifecycle of individual computations, and (2) how they orchestrate exchange of data between different stages of the computational pipeline during successive iterations of the clustering algorithm. We include an analysis of our results for each of these algorithms in a distributed setting, as well as a discussion on measures for failure recovery.
Saskatoon By
Abstract
In presenting this thesis in partial fulfilment of the requirements for a Postgraduate degree