Results 1–10 of 70
The nested chinese restaurant process and bayesian inference of topic hierarchies
, 2007
Abstract

Cited by 128 (15 self)
We present the nested Chinese restaurant process (nCRP), a stochastic process which assigns probability distributions to infinitely-deep, infinitely-branching trees. We show how this stochastic process can be used as a prior distribution in a Bayesian nonparametric model of document collections. Specifically, we present an application to information retrieval in which documents are modeled as paths down a random tree, and the preferential attachment dynamics of the nCRP leads to clustering of documents according to sharing of topics at multiple levels of abstraction. Given a corpus of documents, a posterior inference algorithm finds an approximation to a posterior distribution over trees, topics and allocations of words to levels of the tree. We demonstrate this algorithm on collections of scientific abstracts from several journals. This model exemplifies a recent trend in statistical machine learning—the use of Bayesian nonparametric methods to infer distributions on flexible data structures.
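The preferential-attachment path sampling described in this abstract is easy to sketch: at each level of the tree a document picks a child via a Chinese restaurant process, favoring children already chosen by earlier documents. The following toy implementation (the lazy dictionary tree representation and function names are our own illustration, not the paper's algorithm) draws root-to-leaf paths from an nCRP prior:

```python
import random

def crp_choice(counts, gamma, rng):
    """Chinese restaurant process step: pick existing table k with
    probability counts[k]/(n+gamma), or a new table with gamma/(n+gamma)."""
    n = sum(counts)
    r = rng.uniform(0, n + gamma)
    for k, c in enumerate(counts):
        if r < c:
            return k
        r -= c
    return len(counts)  # open a new table (i.e., a new child node)

def ncrp_sample_paths(num_docs, depth, gamma, seed=0):
    """Assign each document a root-to-leaf path of the given depth.
    The infinite tree is grown lazily: a node's children are the
    'tables' of that node's own CRP."""
    rng = random.Random(seed)
    children_counts = {(): []}  # node (tuple path) -> visit counts of children
    paths = []
    for _ in range(num_docs):
        node = ()
        for _ in range(depth):
            counts = children_counts.setdefault(node, [])
            k = crp_choice(counts, gamma, rng)
            if k == len(counts):
                counts.append(0)  # first visit to a brand-new child
            counts[k] += 1
            node = node + (k,)
        paths.append(node)
    return paths
```

Smaller gamma makes documents pile onto existing branches (tighter clustering); larger gamma opens new branches more readily.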
An HDP-HMM for Systems with State Persistence
Abstract

Cited by 73 (8 self)
The hierarchical Dirichlet process hidden Markov model (HDP-HMM) is a flexible, nonparametric model which allows state spaces of unknown size to be learned from data. We demonstrate some limitations of the original HDP-HMM formulation (Teh et al., 2006), and propose a sticky extension which allows more robust learning of smoothly varying dynamics. Using DP mixtures, this formulation also allows learning of more complex, multimodal emission distributions. We further develop a sampling algorithm that employs a truncated approximation of the DP to jointly resample the full state sequence, greatly improving mixing rates. Via extensive experiments with synthetic data and the NIST speaker diarization database, we demonstrate the advantages of our sticky extension, and the utility of the HDP-HMM in real-world applications.
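The "sticky" idea can be illustrated with a finite truncation of the prior: each transition row receives extra concentration mass kappa on its own self-transition, which biases the sampled chain toward state persistence. This is a toy sketch under an assumed finite truncation level, not the paper's full sampler:

```python
import random

def dirichlet(alphas, rng):
    """Sample from a Dirichlet distribution via normalized Gamma draws."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    s = sum(g)
    return [x / s for x in g]

def sticky_transition_matrix(beta, alpha, kappa, rng):
    """Finite (L-state) truncation of the sticky HDP-HMM prior:
    row j ~ Dirichlet(alpha * beta + kappa * e_j), where beta is the
    shared top-level state distribution.  The extra mass kappa on the
    j-th entry favors self-transitions, yielding state persistence."""
    L = len(beta)
    rows = []
    for j in range(L):
        conc = [alpha * b + (kappa if k == j else 0.0)
                for k, b in enumerate(beta)]
        rows.append(dirichlet(conc, rng))
    return rows
```

With kappa = 0 this reduces to the original (non-sticky) truncated HDP-HMM prior; increasing kappa concentrates each row's mass on its diagonal entry.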
NONPARAMETRIC FUNCTIONAL DATA ANALYSIS THROUGH BAYESIAN DENSITY ESTIMATION
, 2007
Abstract

Cited by 28 (6 self)
In many modern experimental settings, observations are obtained in the form of functions, and interest focuses on inferences on a collection of such functions. Some examples are conductivity-temperature-depth (CTD) data in oceanography, dose-response models in epidemiology and time-course microarray experiments in biology and medicine. In this paper we propose a hierarchical model that allows us to simultaneously estimate multiple curves nonparametrically by using dependent Dirichlet Process mixtures of Gaussians to characterize the joint distribution of predictors and outcomes. Function estimates are then induced through the conditional distribution of the outcome given the predictors. The resulting approach allows for flexible estimation and clustering, while borrowing information across curves. We also show that the function estimates we obtain are consistent on the space of integrable functions. As an illustration, we consider an application to the analysis of CTD data in the north Atlantic.
Spatial Normalized Gamma Processes
Abstract

Cited by 20 (3 self)
Dependent Dirichlet processes (DPs) are dependent sets of random measures, each being marginally DP distributed. They are used in Bayesian nonparametric models when the usual exchangeability assumption does not hold. We propose a simple and general framework to construct dependent DPs by marginalizing and normalizing a single gamma process over an extended space. The result is a set of DPs, each associated with a point in a space such that neighbouring DPs are more dependent. We describe Markov chain Monte Carlo inference involving Gibbs sampling and three different Metropolis-Hastings proposals to speed up convergence. We report an empirical study of convergence on a synthetic dataset and demonstrate an application of the model to topic modeling through time.
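The construction can be caricatured in a few lines: a single shared set of gamma-weighted atoms, each alive over a spatial window, is normalized separately at each point, so nearby points share more atoms and are therefore more dependent. The function names and the interval-window choice below are our own toy assumptions, not the paper's construction:

```python
import random

def dependent_dps(times, num_atoms, width, rng):
    """Toy finite analogue of a spatial normalized gamma process:
    draw one shared set of gamma-weighted atoms, each 'alive' on an
    interval of the given width around its center; the DP at time t
    normalizes the weights of the atoms alive at t."""
    atoms = []
    for i in range(num_atoms):
        center = rng.uniform(0.0, 1.0)
        weight = rng.gammavariate(1.0, 1.0)  # gamma-process atom mass
        atoms.append((i, center, weight))
    dps = {}
    for t in times:
        alive = [(i, w) for (i, c, w) in atoms if abs(c - t) <= width / 2]
        total = sum(w for _, w in alive)
        # normalized shared atoms: a discrete probability measure at t
        dps[t] = {i: w / total for i, w in alive} if alive else {}
    return dps
```

Two nearby times have heavily overlapping sets of alive atoms (strong dependence); distant times share few or none.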
Hierarchical Models, Nested Models and Completely Random Measures
, 2010
Abstract

Cited by 15 (1 self)
Statistics has both optimistic and pessimistic faces, with the Bayesian perspective often associated with the former and the frequentist perspective with the latter, but with foundational thinkers such as Jim Berger reminding us that statistics is fundamentally a Janus-like creature with two faces. In creating one field out of two perspectives, one of the unifying …
Learning with Hierarchical-Deep Models
Abstract

Cited by 12 (0 self)
We introduce HD (or “Hierarchical-Deep”) models, a new compositional learning architecture that integrates deep learning models with structured hierarchical Bayesian (HB) models. Specifically, we show how we can learn a hierarchical Dirichlet process (HDP) prior over the activities of the top-level features in a deep Boltzmann machine (DBM). This compound HDP-DBM model learns to learn novel concepts from very few training examples by learning low-level generic features, high-level features that capture correlations among low-level features, and a category hierarchy for sharing priors over the high-level features that are typical of different kinds of concepts. We present efficient learning and inference algorithms for the HDP-DBM model and show that it is able to learn new concepts from very few examples on CIFAR-100 object recognition, handwritten character recognition, and human motion capture datasets. Index Terms—Deep networks, deep Boltzmann machines, hierarchical Bayesian models, one-shot learning
A Bayesian discovery procedure
J. R. Stat. Soc. Ser. B (Stat. Methodol.)
, 2009
Abstract

Cited by 10 (2 self)
Summary. The optimal discovery procedure (ODP) maximizes the expected number of true positives for every fixed expected number of false positives. We show that the ODP can be interpreted as an approximate Bayes rule under a semiparametric model. Improving the approximation leads us to a Bayesian discovery procedure (BDP), which exploits the multiple shrinkage in clusters implied by the assumed nonparametric model. We compare the BDP and the ODP estimates in a simple simulation study and in an assessment of differential gene expression between two tumor samples. We extend the setting of the ODP by discussing modifications of the loss function that lead to different single thresholding statistics. Finally, we provide an application of the previous arguments to dependent (spatial) data.
CONVERGENCE OF LATENT MIXING MEASURES IN FINITE AND INFINITE MIXTURE MODELS
, 2013
Abstract

Cited by 8 (0 self)
This paper studies convergence behavior of latent mixing measures that arise in finite and infinite mixture models, using transportation distances (i.e., Wasserstein metrics). The relationship between Wasserstein distances on the space of mixing measures and f-divergence functionals such as Hellinger and Kullback–Leibler distances on the space of mixture distributions is investigated in detail using various identifiability conditions. Convergence in Wasserstein metrics for discrete measures implies convergence of individual atoms that provide support for the measures, thereby providing a natural interpretation of convergence of clusters in clustering applications where mixture models are typically employed. Convergence rates of posterior distributions for latent mixing measures are established, for both finite mixtures of multivariate distributions and infinite mixtures based on the Dirichlet process.
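For intuition about Wasserstein distances between discrete mixing measures: in one dimension, with equal numbers of uniformly weighted atoms, the optimal transport plan simply matches atoms in sorted order. The sketch below covers only this special case (the paper's general multivariate setting requires a full optimal-transport solver):

```python
def wasserstein1_empirical(xs, ys):
    """W1 distance between two uniform discrete measures with the same
    number of 1-D atoms: optimal transport pairs sorted atoms in order,
    so W1 = (1/n) * sum_i |x_(i) - y_(i)|."""
    assert len(xs) == len(ys), "equal atom counts assumed in this sketch"
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)
```

Small W1 forces each atom of one measure to sit near an atom of the other, which is the sense in which Wasserstein convergence of mixing measures implies convergence of the individual mixture components ("clusters").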
Learning to Learn with Compound HD Models
Abstract

Cited by 7 (0 self)
We introduce HD (or “Hierarchical-Deep”) models, a new compositional learning architecture that integrates deep learning models with structured hierarchical Bayesian models. Specifically we show how we can learn a hierarchical Dirichlet process (HDP) prior over the activities of the top-level features in a Deep Boltzmann Machine (DBM). This compound HDP-DBM model learns to learn novel concepts from very few training examples, by learning low-level generic features, high-level features that capture correlations among low-level features, and a category hierarchy for sharing priors over the high-level features that are typical of different kinds of concepts. We present efficient learning and inference algorithms for the HDP-DBM model and show that it is able to learn new concepts from very few examples on CIFAR-100 object recognition, handwritten character recognition, and human motion capture datasets.
A HIERARCHICAL DIRICHLET PROCESS MIXTURE MODEL FOR HAPLOTYPE RECONSTRUCTION FROM MULTI-POPULATION DATA
, 2009
Abstract

Cited by 7 (3 self)
The perennial problem of “how many clusters?” remains an issue of substantial interest in data mining and machine learning communities, and becomes particularly salient in large data sets such as populational genomic data where the number of clusters needs to be relatively large and open-ended. This problem gets further complicated in a co-clustering scenario in which one needs to solve multiple clustering problems simultaneously because of the presence of common centroids (e.g., ancestors) shared by clusters (e.g., possible descendants from a certain ancestor) from different multiple-cluster samples (e.g., different human subpopulations). In this paper we present a hierarchical nonparametric Bayesian model to address this problem in the context of multi-population haplotype inference. Uncovering the haplotypes of single nucleotide polymorphisms is essential for many biological and medical applications. While it is not uncommon for the genotype data to be pooled from multiple ethnically distinct populations, few existing programs have explicitly leveraged the individual ethnic information for haplotype inference. In this paper we present a new haplotype inference program, Haploi, which makes use of such information and is readily applicable to genotype sequences with thousands of SNPs from heterogeneous populations, with competitive and sometimes superior speed and accuracy compared to the state-of-the-art programs. Underlying Haploi is a new haplotype distribution model based on a nonparametric Bayesian formalism known as the hierarchical Dirichlet process, which represents a tractable surrogate to the coalescent process. The proposed model is exchangeable, unbounded, and capable of coupling demographic information of different populations. It offers a well-founded statistical framework for posterior inference of individual haplotypes, the size and configuration of haplotype ancestor pools, and other parameters of interest given genotype data.