A Split-Merge Markov Chain Monte Carlo Procedure for the Dirichlet Process Mixture Model
- Journal of Computational and Graphical Statistics
, 2000
Cited by 64 (0 self)
We propose a split-merge Markov chain algorithm to address the problem of inefficient sampling for conjugate Dirichlet process mixture models. Traditional Markov chain Monte Carlo methods for Bayesian mixture models, such as Gibbs sampling, can become trapped in isolated modes corresponding to an inappropriate clustering of data points. This article describes a Metropolis-Hastings procedure that can escape such local modes by splitting or merging mixture components. Our Metropolis-Hastings algorithm employs a new technique in which an appropriate proposal for splitting or merging components is obtained by using a restricted Gibbs sampling scan. We demonstrate empirically that our method outperforms the Gibbs sampler in situations where two or more components are similar in structure. Key words: Dirichlet process mixture model, Markov chain Monte Carlo, Metropolis-Hastings algorithm, Gibbs sampler, split-merge updates. 1 Introduction: Mixture models are often applied to density estim...
Sequential Importance Sampling for Nonparametric Bayes Models: The Next Generation
- Journal of Statistics
, 1998
Cited by 61 (5 self)
In this paper, we exploit the similarities between the Gibbs sampler and the SIS, bringing over the improvements for Gibbs sampling algorithms to the SIS setting for nonparametric Bayes problems. These improvements result in an improved sampler and help satisfy questions of Diaconis (1995) pertaining to convergence. Such an effort can see wide applications in many other problems related to dynamic systems where the SIS is useful (Berzuini et al. 1996; Liu and Chen 1996). Section 2 describes the specific model that we consider. For illustration we focus discussion on the beta-binomial model, although the methods are applicable to other conjugate families. In Section 3, we describe the first generation of the SIS and Gibbs sampler in this context, and present the necessary conditional distributions upon which the techniques rely. Section 4 describes the alterations that create the second generation techniques, and provides specific algorithms for the model we consider. Section 5 presents a comparison of the techniques on a large set of data. Section 6 provides theory that ensures the proposed methods work and that is generally applicable to many other problems using importance sampling approaches. The final section presents discussion.
Bayesian Compressive Sensing
, 2007
Cited by 60 (10 self)
The data of interest are assumed to be represented as N-dimensional real vectors, and these vectors are compressible in some linear basis B, implying that the signal can be reconstructed accurately using only a small number M ≪ N of basis-function coefficients associated with B. Compressive sensing is a framework whereby one does not measure one of the aforementioned N-dimensional signals directly, but rather a set of related measurements, with the new measurements a linear combination of the original underlying N-dimensional signal. The number of required compressive-sensing measurements is typically much smaller than N, offering the potential to simplify the sensing system. Let f denote the unknown underlying N-dimensional signal, and g a vector of compressive-sensing measurements; then one may approximate f accurately by utilizing knowledge of the (under-determined) linear relationship between f and g, in addition to knowledge of the fact that f is compressible in B. In this paper we employ a Bayesian formalism for estimating the underlying signal f based on compressive-sensing measurements g. The proposed framework has the following properties: (i) in addition to estimating the underlying signal f, “error bars” are also estimated, these giving a measure of confidence in the inverted signal; (ii) using knowledge of the error bars, a principled means is provided for determining when a sufficient
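The measurement model in this abstract can be illustrated with a rough sketch: simulate g = Φ f + noise with M ≪ N, then compute a Gaussian posterior over f whose diagonal supplies the error bars. The sketch assumes f is sparse in the identity basis and uses a fixed zero-mean Gaussian prior with known precisions `alpha` and `beta`. This is a simplification chosen for brevity, not the prior used in the paper, and every name below (`gaussian_posterior`, `Phi`, `alpha`, `beta`) is hypothetical.

```python
import numpy as np

def gaussian_posterior(Phi, g, alpha, beta):
    """Posterior mean and covariance of f for g = Phi @ f + noise.

    Assumes a prior f ~ N(0, I / alpha) and noise ~ N(0, I / beta).
    Both precisions are fixed here, which is an assumption of this sketch.
    """
    n = Phi.shape[1]
    precision = alpha * np.eye(n) + beta * Phi.T @ Phi
    cov = np.linalg.inv(precision)
    mean = beta * cov @ Phi.T @ g
    return mean, cov

rng = np.random.default_rng(0)
N, M, K = 64, 24, 4                      # signal length, measurements, nonzeros

# Sparse signal f (sparse in the identity basis, by assumption).
f = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
f[support] = rng.normal(0.0, 3.0, size=K)

# Random projection matrix and noisy compressive measurements g.
Phi = rng.normal(size=(M, N)) / np.sqrt(M)
noise_std = 0.01
g = Phi @ f + rng.normal(0.0, noise_std, size=M)

mean, cov = gaussian_posterior(Phi, g, alpha=1.0, beta=1.0 / noise_std**2)
error_bars = np.sqrt(np.diag(cov))       # one posterior standard deviation per coefficient
```

With a plain Gaussian prior the posterior mean is not sparse, so this sketch only shows where the error bars come from; recovering a sparse f would need a sparsity-promoting prior.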
Multi-task learning for classification with Dirichlet process priors
- Journal of Machine Learning Research
, 2007
Cited by 57 (6 self)
Multi-task learning (MTL) is considered for logistic-regression classifiers, based on a Dirichlet process (DP) formulation. A symmetric MTL (SMTL) formulation is considered in which classifiers for multiple tasks are learned jointly, with a variational Bayesian (VB) solution. We also consider an asymmetric MTL (AMTL) formulation in which the posterior density function from the SMTL model parameters, from previous tasks, is used as a prior for a new task; this approach has the significant advantage of not requiring storage and use of all previous data from prior tasks. The AMTL formulation is solved with a simple Markov Chain Monte Carlo (MCMC) construction. Comparisons are also made to simpler approaches, such as single-task learning, pooling of data across tasks, and simplified approximations to DP. A comprehensive analysis of algorithm performance is addressed through consideration of two data sets that are matched to the MTL problem.
A Latent Dirichlet Model for Unsupervised Entity Resolution
- SIAM International Conference on Data Mining
, 2006
Cited by 53 (5 self)
Entity resolution has received considerable attention in recent years. Given many references to underlying entities, the goal is to predict which references correspond to the same entity. We show how to extend the Latent Dirichlet Allocation model for this task and propose a probabilistic model for collective entity resolution for relational domains where references are connected to each other. Our approach differs from other recently proposed entity resolution approaches in that it is a) generative, b) does not make pair-wise decisions and c) captures relations between entities through a hidden group variable. We propose a novel sampling algorithm for collective entity resolution which is unsupervised and also takes entity relations into account. Additionally, we do not assume the domain of entities to be known and show how to infer the number of entities from the data. We demonstrate the utility and practicality of our relational entity resolution approach for author resolution in two real-world bibliographic datasets. In addition, we present preliminary results on characterizing conditions under which relational information is useful.
Modelling heterogeneity with and without the Dirichlet process
, 2001
Cited by 49 (3 self)
We investigate the relationships between Dirichlet process (DP) based models and allocation models for a variable number of components, based on exchangeable distributions. It is shown that the DP partition distribution is a limiting case of a Dirichlet-multinomial allocation model. Comparisons of posterior performance of DP and allocation models are made in the Bayesian paradigm and illustrated in the context of univariate mixture models. It is shown in particular that the unbalancedness of the allocation distribution, present in the prior DP model, persists a posteriori. Exploiting the model connections, a new MCMC sampler for general DP based models is introduced, which uses split/merge moves in a reversible jump framework. Performance of this new sampler relative to that of some traditional samplers for DP processes is then explored.
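The limiting relationship mentioned in this abstract can be checked numerically. The sketch below assumes the standard forms of the two partition probabilities: a DP with concentration `alpha`, and a k-component allocation model with symmetric Dirichlet(alpha/k) weights. The function names are hypothetical and the parameterization is an assumption of this sketch, not taken from the paper.

```python
from math import lgamma, log

def log_dp_partition(sizes, alpha):
    """Log probability of a partition with these cluster sizes under a DP(alpha)."""
    n = sum(sizes)
    d = len(sizes)
    # alpha^d * Gamma(alpha) / Gamma(alpha + n) * prod_j (n_j - 1)!
    return (d * log(alpha) + lgamma(alpha) - lgamma(alpha + n)
            + sum(lgamma(s) for s in sizes))

def log_dma_partition(sizes, alpha, k):
    """Log probability of the same partition under a k-component
    Dirichlet-multinomial allocation model with Dirichlet(alpha/k) weights."""
    n = sum(sizes)
    d = len(sizes)
    if d > k:
        raise ValueError("partition has more clusters than components")
    # k! / (k - d)! ways to assign component labels to the d nonempty clusters.
    log_labelings = lgamma(k + 1) - lgamma(k - d + 1)
    a = alpha / k
    return (log_labelings + lgamma(alpha) - lgamma(alpha + n)
            + sum(lgamma(s + a) - lgamma(a) for s in sizes))

sizes, alpha = [3, 2, 1], 1.0
target = log_dp_partition(sizes, alpha)
# Gap between the finite-k model and the DP for increasing k.
gaps = {k: abs(log_dma_partition(sizes, alpha, k) - target)
        for k in (10, 1000, 10**6)}
```

Under these assumptions the entries of `gaps` shrink as k grows, which is the sense in which the DP partition distribution arises as a limit of the finite allocation model.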
The infinite PCFG using hierarchical Dirichlet processes
- In EMNLP ’07
, 2007
Cited by 48 (5 self)
We present a nonparametric Bayesian model of tree structures based on the hierarchical Dirichlet process (HDP). Our HDP-PCFG model allows the complexity of the grammar to grow as more training data is available. In addition to presenting a fully Bayesian model for the PCFG, we also develop an efficient variational inference procedure. On synthetic data, we recover the correct grammar without having to specify its complexity in advance. We also show that our techniques can be applied to full-scale parsing applications by demonstrating its effectiveness in learning state-split grammars.
More Aspects of Polya Tree Distributions for Statistical Modelling
- Ann. Statist.
, 1994
Cited by 44 (1 self)
The definition and elementary properties of Polya tree distributions are reviewed. Two theorems are presented showing that Polya trees can be constructed to concentrate arbitrarily closely about any desired pdf, and that Polya tree priors can put positive mass in every relative entropy neighborhood of every positive density with finite entropy, thereby satisfying a consistency condition. Such theorems are false for Dirichlet processes. Models are constructed combining partially specified Polya trees with other information like monotonicity or unimodality. It is shown how to compute bounds on posterior expectations over the class of all priors with the given specifications. A numerical example is given. A theorem of Diaconis and Freedman about Dirichlet processes is generalized to Polya trees, allowing Polya trees to be the models for errors in regression problems. Finally, empirical Bayes models using Dirichlet processes are generalized to Polya trees. An example from Berry and Chris...

