Results 1–10 of 123
Bayesian density regression
 Journal of the Royal Statistical Society B, 2007
Abstract

Cited by 70 (27 self)
This article considers Bayesian methods for density regression, allowing a random probability distribution to change flexibly with multiple predictors. The conditional response distribution is expressed as a nonparametric mixture of parametric densities, with the mixture distribution changing according to location in the predictor space. A new class of priors for dependent random measures is proposed for the collection of random mixing measures at each location. The conditional prior for the random measure at a given location is expressed as a mixture of a Dirichlet process (DP) distributed innovation measure and neighboring random measures. This specification results in a coherent prior for the joint measure, with the marginal random measure at each location being a finite mixture of DP basis measures. Integrating out the infinite-dimensional collection of mixing measures, we obtain a simple expression for the conditional distribution of the subject-specific random variables, which generalizes the Pólya urn scheme. Properties are considered and a simple Gibbs sampling algorithm is developed for posterior computation. The methods are illustrated using simulated data examples and epidemiologic studies.
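The urn scheme this paper generalizes reduces, in the single-location case, to the classical Pólya urn underlying the DP. A minimal sketch of that classical scheme only (not the paper's generalization; the function name, seed, and Gaussian base measure are illustrative choices):

```python
import random

def polya_urn(n, alpha, base_draw, rng=random.Random(0)):
    """Draw n values from the classical Polya urn scheme underlying a
    Dirichlet process: each new draw repeats a past value with
    probability proportional to its multiplicity, or is a fresh draw
    from the base measure with probability proportional to alpha."""
    draws = []
    for i in range(n):
        if rng.random() < alpha / (alpha + i):
            draws.append(base_draw(rng))     # innovation from the base measure
        else:
            draws.append(rng.choice(draws))  # reuse an existing value
    return draws

samples = polya_urn(200, alpha=1.0, base_draw=lambda r: r.gauss(0.0, 1.0))
n_clusters = len(set(samples))  # number of distinct values (clusters)
```

With a small concentration parameter, most draws reuse earlier values, so far fewer than 200 distinct atoms typically appear.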
The Nested Dirichlet Process
Abstract

Cited by 70 (4 self)
In multicenter studies, subjects in different centers may have different outcome distributions. This article is motivated by the problem of nonparametric modeling of these distributions, borrowing information across centers while also allowing centers to be clustered. Starting with a stick-breaking representation of the Dirichlet process (DP), we replace the random atoms with random probability measures drawn from a DP. This results in a nested Dirichlet process (nDP) prior, which can be placed on the collection of distributions for the different centers, with centers drawn from the same DP component automatically clustered together. Theoretical properties are discussed, and an efficient MCMC algorithm is developed for computation. The methods are illustrated using a simulation study and an application to quality of care in US hospitals.
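The clustering of centers induced by the outer DP can be illustrated with the Chinese-restaurant view of that DP alone, leaving the inner, within-center measures aside. A sketch under that simplification (function name and seed are illustrative):

```python
import random

def ndp_center_clusters(n_centers, alpha, rng=random.Random(1)):
    """Assign centers to shared distributional clusters via the
    Chinese-restaurant view of the outer DP in a nested DP: centers
    seated at the same 'table' share one outcome distribution exactly."""
    assignments = []
    tables = []  # table sizes, one entry per distributional cluster
    for i in range(n_centers):
        probs = tables + [alpha]   # existing tables, plus a new one
        total = sum(probs)
        u = rng.random() * total
        cum = 0.0
        for k, p in enumerate(probs):
            cum += p
            if u < cum:
                break
        if k == len(tables):
            tables.append(1)       # open a new distributional cluster
        else:
            tables[k] += 1
        assignments.append(k)
    return assignments

clusters = ndp_center_clusters(20, alpha=1.0)
```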
Distance dependent Chinese restaurant processes
Abstract

Cited by 56 (5 self)
We develop the distance dependent Chinese restaurant process (CRP), a flexible class of distributions over partitions that allows for non-exchangeability. This class can be used to model dependencies between data in infinite clustering models, including dependencies across time or space. We examine the properties of the distance dependent CRP, discuss its connections to Bayesian nonparametric mixture models, and derive a Gibbs sampler for both observed and mixture settings. We study its performance with time-dependent models and three text corpora. We show that relaxing the assumption of exchangeability with distance dependent CRPs can provide a better fit to sequential data. We also show that an alternative formulation of the traditional CRP leads to a faster-mixing Gibbs sampling algorithm than the one based on the original formulation.
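A minimal sketch of the generative side of the distance dependent CRP, assuming the common formulation in which customer i links to customer j with probability proportional to a decay of their distance (or to itself with weight alpha), with clusters given by the connected components of the link graph. All names and the exponential decay are illustrative:

```python
import math
import random

def ddcrp_partition(distances, alpha, decay, rng=random.Random(2)):
    """Sample customer-to-customer links in a distance dependent CRP:
    customer i links to j with weight decay(distances[i][j]), or to
    itself with weight alpha. Clusters are the connected components
    of the resulting link graph."""
    n = len(distances)
    links = []
    for i in range(n):
        weights = [decay(distances[i][j]) for j in range(n)]
        weights[i] = alpha                 # self-link opens/keeps own table
        u = rng.random() * sum(weights)
        cum = 0.0
        for j, w in enumerate(weights):
            cum += w
            if u < cum:
                break
        links.append(j)
    # connected components via union-find with path halving
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i, j in enumerate(links):
        parent[find(i)] = find(j)
    return [find(i) for i in range(n)]

pts = [0.0, 0.1, 0.2, 5.0, 5.1]
dist = [[abs(a - b) for b in pts] for a in pts]
labels = ddcrp_partition(dist, alpha=0.5, decay=lambda d: math.exp(-d))
```

With an exponential decay, nearby points tend to link to each other, so well-separated groups of points tend to fall into separate components.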
Dynamic Non-Parametric Mixture Models and the Recurrent Chinese Restaurant Process: with Applications to Evolutionary Clustering
Abstract

Cited by 51 (7 self)
Clustering is an important data mining task for exploration and visualization of different data types such as news stories, scientific publications, and weblogs. Due to the evolving nature of these data, evolutionary clustering, also known as dynamic clustering, has recently emerged to cope with the challenges of mining temporally smooth clusters over time. A good evolutionary clustering algorithm should fit the data well at each time epoch, and at the same time result in a smooth cluster evolution that provides the data analyst with a coherent and easily interpretable model. In this paper we introduce the temporal Dirichlet process mixture model (TDPM) as a framework for evolutionary clustering. The TDPM is a generalization of the DPM framework for clustering that automatically grows the number of clusters with the data. In our framework, the data are divided into epochs; all data points inside the same epoch are assumed to be fully exchangeable, whereas the temporal order is maintained across epochs. Moreover, the number of clusters in each epoch is unbounded: clusters can persist, die out, or emerge over time, and the actual parameterization of each cluster can also evolve over time in a Markovian fashion. We give a detailed and intuitive construction of this framework using the recurrent Chinese restaurant process (RCRP) metaphor, as well as a Gibbs sampling algorithm to carry out posterior inference in order to determine the optimal cluster evolution. We demonstrate our model on simulated data by using it to build an infinite dynamic mixture of Gaussian factors, and on a real dataset by using it to build a simple nonparametric dynamic clustering-topic model and apply it to analyze the NIPS12 document collection.
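The RCRP metaphor above can be sketched generatively: a cluster's popularity in one epoch carries over as prior seating weight in the next, so clusters persist when they keep attracting customers and die out when they do not. A simplified illustration with one-epoch memory (function name, seed, and parameter values are illustrative):

```python
import random

def rcrp_epoch(n_points, prev_sizes, alpha, rng):
    """One epoch of the recurrent Chinese restaurant process (RCRP):
    an existing cluster attracts a customer with weight equal to its
    size in the previous epoch plus its size so far in this epoch,
    while weight alpha opens a brand-new cluster. Clusters that seat
    no customers in an epoch simply die out."""
    cur = {}                                       # sizes in this epoch
    next_label = max(prev_sizes, default=-1) + 1   # first unused label
    assignments = []
    for _ in range(n_points):
        labels = sorted(set(prev_sizes) | set(cur))
        weights = [prev_sizes.get(k, 0) + cur.get(k, 0) for k in labels]
        labels.append(next_label)                  # candidate new cluster
        weights.append(alpha)
        u = rng.random() * sum(weights)
        cum = 0.0
        for k, w in zip(labels, weights):
            cum += w
            if u < cum:
                break
        cur[k] = cur.get(k, 0) + 1
        if k == next_label:
            next_label += 1
        assignments.append(k)
    return assignments, cur

rng = random.Random(3)
epoch1, sizes1 = rcrp_epoch(30, {}, alpha=1.0, rng=rng)
epoch2, sizes2 = rcrp_epoch(30, sizes1, alpha=1.0, rng=rng)  # clusters persist or die
```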
Generalized spatial Dirichlet process models
2007
Abstract

Cited by 49 (2 self)
Many models for the study of point-referenced data explicitly introduce spatial random effects to capture residual spatial association. These spatial effects are customarily modelled as a zero-mean stationary Gaussian process. The spatial Dirichlet process introduced by Gelfand et al. (2005) produces a random spatial process which is neither Gaussian nor stationary. Rather, it varies about a process that is assumed to be stationary and Gaussian. The spatial Dirichlet process arises as a probability-weighted collection of random surfaces. This can be limiting for modelling and inferential purposes since it insists that a process realization must be one of these surfaces. We introduce a random distribution for the spatial effects that allows different surface selection at different sites. Moreover, we can specify the model so that the marginal distribution of the effect at each site still comes from a Dirichlet process. The development is offered constructively, providing a multivariate extension of the stick-breaking representation of the weights. We then introduce mixing using this generalized spatial Dirichlet process. We illustrate with a simulated dataset of independent replications and note that we can embed the generalized process within a dynamic model specification to eliminate the independence assumption.
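The stick-breaking representation that this paper extends multivariately is, in its ordinary univariate form, easy to sketch. A truncated version (names and truncation level are illustrative) that produces a finite approximation to the DP weights:

```python
import random

def stick_breaking(alpha, n_sticks, rng=random.Random(4)):
    """Truncated stick-breaking construction of Dirichlet process
    weights: V_h ~ Beta(1, alpha) and w_h = V_h * prod_{l<h}(1 - V_l)."""
    weights, remaining = [], 1.0
    for _ in range(n_sticks - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)  # absorb the leftover mass in the last atom
    return weights

w = stick_breaking(alpha=2.0, n_sticks=50)
```

Larger alpha spreads mass over more sticks; the truncation is exact up to the leftover mass folded into the final atom.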
Smoothly mixing regressions
 Journal of Econometrics, 2006
Abstract

Cited by 44 (4 self)
This paper extends the conventional Bayesian mixture of normals model by permitting state probabilities to depend on observed covariates. The dependence is captured by a simple multinomial probit model. A conventional and rapidly mixing MCMC algorithm provides access to the posterior distribution at modest computational cost. This model is competitive with existing econometric models, as documented in the paper's illustrations. The first illustration studies quantiles of the distribution of earnings of men conditional on age and education, and shows that smoothly mixing regressions are an attractive alternative to non-Bayesian quantile regression. The second illustration models serial dependence in the S&P 500 return, and shows that the model compares favorably with ARCH models using out-of-sample likelihood criteria.
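The covariate-dependent state probabilities can be sketched with the latent-utility view of the multinomial probit: each state gets a Gaussian utility whose mean depends on the covariates, and the observation is routed to the argmax state, which then supplies a normal linear regression for the response. A toy two-state version (all names and parameter values are illustrative, not from the paper):

```python
import random

def smr_state(x, gammas, rng):
    """Covariate-dependent state via the multinomial probit device:
    state j has latent utility gammas[j] . x plus standard normal
    noise, and the observation goes to the argmax state."""
    utilities = [sum(g_i * x_i for g_i, x_i in zip(g, x)) + rng.gauss(0.0, 1.0)
                 for g in gammas]
    return max(range(len(utilities)), key=lambda j: utilities[j])

def smr_draw(x, gammas, betas, sigma, rng=random.Random(5)):
    """Draw y from a smooth mixture of normal regressions: pick a
    state with probit probabilities depending on x, then draw from
    that state's normal linear regression."""
    j = smr_state(x, gammas, rng)
    mean = sum(b_i * x_i for b_i, x_i in zip(betas[j], x))
    return j, rng.gauss(mean, sigma)

# Illustrative two-state model on covariate vector (1, x)
gammas = [(0.0, -2.0), (0.0, 2.0)]   # state 1 favored for large x
betas = [(1.0, 0.0), (-1.0, 3.0)]
state, y = smr_draw((1.0, 1.5), gammas, betas, sigma=0.5)
```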
A tutorial on Bayesian nonparametric models.
 Journal of Mathematical Psychology, 2012
Abstract

Cited by 42 (9 self)
A key problem in statistical modeling is model selection: how to choose a model at an appropriate level of complexity. This problem appears in many settings, most prominently in choosing the number of clusters in mixture models or the number of factors in factor analysis. In this tutorial we describe Bayesian nonparametric methods, a class of methods that sidesteps this issue by allowing the data to determine the complexity of the model. This tutorial is a high-level introduction to Bayesian nonparametric methods and contains several examples of their application.
The Matrix Stick-Breaking Process for Flexible Multi-Task Learning
Abstract

Cited by 38 (4 self)
In multi-task learning, our goal is to design regression or classification models for each of the tasks and appropriately share information between tasks. A Dirichlet process (DP) prior can be used to encourage task clustering. However, the DP prior does not allow local clustering of tasks with respect to a subset of the feature vector without making independence assumptions. Motivated by this problem, we develop a new multi-task learning prior, termed the matrix stick-breaking process (MSBP), which encourages cross-task sharing of data. However, the MSBP allows separate clustering and borrowing of information for the different feature components. This is important when tasks are more closely related for certain features than for others. Bayesian inference proceeds by a Gibbs sampling algorithm, and the approach is illustrated using a simulated example and a multinational application.
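One plausible reading of a matrix stick-breaking construction, offered only as a hedged sketch: task-specific sticks V and feature-specific sticks W combine multiplicatively, so each (task, feature) cell gets its own weight sequence while sticks are shared along rows and columns. The exact combination rule, names, and Beta parameters here are assumptions, not taken from the paper:

```python
import random

def msbp_weights(n_rows, n_cols, n_sticks, a, b, rng=random.Random(6)):
    """Sketch of matrix stick-breaking: row sticks V[j][h] ~ Beta(1, a)
    and column sticks W[k][h] ~ Beta(1, b) combine into
    pi[j][k][h] = V[j][h] * W[k][h] * prod_{l<h}(1 - V[j][l] * W[k][l]),
    giving each (row, column) cell its own weight sequence."""
    V = [[rng.betavariate(1.0, a) for _ in range(n_sticks)]
         for _ in range(n_rows)]
    W = [[rng.betavariate(1.0, b) for _ in range(n_sticks)]
         for _ in range(n_cols)]
    pi = [[[0.0] * n_sticks for _ in range(n_cols)] for _ in range(n_rows)]
    for j in range(n_rows):
        for k in range(n_cols):
            remaining = 1.0  # truncation leaves leftover mass unassigned
            for h in range(n_sticks):
                pi[j][k][h] = V[j][h] * W[k][h] * remaining
                remaining *= 1.0 - V[j][h] * W[k][h]
    return pi

pi = msbp_weights(n_rows=3, n_cols=2, n_sticks=25, a=1.0, b=1.0)
```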
Timeline: A Dynamic Hierarchical Dirichlet Process Model for Recovering Birth/Death and Evolution of Topics in Text Stream
Abstract

Cited by 38 (7 self)
Topic models have proven to be a useful tool for discovering latent structures in document collections. However, most document collections come as temporal streams, and thus several aspects of the latent structure, such as the number of topics and the topics' distributions and popularity, are time-evolving. Several models exist that capture the evolution of some, but not all, of these aspects. In this paper we introduce infinite dynamic topic models (iDTM), which can accommodate the evolution of all the aforementioned aspects. Our model assumes that documents are organized into epochs, where the documents within each epoch are exchangeable but the order between documents is maintained across epochs. iDTM allows for an unbounded number of topics: topics can die or be born at any epoch, and the representation of each topic can evolve according to Markovian dynamics. We use iDTM to analyze the birth and evolution of topics in the NIPS community and evaluate the efficacy of our model on both simulated and real datasets with favorable outcomes.
Nonparametric Bayes conditional distribution modeling with variable selection
 Journal of the American Statistical Association, 2009
Abstract

Cited by 33 (18 self)
This article considers methodology for flexibly characterizing the relationship between a response and multiple predictors. The goals are (1) to estimate the conditional response distribution, addressing distributional changes across the predictor space, and (2) to identify predictors that are important for changes in the response distribution, both within local regions and globally. We first introduce the probit stick-breaking process (PSBP) as a prior for an uncountable collection of predictor-dependent random probability measures and propose a PSBP mixture of normal regressions for modeling the conditional distributions. A global variable selection structure is incorporated to discard unimportant predictors, while allowing estimation of posterior inclusion probabilities. Local variable selection is conducted based on the conditional distribution estimates at different predictor points. An efficient stochastic search sampling algorithm is proposed for posterior computation. The methods are illustrated through simulation and applied to an epidemiologic study.
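The PSBP weights can be sketched by passing predictor-dependent Gaussian terms through the standard normal CDF and applying the usual stick-breaking product; the linear form `mus[h] + betas[h] * x` below is an illustrative choice, not necessarily the paper's specification:

```python
import math
import random

def probit_stick_weights(x, mus, betas):
    """Probit stick-breaking weights at predictor value x: stick h
    breaks with probability Phi(mus[h] + betas[h] * x), where Phi is
    the standard normal CDF, and the weights follow the usual
    stick-breaking product."""
    def phi(z):  # standard normal CDF via the error function
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    weights, remaining = [], 1.0
    for mu, beta in zip(mus, betas):
        p = phi(mu + beta * x)
        weights.append(p * remaining)
        remaining *= 1.0 - p
    weights.append(remaining)  # leftover mass in a final catch-all atom
    return weights

rng = random.Random(7)
mus = [rng.gauss(0.0, 1.0) for _ in range(10)]
betas = [rng.gauss(0.0, 1.0) for _ in range(10)]
w_lo = probit_stick_weights(-2.0, mus, betas)
w_hi = probit_stick_weights(2.0, mus, betas)  # weights shift with x
```

Because the break probabilities depend on x, the mixture weights, and hence the conditional distribution, vary smoothly across the predictor space.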