Results 1 - 10 of 305
Rethinking LDA: Why Priors Matter
"... Implementations of topic models typically use symmetric Dirichlet priors with fixed concentration parameters, with the implicit assumption that such “smoothing parameters ” have little practical effect. In this paper, we explore several classes of structured priors for topic models. We find that an ..."
Abstract
-
Cited by 110 (3 self)
- Add to MetaCart
(Show Context)
Implementations of topic models typically use symmetric Dirichlet priors with fixed concentration parameters, with the implicit assumption that such “smoothing parameters” have little practical effect. In this paper, we explore several classes of structured priors for topic models. We find that an asymmetric Dirichlet prior over the document–topic distributions has substantial advantages over a symmetric prior, while an asymmetric prior over the topic–word distributions provides no real benefit. Approximation of this prior structure through simple, efficient hyperparameter optimization steps is sufficient to achieve these performance gains. The prior structure we advocate substantially increases the robustness of topic models to variations in the number of topics and to the highly skewed word frequency distributions common in natural language. Since this prior structure can be implemented using efficient algorithms that add negligible cost beyond standard inference techniques, we recommend it as a new standard for topic modeling.
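As a rough illustration of where such a prior enters, the sketch below shows a single collapsed Gibbs draw for one token's topic in LDA with an asymmetric document–topic prior. The variable names, toy sizes, and omission of the usual count decrement for the current token are our simplifications, not the paper's; the hyperparameter-optimization step the paper advocates is not shown.

```python
import numpy as np

def sample_topic(doc_topic_counts, topic_word_counts, topic_counts,
                 word_id, alpha, beta, vocab_size, rng):
    """One collapsed Gibbs draw for a single token's topic assignment.

    alpha is a length-K vector: a symmetric prior uses a constant vector,
    while the asymmetric document-topic prior lets each entry differ, so
    frequent "background" topics can absorb more mass per document.
    (For brevity, the decrement of the current token's counts is omitted.)
    """
    # Unnormalized conditional p(z = k | everything else) for each topic k.
    weights = (doc_topic_counts + alpha) * \
              (topic_word_counts[:, word_id] + beta) / \
              (topic_counts + vocab_size * beta)
    weights /= weights.sum()
    return rng.choice(len(weights), p=weights)

# Toy usage with made-up counts.
rng = np.random.default_rng(0)
K, V = 4, 100
alpha = np.array([2.0, 0.5, 0.5, 0.5])   # asymmetric document-topic prior
beta = 0.01                              # symmetric topic-word prior
doc_topic = rng.integers(0, 5, size=K).astype(float)
topic_word = rng.integers(0, 3, size=(K, V)).astype(float)
topic_tot = topic_word.sum(axis=1)
z = sample_topic(doc_topic, topic_word, topic_tot, word_id=7,
                 alpha=alpha, beta=beta, vocab_size=V, rng=rng)
```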
Unsupervised Modeling of Twitter Conversations
2010
"... We propose the first unsupervised approach to the problem of modeling dialogue acts in an open domain. Trained on a corpus of noisy Twitter conversations, our method discovers dialogue acts by clustering raw utterances. Because it accounts for the sequential behaviour of these acts, the learned mode ..."
Abstract
-
Cited by 90 (4 self)
- Add to MetaCart
(Show Context)
We propose the first unsupervised approach to the problem of modeling dialogue acts in an open domain. Trained on a corpus of noisy Twitter conversations, our method discovers dialogue acts by clustering raw utterances. Because it accounts for the sequential behaviour of these acts, the learned model can provide insight into the shape of communication in a new medium. We address the challenge of evaluating the emergent model with a qualitative visualization and an intrinsic conversation ordering task. This work is inspired by a corpus of 1.3 million Twitter conversations, which will be made publicly available. This huge amount of data, available only because Twitter blurs the line between chatting and publishing, highlights the need to be able to adapt quickly to a new medium.
Stick-breaking construction for the Indian buffet process
In Proceedings of the International Conference on Artificial Intelligence and Statistics
"... The Indian buffet process (IBP) is a Bayesian nonparametric distribution whereby objects are modelled using an unbounded number of latent features. In this paper we derive a stick-breaking representation for the IBP. Based on this new representation, we develop slice samplers for the IBP that are ef ..."
Abstract
-
Cited by 79 (13 self)
- Add to MetaCart
The Indian buffet process (IBP) is a Bayesian nonparametric distribution whereby objects are modelled using an unbounded number of latent features. In this paper we derive a stick-breaking representation for the IBP. Based on this new representation, we develop slice samplers for the IBP that are efficient, easy to implement and are more generally applicable than the currently available Gibbs sampler. This representation, along with the work of Thibaux and Jordan [17], also illuminates interesting theoretical connections between the IBP, Chinese restaurant processes, Beta processes and Dirichlet processes.
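A minimal truncated sketch of the stick-breaking representation in question: nu_k ~ Beta(alpha, 1) and mu_k = prod_{j<=k} nu_j gives a decreasing sequence of feature probabilities, and each object switches feature k on independently with probability mu_k. The truncation level and names are illustrative; the slice samplers built on this representation are not shown.

```python
import numpy as np

def ibp_stick_breaking(alpha, n_objects, truncation, rng):
    """Truncated stick-breaking sketch for the Indian buffet process."""
    nu = rng.beta(alpha, 1.0, size=truncation)
    mu = np.cumprod(nu)                            # decreasing stick lengths
    Z = rng.random((n_objects, truncation)) < mu   # binary feature matrix
    return mu, Z.astype(int)

rng = np.random.default_rng(1)
mu, Z = ibp_stick_breaking(alpha=2.0, n_objects=5, truncation=20, rng=rng)
```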
The No-U-Turn Sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo
arXiv preprint arXiv:1111.4246, 2011
"... Hierarchical Bayesian models are a mainstay of the machine learning and statistics communities. Exact posterior inference in such models is rarely tractable, so researchers and practitioners must usually resort to approximate inference methods. Perhaps the most popular class of approximate posterior ..."
Abstract
-
Cited by 70 (2 self)
- Add to MetaCart
(Show Context)
Hierarchical Bayesian models are a mainstay of the machine learning and statistics communities. Exact posterior inference in such models is rarely tractable, so researchers and practitioners must usually resort to approximate inference methods. Perhaps the most popular class of approximate posterior inference algorithms, Markov Chain Monte Carlo (MCMC) methods offer schemes for …
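For context on the tuning problem the title refers to, here is a plain Hamiltonian Monte Carlo transition with a hand-set number of leapfrog steps; choosing that path length adaptively is the No-U-Turn sampler's contribution and is not shown. The toy target, step size, and function names are our assumptions.

```python
import numpy as np

def hmc_step(q, log_prob_grad, log_prob, step_size, n_leapfrog, rng):
    """One plain HMC transition with a fixed path length.

    n_leapfrog (the path length) must be hand-tuned here; NUTS removes
    exactly this tuning burden.
    """
    p = rng.standard_normal(q.shape)                  # resample momentum
    q_new, p_new = q.copy(), p.copy()
    p_new += 0.5 * step_size * log_prob_grad(q_new)   # initial half step
    for _ in range(n_leapfrog):
        q_new += step_size * p_new
        p_new += step_size * log_prob_grad(q_new)
    p_new -= 0.5 * step_size * log_prob_grad(q_new)   # trim final half step
    # Metropolis accept/reject on the joint (position, momentum) energy.
    log_accept = (log_prob(q_new) - 0.5 * p_new @ p_new) \
               - (log_prob(q) - 0.5 * p @ p)
    return q_new if np.log(rng.random()) < log_accept else q

# Toy target: standard normal, log p(q) = -q^2/2, gradient -q.
rng = np.random.default_rng(2)
q = np.zeros(1)
for _ in range(100):
    q = hmc_step(q, lambda x: -x, lambda x: -0.5 * float(x @ x),
                 step_size=0.1, n_leapfrog=20, rng=rng)
```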
Elliptical slice sampling
JMLR: W&CP
"... Many probabilistic models introduce strong dependencies between variables using a latent multivariate Gaussian distribution or a Gaussian process. We present a new Markov chain Monte Carlo algorithm for performing inference in models with multivariate Gaussian priors. Its key properties are: 1) it h ..."
Abstract
-
Cited by 60 (8 self)
- Add to MetaCart
Many probabilistic models introduce strong dependencies between variables using a latent multivariate Gaussian distribution or a Gaussian process. We present a new Markov chain Monte Carlo algorithm for performing inference in models with multivariate Gaussian priors. Its key properties are: 1) it has simple, generic code applicable to many models, 2) it has no free parameters, 3) it works well for a variety of Gaussian process based models. These properties make our method ideal for use while model building, removing the need to spend time deriving and tuning updates for more complex algorithms.
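A sketch of one elliptical slice sampling update matching the abstract's description (no free parameters, one prior draw per update), assuming a zero-mean Gaussian prior; the toy 2-D usage example below it is ours.

```python
import numpy as np

def elliptical_slice(f, prior_sample, log_lik, rng):
    """One elliptical slice sampling update.

    f            : current latent vector, assumed f ~ N(0, Sigma) a priori
    prior_sample : a fresh draw nu ~ N(0, Sigma)
    log_lik      : function returning log L(f)
    """
    nu = prior_sample
    log_y = log_lik(f) + np.log(rng.random())         # slice height
    theta = rng.uniform(0.0, 2.0 * np.pi)
    lo, hi = theta - 2.0 * np.pi, theta
    while True:
        f_prop = f * np.cos(theta) + nu * np.sin(theta)
        if log_lik(f_prop) > log_y:
            return f_prop                              # accepted
        # Shrink the angle bracket towards the current state and retry.
        if theta < 0.0:
            lo = theta
        else:
            hi = theta
        theta = rng.uniform(lo, hi)

# Toy usage: correlated Gaussian prior, Gaussian likelihood centred at 1.
rng = np.random.default_rng(3)
L = np.linalg.cholesky(np.array([[1.0, 0.8], [0.8, 1.0]]))
f = L @ rng.standard_normal(2)
for _ in range(50):
    f = elliptical_slice(f, L @ rng.standard_normal(2),
                         lambda x: -0.5 * np.sum((x - 1.0) ** 2), rng)
```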
Slice sampling covariance hyperparameters of latent Gaussian models
In Advances in Neural Information Processing Systems 23, 2010
"... The Gaussian process (GP) is a popular way to specify dependencies between random variables in a probabilistic model. In the Bayesian framework the covariance structure can be specified using unknown hyperparameters. Integrating over these hyperparameters considers different possible explanations fo ..."
Abstract
-
Cited by 56 (10 self)
- Add to MetaCart
(Show Context)
The Gaussian process (GP) is a popular way to specify dependencies between random variables in a probabilistic model. In the Bayesian framework the covariance structure can be specified using unknown hyperparameters. Integrating over these hyperparameters considers different possible explanations for the data when making predictions. This integration is often performed using Markov chain Monte Carlo (MCMC) sampling. However, with non-Gaussian observations standard hyperparameter sampling approaches require careful tuning and may converge slowly. In this paper we present a slice sampling approach that requires little tuning while mixing well in both strong- and weak-data regimes.
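The paper's specific reparameterised scheme is not reproduced here; the sketch below shows only the generic univariate slice-sampling building block (stepping out, then shrinkage) applied to a single covariance hyperparameter such as a log lengthscale, with the latent function assumed to be handled separately. Names and defaults are our assumptions.

```python
import numpy as np

def slice_sample_1d(x0, log_post, width, rng, max_steps=50):
    """Generic univariate slice sampler: stepping out, then shrinkage.

    log_post would be the log posterior of one hyperparameter (e.g. a
    log lengthscale) with everything else held fixed or marginalised.
    """
    log_y = log_post(x0) + np.log(rng.random())       # slice height
    lo = x0 - width * rng.random()                    # randomly position
    hi = lo + width                                   # the initial bracket
    for _ in range(max_steps):                        # step out left
        if log_post(lo) <= log_y:
            break
        lo -= width
    for _ in range(max_steps):                        # step out right
        if log_post(hi) <= log_y:
            break
        hi += width
    while True:                                       # shrink until accepted
        x1 = rng.uniform(lo, hi)
        if log_post(x1) > log_y:
            return x1
        if x1 < x0:
            lo = x1
        else:
            hi = x1
```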
Beam Sampling for the Infinite Hidden Markov Model
"... The infinite hidden Markov model is a nonparametric extension of the widely used hidden Markov model. Our paper introduces a new inference algorithm for the infinite Hidden Markov model called beam sampling. Beam sampling combines slice sampling, which limits the number of states considered at each ..."
Abstract
-
Cited by 52 (8 self)
- Add to MetaCart
(Show Context)
The infinite hidden Markov model is a nonparametric extension of the widely used hidden Markov model. Our paper introduces a new inference algorithm for the infinite hidden Markov model called beam sampling. Beam sampling combines slice sampling, which limits the number of states considered at each time step to a finite number, with dynamic programming, which samples whole state trajectories efficiently. Our algorithm typically outperforms the Gibbs sampler and is more robust. We present applications of iHMM inference using the beam sampler on changepoint detection and text prediction problems.
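A sketch of only the dynamic-programming half of the method: forward filtering followed by backward sampling of a whole state trajectory for a finite HMM. The slice variables that truncate the infinite state space are omitted, so this is the step a beam sampler would run after truncation, not the full algorithm; all names are illustrative.

```python
import numpy as np

def ffbs(pi0, trans, lik, rng):
    """Forward filtering, backward sampling of a full state trajectory.

    pi0   : (K,) initial state distribution
    trans : (K, K) transition matrix
    lik   : (T, K) observation likelihoods p(y_t | s_t = k)
    In beam sampling, trans/lik would first be truncated by the slice
    variables so K stays finite even for the infinite HMM.
    """
    T, K = lik.shape
    alpha = np.zeros((T, K))
    alpha[0] = pi0 * lik[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):                       # forward filtering
        alpha[t] = (alpha[t - 1] @ trans) * lik[t]
        alpha[t] /= alpha[t].sum()
    states = np.zeros(T, dtype=int)
    states[-1] = rng.choice(K, p=alpha[-1])     # backward sampling
    for t in range(T - 2, -1, -1):
        w = alpha[t] * trans[:, states[t + 1]]
        states[t] = rng.choice(K, p=w / w.sum())
    return states
```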
To transfer or not to transfer
In NIPS’05 Workshop, Inductive Transfer: 10 Years Later, 2005
"... With transfer learning, one set of tasks is used to bias learning and improve performance on another task. However, transfer learning may actually hinder performance if the tasks are too dissimilar. As described in this paper, one challenge for transfer learning research is to develop approaches tha ..."
Abstract
-
Cited by 52 (0 self)
- Add to MetaCart
(Show Context)
With transfer learning, one set of tasks is used to bias learning and improve performance on another task. However, transfer learning may actually hinder performance if the tasks are too dissimilar. As described in this paper, one challenge for transfer learning research is to develop approaches that detect and avoid negative transfer using very little data from the target task.
Nonlinear Models Using Dirichlet Process Mixtures
"... We introduce a new nonlinear model for classification, in which we model the joint distribution of response variable, y, and covariates, x, non-parametrically using Dirichlet process mixtures. We keep the relationship between y and x linear within each component of the mixture. The overall relations ..."
Abstract
-
Cited by 43 (0 self)
- Add to MetaCart
We introduce a new nonlinear model for classification, in which we model the joint distribution of response variable, y, and covariates, x, non-parametrically using Dirichlet process mixtures. We keep the relationship between y and x linear within each component of the mixture. The overall relationship becomes nonlinear if the mixture contains more than one component, with different regression coefficients. We use simulated data to compare the performance of this new approach to alternative methods such as multinomial logit (MNL) models, decision trees, and support vector machines. We also evaluate our approach on two classification problems: identifying the folding class of protein sequences and detecting Parkinson’s disease. Our model can sometimes improve predictive accuracy. Moreover, by grouping observations into sub-populations (i.e., mixture components), our model can sometimes provide insight into hidden structure in the data.
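A sketch of how prediction could work in such a mixture once the components are fitted: each component pairs a Gaussian over x with its own multinomial-logit coefficients for y given x, and mixing them yields a nonlinear decision rule overall even though every component is linear. All parameter shapes and names here are illustrative assumptions, not the paper's notation, and the parameters are assumed already inferred (e.g. by MCMC).

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predict(x, weights, means, covs, coefs, intercepts):
    """Predictive class probabilities p(y | x) under a fitted mixture.

    Component k contributes responsibility p(k | x) from its weight and
    Gaussian density over x, times its own multinomial-logit p(y | x, k).
    """
    K = len(weights)
    log_resp = np.empty(K)
    for k in range(K):
        d = x - means[k]
        _, logdet = np.linalg.slogdet(covs[k])
        log_resp[k] = np.log(weights[k]) - 0.5 * (
            logdet + d @ np.linalg.solve(covs[k], d)
            + len(x) * np.log(2 * np.pi))
    resp = np.exp(log_resp - log_resp.max())
    resp /= resp.sum()                                  # p(component k | x)
    class_probs = np.stack([softmax(coefs[k] @ x + intercepts[k])
                            for k in range(K)])
    return resp @ class_probs                           # p(y | x)
```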