Slice sampling (2000)

by R Neal

Results 1 - 10 of 305

An Introduction to MCMC for Machine Learning

by Christophe Andrieu, et al., 2003
"... ..."
Abstract - Cited by 382 (5 self) - Add to MetaCart
Abstract not found

Rethinking LDA: Why Priors Matter

by Hanna M. Wallach, David Mimno, Andrew Mccallum
"... Implementations of topic models typically use symmetric Dirichlet priors with fixed concentration parameters, with the implicit assumption that such “smoothing parameters ” have little practical effect. In this paper, we explore several classes of structured priors for topic models. We find that an ..."
Abstract - Cited by 110 (3 self) - Add to MetaCart
Implementations of topic models typically use symmetric Dirichlet priors with fixed concentration parameters, with the implicit assumption that such “smoothing parameters ” have little practical effect. In this paper, we explore several classes of structured priors for topic models. We find that an asymmetric Dirichlet prior over the document–topic distributions has substantial advantages over a symmetric prior, while an asymmetric prior over the topic–word distributions provides no real benefit. Approximation of this prior structure through simple, efficient hyperparameter optimization steps is sufficient to achieve these performance gains. The prior structure we advocate substantially increases the robustness of topic models to variations in the number of topics and to the highly skewed word frequency distributions common in natural language. Since this prior structure can be implemented using efficient algorithms that add negligible cost beyond standard inference techniques, we recommend it as a new standard for topic modeling. 1

Citation Context

... to vary this choice of prior for either Θ or Φ is to infer the relevant concentration parameter from data, either by computing a MAP estimate [1] or by using an MCMC algorithm such as slice sampling [13]. A broad Gamma distribution is an appropriate choice of prior for both α and β. [plate diagrams (a)–(c) of the prior variants omitted] ...
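To make the symmetric-versus-asymmetric contrast concrete, here is a minimal Python sketch of the two document–topic priors. It is an illustration only: the base measure m below is a random stand-in for the one Wallach et al. learn by hyperparameter optimization, and all names are mine.

```python
import random

def dirichlet(alphas):
    """Sample from Dirichlet(alphas) via normalized Gamma draws."""
    g = [random.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [x / total for x in g]

T = 50        # number of topics
alpha = 1.0   # concentration parameter

# Symmetric prior: every topic is equally probable a priori.
theta_symmetric = dirichlet([alpha / T] * T)

# Asymmetric prior: a base measure m skews prior mass toward a few
# topics. Here m is random; in the paper it is learned from data.
m = dirichlet([1.0] * T)
theta_asymmetric = dirichlet([alpha * m_k for m_k in m])
```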

Unsupervised Modeling of Twitter Conversations

by Alan Ritter, Colin Cherry, Bill Dolan, 2010
"... We propose the first unsupervised approach to the problem of modeling dialogue acts in an open domain. Trained on a corpus of noisy Twitter conversations, our method discovers dialogue acts by clustering raw utterances. Because it accounts for the sequential behaviour of these acts, the learned mode ..."
Abstract - Cited by 90 (4 self) - Add to MetaCart
We propose the first unsupervised approach to the problem of modeling dialogue acts in an open domain. Trained on a corpus of noisy Twitter conversations, our method discovers dialogue acts by clustering raw utterances. Because it accounts for the sequential behaviour of these acts, the learned model can provide insight into the shape of communication in a new medium. We address the challenge of evaluating the emergent model with a qualitative visualization and an intrinsic conversation ordering task. This work is inspired by a corpus of 1.3 million Twitter conversations, which will be made publicly available. This huge amount of data, available only because Twitter blurs the line between chatting and publishing, highlights the need to be able to adapt quickly to a new medium. 1

Citation Context

...ation parameters as additional hidden variables and sample each in turn, conditioned on the current assignment to all other variables. Because these variables are continuous, we apply slice sampling (Neal, 2003). Slice sampling is a general technique for drawing samples from a distribution by sampling uniformly from the area under its density function. ...
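That last sentence is the whole idea of Neal's method. Below is a minimal Python sketch of one univariate update with the stepping-out and shrinkage procedures from the cited paper; the function name, step width w, and the expansion cap are my own choices, not part of the paper.

```python
import math
import random

def slice_sample(x0, log_density, w=1.0, max_steps=50):
    """One univariate slice-sampling update (after Neal, 2003)."""
    # Auxiliary height: uniform under the density at x0, kept in log space.
    log_y = log_density(x0) + math.log(1.0 - random.random())

    # Stepping out: grow an interval of width w until it brackets the slice.
    left = x0 - w * random.random()
    right = left + w
    for _ in range(max_steps):
        if log_density(left) < log_y:
            break
        left -= w
    for _ in range(max_steps):
        if log_density(right) < log_y:
            break
        right += w

    # Shrinkage: sample within the interval, narrowing it on each rejection.
    while True:
        x1 = random.uniform(left, right)
        if log_density(x1) >= log_y:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1

# Usage: repeated updates yield draws from a standard normal.
x, draws = 0.0, []
for _ in range(1000):
    x = slice_sample(x, lambda v: -0.5 * v * v)
    draws.append(x)
```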

Stick-breaking construction for the Indian buffet process

by Yee Whye Teh - In Proceedings of the International Conference on Artificial Intelligence and Statistics
"... The Indian buffet process (IBP) is a Bayesian nonparametric distribution whereby objects are modelled using an unbounded number of latent features. In this paper we derive a stick-breaking representation for the IBP. Based on this new representation, we develop slice samplers for the IBP that are ef ..."
Abstract - Cited by 79 (13 self) - Add to MetaCart
The Indian buffet process (IBP) is a Bayesian nonparametric distribution whereby objects are modelled using an unbounded number of latent features. In this paper we derive a stick-breaking representation for the IBP. Based on this new representation, we develop slice samplers for the IBP that are efficient, easy to implement and are more generally applicable than the currently available Gibbs sampler. This representation, along with the work of Thibaux and Jordan [17], also illuminates interesting theoretical connections between the IBP, Chinese restaurant processes, Beta processes and Dirichlet processes. 1
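As a rough sketch of the representation the abstract derives: feature-presence probabilities decay by repeated stick-breaking, mu_k = nu_k * mu_{k-1} with nu_k ~ Beta(alpha, 1). The Python below truncates at a fixed number of features purely for illustration; the paper's slice samplers exist precisely to avoid such a truncation.

```python
import random

def ibp_stick_breaking(alpha, n_objects, n_features):
    """Truncated stick-breaking sketch of the Indian buffet process."""
    Z = [[0] * n_features for _ in range(n_objects)]
    mu = 1.0
    for k in range(n_features):
        mu *= random.betavariate(alpha, 1.0)  # nu_k ~ Beta(alpha, 1)
        for n in range(n_objects):
            # Each object includes feature k with probability mu_k.
            Z[n][k] = 1 if random.random() < mu else 0
    return Z

Z = ibp_stick_breaking(alpha=2.0, n_objects=10, n_features=25)
```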

The No-U-Turn Sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. arXiv preprint arXiv:1111.4246

by Matthew D. Hoffman, Andrew Gelman, 2011
"... Hierarchical Bayesian models are a mainstay of the machine learning and statistics communities. Exact posterior inference in such models is rarely tractable, so researchers and practitioners must usually resort to approximate inference methods. Perhaps the most popular class of approximate posterior ..."
Abstract - Cited by 70 (2 self) - Add to MetaCart
Hierarchical Bayesian models are a mainstay of the machine learning and statistics communities. Exact posterior inference in such models is rarely tractable, so researchers and practitioners must usually resort to approximate inference methods. Perhaps the most popular class of approximate posterior inference algorithms, Markov Chain Monte Carlo (MCMC) methods offer schemes for

Citation Context

...rsibility, and is therefore not guaranteed to converge to the correct distribution. NUTS overcomes this issue by means of a recursive slice sampling algorithm reminiscent of the doubling procedure of [12].

Algorithm 1 No-U-Turn Sampler (excerpt):
  Resample r ∼ N(0, I). (I denotes the identity matrix.)
  Resample u ∼ Uniform((0, exp{L(θt) − (1/2) rᵀr}])
  Initialize θ− = θt, θ+ = θt, r− = r, r+ = r, j = 0, θt+1 = θt, ...
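For orientation, here is a hedged Python sketch of those first two resampling steps and of the leapfrog step that the doubling procedure repeatedly applies to extend the trajectory. Names are mine, and a practical implementation would keep the slice variable in log space rather than calling exp directly.

```python
import math
import random

def resample_momentum_and_slice(theta, log_p, dim):
    """Fresh momentum r ~ N(0, I) and slice variable
    u ~ Uniform((0, exp{L(theta) - r.r/2}])."""
    r = [random.gauss(0.0, 1.0) for _ in range(dim)]
    joint = log_p(theta) - 0.5 * sum(ri * ri for ri in r)
    u = random.uniform(0.0, math.exp(joint))  # the paper's interval is open at 0
    return r, u

def leapfrog(theta, r, grad_log_p, eps):
    """One leapfrog step of the simulated Hamiltonian dynamics."""
    g = grad_log_p(theta)
    r = [ri + 0.5 * eps * gi for ri, gi in zip(r, g)]    # half momentum step
    theta = [ti + eps * ri for ti, ri in zip(theta, r)]  # full position step
    g = grad_log_p(theta)
    r = [ri + 0.5 * eps * gi for ri, gi in zip(r, g)]    # half momentum step
    return theta, r
```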

Elliptical slice sampling

by Iain Murray, Ryan Prescott Adams, David J. C. MacKay - JMLR: W&CP
"... Many probabilistic models introduce strong dependencies between variables using a latent multivariate Gaussian distribution or a Gaussian process. We present a new Markov chain Monte Carlo algorithm for performing inference in models with multivariate Gaussian priors. Its key properties are: 1) it h ..."
Abstract - Cited by 60 (8 self) - Add to MetaCart
Many probabilistic models introduce strong dependencies between variables using a latent multivariate Gaussian distribution or a Gaussian process. We present a new Markov chain Monte Carlo algorithm for performing inference in models with multivariate Gaussian priors. Its key properties are: 1) it has simple, generic code applicable to many models, 2) it has no free parameters, 3) it works well for a variety of Gaussian process based models. These properties make our method ideal for use while model building, removing the need to spend time deriving and tuning updates for more complex algorithms.
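The "simple, generic code" and "no free parameters" claims are easy to see in a sketch. Below is a minimal Python rendering of one elliptical slice sampling update as the paper describes it; the argument names are mine, and sample_prior must draw from the same N(0, Σ) prior the model places on f.

```python
import math
import random

def elliptical_slice(f, sample_prior, log_lik):
    """One elliptical slice sampling update for f ~ N(0, Sigma)."""
    nu = sample_prior()                                   # auxiliary prior draw
    log_y = log_lik(f) + math.log(1.0 - random.random())  # slice height

    # Initial proposal angle and a bracket covering the full ellipse.
    theta = random.uniform(0.0, 2.0 * math.pi)
    theta_min, theta_max = theta - 2.0 * math.pi, theta

    while True:
        # Point on the ellipse through f (at theta = 0) and nu.
        f_prop = [fi * math.cos(theta) + ni * math.sin(theta)
                  for fi, ni in zip(f, nu)]
        if log_lik(f_prop) > log_y:
            return f_prop
        # Shrink the bracket toward the current state and retry.
        if theta < 0.0:
            theta_min = theta
        else:
            theta_max = theta
        theta = random.uniform(theta_min, theta_max)
```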

Slice sampling covariance hyperparameters of latent Gaussian models

by Iain Murray, Ryan Prescott Adams - In Advances in Neural Information Processing Systems 23, 2010
"... The Gaussian process (GP) is a popular way to specify dependencies between random variables in a probabilistic model. In the Bayesian framework the covariance structure can be specified using unknown hyperparameters. Integrating over these hyperparameters considers different possible explanations fo ..."
Abstract - Cited by 56 (10 self) - Add to MetaCart
The Gaussian process (GP) is a popular way to specify dependencies between random variables in a probabilistic model. In the Bayesian framework the covariance structure can be specified using unknown hyperparameters. Integrating over these hyperparameters considers different possible explanations for the data when making predictions. This integration is often performed using Markov chain Monte Carlo (MCMC) sampling. However, with non-Gaussian observations standard hyperparameter sampling approaches require careful tuning and may converge slowly. In this paper we present a slice sampling approach that requires little tuning while mixing well in both strong- and weak-data regimes.

Citation Context

...lasses: deterministic approximations and Monte Carlo simulation. This work presents a method to make the sampling approach easier to apply. In recent work Murray et al. [1] developed a slice sampling [2] variant, elliptical slice sampling, for updating strongly coupled a-priori Gaussian variates given non-Gaussian observations. Previously, Agarwal and Gelfand [3] demonstrated the utility of slice sam...

Beam Sampling for the Infinite Hidden Markov Model

by Jurgen Van Gael, Yunus Saatci, Yee Whye Teh, Zoubin Ghahramani
"... The infinite hidden Markov model is a nonparametric extension of the widely used hidden Markov model. Our paper introduces a new inference algorithm for the infinite Hidden Markov model called beam sampling. Beam sampling combines slice sampling, which limits the number of states considered at each ..."
Abstract - Cited by 52 (8 self) - Add to MetaCart
The infinite hidden Markov model is a nonparametric extension of the widely used hidden Markov model. Our paper introduces a new inference algorithm for the infinite Hidden Markov model called beam sampling. Beam sampling combines slice sampling, which limits the number of states considered at each time step to a finite number, with dynamic programming, which samples whole state trajectories efficiently. Our algorithm typically outperforms the Gibbs sampler and is more robust. We present applications of iHMM inference using the beam sampler on changepoint detection and text prediction problems. 1.

Citation Context

...w sampler for the iHMM called beam sampling. Beam sampling combines two ideas—slice sampling and dynamic programming—to sample whole state trajectories efficiently. Our application of slice sampling (Neal, 2003) is inspired by (Walker, 2007), who used it to limit the number of clusters considered when sampling assignment variables in DP mixtures to a finite number. We apply slice sampling to limit to a fini...
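A hedged sketch of the slicing step this context describes: one auxiliary variable per time step cuts the unbounded set of iHMM transitions down to the finitely many whose probability clears the slice, which is what makes the dynamic-programming pass finite. The function names and signatures here are illustrative, not the authors' code.

```python
import random

def sample_slice_variables(path, trans_prob):
    """Beam sampling auxiliary variables: for each step t,
    u_t ~ Uniform(0, pi(s_{t-1}, s_t)) given the current state path."""
    return [random.uniform(0.0, trans_prob(path[t - 1], path[t]))
            for t in range(1, len(path))]

def allowed_transitions(s, u_t, candidate_states, trans_prob):
    """Only transitions whose probability clears the slice survive,
    leaving a finite set for the forward-filtering pass."""
    return [sp for sp in candidate_states if trans_prob(s, sp) > u_t]
```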

To transfer or not to transfer

by Michael T. Rosenstein, Zvika Marx, Leslie Pack Kaelbling, Thomas G. Dietterich - In NIPS’05 Workshop, Inductive Transfer: 10 Years Later, 2005
"... With transfer learning, one set of tasks is used to bias learning and improve performance on another task. However, transfer learning may actually hinder performance if the tasks are too dissimilar. As described in this paper, one challenge for transfer learning research is to develop approaches tha ..."
Abstract - Cited by 52 (0 self) - Add to MetaCart
With transfer learning, one set of tasks is used to bias learning and improve performance on another task. However, transfer learning may actually hinder performance if the tasks are too dissimilar. As described in this paper, one challenge for transfer learning research is to develop approaches that detect and avoid negative transfer using very little data from the target task. 1

Citation Context

...other parameters are very different (by increasing the variance of the hyperprior). To compute the posterior distributions, we developed an extension of the “slice sampling” method introduced by Neal [6]. ... We tested the hierarchical naive Bayes algorithm on data from a meeting acceptance task. For this task, the goal is to learn to predict whether a person will accept an invitation to a ...

Nonlinear Models Using Dirichlet Process Mixtures

by Babak Shahbaba, Radford Neal, Zoubin Ghahramani
"... We introduce a new nonlinear model for classification, in which we model the joint distribution of response variable, y, and covariates, x, non-parametrically using Dirichlet process mixtures. We keep the relationship between y and x linear within each component of the mixture. The overall relations ..."
Abstract - Cited by 43 (0 self) - Add to MetaCart
We introduce a new nonlinear model for classification, in which we model the joint distribution of response variable, y, and covariates, x, non-parametrically using Dirichlet process mixtures. We keep the relationship between y and x linear within each component of the mixture. The overall relationship becomes nonlinear if the mixture contains more than one component, with different regression coefficients. We use simulated data to compare the performance of this new approach to alternative methods such as multinomial logit (MNL) models, decision trees, and support vector machines. We also evaluate our approach on two classification problems: identifying the folding class of protein sequences and detecting Parkinson’s disease. Our model can sometimes improve predictive accuracy. Moreover, by grouping observations into sub-populations (i.e., mixture components), our model can sometimes provide insight into hidden structure in the data.
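To illustrate the model class (this is not the authors' code), here is a small Python sketch of the generative picture in one dimension with two classes: component weights come from a truncated stick-breaking approximation to the Dirichlet process, covariates are Gaussian within a component, and labels follow a logit model that is linear within that component.

```python
import math
import random

def stick_breaking_weights(alpha, K):
    """Truncated stick-breaking approximation to DP mixture weights."""
    weights, remaining = [], 1.0
    for _ in range(K - 1):
        v = random.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights

def generate_point(weights, means, coefs):
    """Draw (x, y): x from the chosen component, y from a within-
    component linear-logit model (two classes for simplicity)."""
    k = random.choices(range(len(weights)), weights=weights)[0]
    x = random.gauss(means[k], 1.0)
    b0, b1 = coefs[k]
    p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
    return x, 1 if random.random() < p else 0

w = stick_breaking_weights(alpha=1.0, K=3)
data = [generate_point(w, means=[-2.0, 0.0, 2.0],
                       coefs=[(0.0, 1.5), (1.0, -0.5), (-1.0, 2.0)])
        for _ in range(100)]
```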