Results 1–10 of 13
Spectral Methods for Supervised Topic Models
Cited by 5 (2 self)
Supervised topic models simultaneously model the latent topic structure of large collections of documents and a response variable associated with each document. Existing inference methods are based on either variational approximation or Monte Carlo sampling. This paper presents a novel spectral decomposition algorithm to recover the parameters of supervised latent Dirichlet allocation (sLDA) models. The Spectral-sLDA algorithm is provably correct and computationally efficient. We prove a sample complexity bound and subsequently derive a sufficient condition for the identifiability of sLDA. Thorough experiments on a diverse range of synthetic and real-world datasets verify the theory and demonstrate the practical effectiveness of the algorithm.
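The whitening step at the heart of moment-based spectral methods can be sketched with a synthetic second-moment matrix. This is a generic illustration of the spectral approach, not the paper's Spectral-sLDA algorithm; the matrix `M2`, topic matrix `A`, and weights `w` below are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic second-moment matrix M2 = A diag(w) A^T of rank k,
# as produced by a k-topic model with topic matrix A and topic weights w.
d, k = 20, 3
A = rng.random((d, k))
w = np.array([0.5, 0.3, 0.2])
M2 = A @ np.diag(w) @ A.T

# Whitening: find W with W^T M2 W = I_k via a truncated eigendecomposition.
vals, vecs = np.linalg.eigh(M2)
top = np.argsort(vals)[::-1][:k]          # k leading eigenpairs
W = vecs[:, top] / np.sqrt(vals[top])     # W = U_k diag(lambda_k^{-1/2})

# Check the whitening identity; a spectral method would next decompose the
# whitened third moment (e.g., by the tensor power method) to recover A and w.
assert np.allclose(W.T @ M2 @ W, np.eye(k), atol=1e-6)
```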
Online Bayesian Passive-Aggressive Learning
In International Conference on Machine Learning (ICML), 2014
Cited by 3 (2 self)
We present online Bayesian Passive-Aggressive (BayesPA) learning, a generic online learning framework for hierarchical Bayesian models with max-margin posterior regularization. We show that BayesPA subsumes the standard online Passive-Aggressive (PA) learning and extends naturally to incorporate latent variables for both parametric and nonparametric Bayesian inference, therefore providing great flexibility for explorative analysis. As an important example, we apply BayesPA to topic modeling and derive efficient online learning algorithms for max-margin topic models. We further develop nonparametric BayesPA topic models to infer the unknown number of topics in an online manner. Experimental results on 20newsgroups and a large Wikipedia multi-label dataset (with 1.1 million training documents and 0.9 million unique terms in the vocabulary) show that our approaches significantly improve time efficiency while achieving accuracy comparable to the corresponding batch algorithms.
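For context, the standard (non-Bayesian) Passive-Aggressive update that BayesPA generalizes can be sketched as follows. This is the textbook PA-I rule for binary classification, not the BayesPA algorithm itself.

```python
import numpy as np

def pa_update(w, x, y, C=1.0):
    """One Passive-Aggressive (PA-I) step for binary classification.

    w: current weight vector; x: feature vector; y: label in {-1, +1};
    C: aggressiveness cap. Returns the updated weights.
    """
    loss = max(0.0, 1.0 - y * np.dot(w, x))   # hinge loss on this example
    if loss == 0.0:
        return w                               # passive: margin satisfied
    tau = min(C, loss / np.dot(x, x))          # aggressive: minimal correction
    return w + tau * y * x

w = np.zeros(3)
w = pa_update(w, np.array([1.0, 0.0, 1.0]), +1)
w = pa_update(w, np.array([0.0, 1.0, 0.0]), -1)
```

Each step makes the smallest change to `w` that satisfies the margin on the current example, which is the behavior BayesPA lifts to posterior distributions.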
Max-Margin Majority Voting for Learning from Crowds
Cited by 1 (0 self)
Learning from crowds aims to design proper aggregation strategies to infer the unknown true labels from the noisy labels provided by ordinary web workers. This paper presents max-margin majority voting (M3V) to improve the discriminative ability of majority voting, and further presents a Bayesian generalization to incorporate the flexibility of generative methods in modeling noisy observations with worker confusion matrices. We formulate the joint learning as a regularized Bayesian inference problem, where the posterior regularization is derived by maximizing the margin between the aggregated score of a potential true label and that of any alternative label. Our Bayesian model naturally covers the Dawid-Skene estimator and M3V. Empirical results demonstrate that our methods are competitive, often achieving better results than state-of-the-art estimators.
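The plain majority-voting baseline that M3V sharpens can be sketched in a few lines; the max-margin and Bayesian extensions described in the abstract are not shown here.

```python
from collections import Counter

def majority_vote(worker_labels):
    """Aggregate noisy crowd labels by plain majority voting.

    worker_labels: dict mapping item id -> list of labels from workers.
    Returns a dict mapping item id -> most frequent label (ties broken
    by first occurrence).
    """
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in worker_labels.items()}

votes = {"q1": [1, 1, 0], "q2": [0, 0, 1, 0]}
agg = majority_vote(votes)   # {"q1": 1, "q2": 0}
```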
Probit Normal Correlated Topic Model
In Open Journal of Statistics, 2014
Cited by 1 (1 self)
The logistic normal distribution has recently been adapted, via the transformation of multivariate Gaussian variables, to model the topical distribution of documents in the presence of correlations among topics. In this paper, we propose a probit normal alternative approach to modelling correlated topical structures. Our use of the probit model in the context of topic discovery is novel, as many authors have so far concentrated solely on the logistic model, partly due to the formidable inefficiency of the multinomial probit model even in the case of very small topical spaces. We herein circumvent the inefficiency of multinomial probit estimation by using an adaptation of the diagonal orthant multinomial probit in the topic models context, resulting in the ability of our topic modeling scheme to handle corpora with a large number of latent topics. An additional and very important benefit of our method lies in the fact that, unlike the logistic normal model, whose non-conjugacy leads to the need for sophisticated sampling schemes, our approach exploits the natural conjugacy inherent in the auxiliary formulation of the probit model to achieve greater simplicity. The application of our proposed scheme to a well-known Associated Press corpus not only helps discover a large number of meaningful topics but also reveals the capturing of compel ...
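The conjugacy the abstract exploits is easiest to see in the binary probit model y = 1[z > 0] with z ~ N(xw, 1): given y, the auxiliary z is a truncated normal; given z, the weight has a Gaussian posterior. A minimal single-feature Gibbs sketch of that auxiliary-variable idea follows; it is a binary simplification with synthetic data, not the paper's diagonal orthant multinomial model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic binary probit data: y = 1[x * w_true + noise > 0].
n, w_true = 500, 1.5
x = rng.normal(size=n)
y = (x * w_true + rng.normal(size=n) > 0).astype(int)

def sample_truncated(mu, positive):
    """Rejection-sample z ~ N(mu, 1) truncated to z > 0 (or z < 0)."""
    while True:
        z = rng.normal(mu, 1.0)
        if (z > 0) == positive:
            return z

w = 0.0
for _ in range(200):  # Gibbs sweeps
    # z_i | y_i, w ~ N(x_i * w, 1), truncated to the side fixed by y_i.
    z = np.array([sample_truncated(xi * w, yi == 1) for xi, yi in zip(x, y)])
    # w | z is Gaussian by Gaussian-Gaussian conjugacy (flat prior on w).
    var = 1.0 / np.dot(x, x)
    w = rng.normal(var * np.dot(x, z), np.sqrt(var))
# After burn-in, w fluctuates around the data-generating value of 1.5.
```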
Small-Variance Asymptotics for Dirichlet Process Mixtures of SVMs
Cited by 1 (0 self)
Infinite SVM (iSVM) is a Dirichlet process (DP) mixture of large-margin classifiers. Though flexible in learning nonlinear classifiers and discovering latent clustering structures, iSVM has a difficult inference task, and existing methods can hinder its applicability to large-scale problems. This paper presents a small-variance asymptotic analysis to derive a simple and efficient algorithm, which monotonically optimizes a max-margin DP-means (M2DPM) problem, an extension of DP-means for both predictive learning and descriptive clustering. Our analysis is built on Gibbs infinite SVMs, an alternative DP mixture of large-margin machines, which admits a partially collapsed Gibbs sampler without truncation by exploiting data augmentation techniques. Experimental results show that M2DPM runs much faster than similar algorithms without sacrificing prediction accuracy.
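The DP-means procedure that M2DPM extends (Kulis and Jordan's small-variance limit of DP mixtures) can be sketched as below. This shows the clustering core only; the max-margin prediction term that distinguishes M2DPM is omitted, and the penalty `lam` and toy data are invented.

```python
import numpy as np

def dp_means(X, lam, n_iter=20):
    """DP-means: k-means-like updates where a point farther than lam
    (in squared distance) from every centroid spawns a new cluster."""
    centroids = [X[0]]
    assign = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        for i, x in enumerate(X):
            d = [np.sum((x - c) ** 2) for c in centroids]
            if min(d) > lam:
                centroids.append(x.copy())       # open a new cluster
                assign[i] = len(centroids) - 1
            else:
                assign[i] = int(np.argmin(d))
        centroids = [X[assign == k].mean(axis=0)  # k-means style re-centering
                     for k in range(len(centroids))]
    return np.array(centroids), assign

# Two well-separated blobs: DP-means discovers that two clusters suffice.
X = np.vstack([np.zeros((5, 2)), np.ones((5, 2)) * 10.0])
C, a = dp_means(X, lam=4.0)
```

Unlike k-means, the number of clusters is not fixed in advance; it emerges from the penalty `lam`, which is the trait M2DPM inherits.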
Sparse Supervised Topic Model: Midterm Report
In this paper we propose the sparse supervised topic model (SSTM), a graphical model that learns the topic structure of a given document collection and also a sparse linear prediction model for response variables associated with documents. Our model jointly learns the topics and the classifier, and encourages a sparse classifier by concentrating all the relevant information for prediction into a small set of topics. Experimental results show that our proposed SSTM model has good interpretability on both classification and regression tasks while still achieving reasonable performance in terms of prediction accuracy.
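The sparsity mechanism (a sparse linear predictor over topic proportions) can be illustrated with a generic L1-penalized least-squares solver via iterative soft-thresholding (ISTA). This is a standalone sketch on synthetic data, not the SSTM inference procedure.

```python
import numpy as np

def ista(X, y, lam, n_iter=500):
    """Iterative soft-thresholding for min_w 0.5*||Xw - y||^2 + lam*||w||_1."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2        # 1/L with L = ||X||_2^2
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        g = X.T @ (X @ w - y)                      # gradient of the smooth part
        w = w - step * g
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)  # shrink
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 1.0]                      # only 3 relevant "topics"
y = X @ w_true
w = ista(X, y, lam=5.0)
# The soft-threshold drives irrelevant coordinates exactly to zero,
# which is the kind of sparse predictor SSTM aims for.
```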
Birds of a Feather Linked Together: A Discriminative Topic Model using Link-based Priors
A wide range of applications, from social media to scientific literature analysis, involve graphs in which documents are connected by links. We introduce a topic model for link prediction based on the intuition that linked documents will tend to have similar topic distributions, integrating a max-margin learning criterion and lexical term weights in the loss function. We validate our approach on the tweets from 2,000 Sina Weibo users and evaluate our model's reconstruction of the social network.
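The "linked documents have similar topic distributions" intuition is typically encoded as a graph regularizer added to the training objective. A minimal sketch follows; the link weighting, max-margin term, and lexical weights from the paper are omitted, and the toy `theta` values are invented.

```python
import numpy as np

def link_regularizer(theta, links):
    """Sum of squared distances between topic proportions of linked docs.

    theta: (D, K) array of per-document topic distributions.
    links: list of (i, j) document index pairs.
    """
    return sum(np.sum((theta[i] - theta[j]) ** 2) for i, j in links)

theta = np.array([[0.7, 0.3],
                  [0.6, 0.4],
                  [0.1, 0.9]])
links = [(0, 1), (1, 2)]
# The similar pair (0, 1) is cheap; the dissimilar pair (1, 2) dominates,
# so minimizing this term pulls linked documents' topics together.
r = link_regularizer(theta, links)
```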
Linear Time Samplers for Supervised Topic Models using Compositional Proposals
Topic models are effective probabilistic tools for processing large collections of unstructured data. With the exponential growth of modern industrial data, and consequently also with our ambition to explore much bigger models, there is a pressing need to significantly scale up topic modeling algorithms, which has been taken up in many previous works, culminating in the recent fast Markov chain Monte Carlo sampling algorithms in [10, 22] for the unsupervised latent Dirichlet allocation (LDA) formulation. In this work we extend the recent sampling advances for unsupervised LDA models to supervised tasks. We focus on the Gibbs MedLDA model [26], which is able to simultaneously discover latent structures and make accurate predictions. By combining a set of sampling techniques, we are able to reduce the O(K³ + DK² + DN̄K) complexity in [26] to O(DK + DN̄) when there are K topics and D documents with average length N̄. To the best of our knowledge, this is the first linear-time sampling algorithm for supervised topic models. Our algorithm requires minimal modifications to incorporate most loss functions in a variety of supervised tasks, and we observe in our experiments an order-of-magnitude speedup over the current state-of-the-art implementation, while achieving similar prediction performance. The open-source C++ implementation of the proposed algorithm is available at
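The O(1)-per-draw building block behind such fast samplers is typically Walker's alias method: an O(K) table build, after which each draw costs one uniform index plus one coin flip. A generic sketch (not the paper's compositional proposal scheme):

```python
import random

def build_alias(probs):
    """Build Walker alias tables for a discrete distribution (O(K))."""
    K = len(probs)
    scaled = [p * K for p in probs]
    prob, alias = [0.0] * K, [0] * K
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l        # bucket s: keep s or take l
        scaled[l] -= 1.0 - scaled[s]            # donate mass to fill bucket s
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:                     # leftovers have weight one
        prob[i] = 1.0
    return prob, alias

def draw(prob, alias, rng=random):
    """One O(1) sample: pick a bucket, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]

prob, alias = build_alias([0.5, 0.3, 0.2])
counts = [0, 0, 0]
for _ in range(100_000):
    counts[draw(prob, alias)] += 1
# Empirical frequencies track [0.5, 0.3, 0.2] regardless of K.
```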
Sometimes Average is Best: The Importance of Averaging for Prediction using MCMC Inference in Topic Modeling
Viet-An Nguyen
Markov chain Monte Carlo (MCMC) approximates the posterior distribution of latent variable models by generating many samples and averaging over them. In practice, however, it is often more convenient to cut corners, using only a single sample or following a suboptimal averaging strategy. We systematically study different strategies for averaging MCMC samples and show empirically that averaging properly leads to significant improvements in prediction.
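The core recommendation is simple to state in code: average the predictive quantity across samples rather than trusting one sample. A generic sketch with simulated per-sample predictions (not the paper's topic-model experiments):

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these are per-sample predictive probabilities from an MCMC run,
# noisy around an underlying value of 0.7.
true_p = 0.7
samples = np.clip(true_p + rng.normal(0.0, 0.2, size=200), 0.0, 1.0)

single = samples[-1]        # "cut corners": use the last sample only
averaged = samples.mean()   # average the predictive quantity over samples

# The average is typically far closer to the underlying value than any
# single draw, since averaging shrinks the Monte Carlo noise by ~1/sqrt(S).
err_single = abs(single - true_p)
err_avg = abs(averaged - true_p)
```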
Dropout Training for Support Vector Machines
Dropout and other feature noising schemes have shown promising results in controlling overfitting by artificially corrupting the training data. Though extensive theoretical and empirical studies have been performed for generalized linear models, little work has been done for support vector machines (SVMs), one of the most successful approaches to supervised learning. This paper presents dropout training for linear SVMs. To deal with the intractable expectation of the non-smooth hinge loss under corrupting distributions, we develop an iteratively reweighted least squares (IRLS) algorithm by exploiting data augmentation techniques. Our algorithm iteratively minimizes the expectation of a reweighted least squares problem, where the reweights have closed-form solutions. Similar ideas are applied to develop a new IRLS algorithm for the expected logistic loss under corrupting distributions. Our algorithms offer insights into the connections and differences between the hinge loss and logistic loss in dropout training. Empirical results on several real datasets demonstrate the effectiveness of dropout training in significantly boosting the classification accuracy of linear SVMs.
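The corruption scheme that dropout training marginalizes over can be sketched directly: each feature is zeroed independently with probability p and the survivors rescaled so the corrupted features are unbiased. The paper integrates this noise out analytically via IRLS; the sketch below just draws it, with invented data.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_corrupt(X, p, rng=rng):
    """Blank-out corruption: zero each feature w.p. p, rescale survivors
    by 1/(1-p) so that E[corrupted X] = X."""
    mask = rng.random(X.shape) >= p
    return X * mask / (1.0 - p)

X = np.ones((10_000, 5))
Xc = dropout_corrupt(X, p=0.3)
# Column means stay near 1 (unbiasedness), even though individual entries
# are either 0 or 1/0.7; training on Xc approximates the expected loss.
```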