Results 1–10 of 43
Dynamic Bayesian Networks: Representation, Inference and Learning
, 2002
Abstract

Cited by 770 (3 self)
Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs and KFMs are limited in their “expressive power”. Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linear-Gaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data.
In particular, the main novel technical contributions of this thesis are as follows: a way of representing Hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T³), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of applying Rao-Blackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.
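To make the HMM baseline that DBNs generalize concrete, here is a minimal forward-filtering pass, the simplest special case of DBN inference. The two-state weather/umbrella parameters are a hypothetical illustration, not taken from the thesis:

```python
# Minimal HMM forward filtering. An HMM is the simplest DBN: a single
# discrete state variable per time slice, linked by a transition model.

def forward(init, trans, emit, observations):
    """Return filtered state distributions P(state_t | obs_1..t)."""
    n = len(init)
    # Initialise: prior weighted by the first observation likelihood.
    alpha = [init[s] * emit[s][observations[0]] for s in range(n)]
    z = sum(alpha)
    alpha = [a / z for a in alpha]
    history = [alpha]
    for obs in observations[1:]:
        # Predict through the transition model, correct with the
        # observation likelihood, then renormalise.
        alpha = [
            emit[s][obs] * sum(alpha[r] * trans[r][s] for r in range(n))
            for s in range(n)
        ]
        z = sum(alpha)
        alpha = [a / z for a in alpha]
        history.append(alpha)
    return history

# Hypothetical parameters: states {0: rainy, 1: sunny},
# observations {0: umbrella seen, 1: no umbrella}.
init = [0.5, 0.5]
trans = [[0.7, 0.3], [0.3, 0.7]]
emit = [[0.9, 0.1], [0.2, 0.8]]
beliefs = forward(init, trans, emit, [0, 0, 1])
```

A DBN replaces the single state variable with a factored set of variables per slice; the same predict/correct structure underlies the exact and approximate algorithms the thesis discusses.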
MedLDA: Maximum Margin Supervised Topic Models for Regression and Classification
Abstract

Cited by 93 (27 self)
Supervised topic models utilize documents' side information for discovering predictive low-dimensional representations of documents; existing models apply likelihood-based estimation. In this paper, we present a max-margin supervised topic model for both continuous and categorical response variables. Our approach, the maximum entropy discrimination latent Dirichlet allocation (MedLDA), utilizes the max-margin principle to train supervised topic models and estimate predictive topic representations that are arguably more suitable for prediction. We develop efficient variational methods for posterior inference and demonstrate qualitatively and quantitatively the advantages of MedLDA over likelihood-based topic models on movie review and 20 Newsgroups data sets.
A discriminative framework for modeling object classes
 In Proc. IEEE CVPR
, 2005
Abstract

Cited by 45 (5 self)
Here we explore a discriminative learning method on underlying generative models for the purpose of discriminating between object categories. Visual recognition algorithms learn models from a set of training examples. Generative models learn their representations by considering data from a single class. Generative models are popular in computer vision for many reasons, including their ability to elegantly incorporate prior knowledge and to handle correspondences between object parts and detected features. However, generative models are often inferior to discriminative models during classification tasks. We study a discriminative approach to learning object categories which maintains the representational power of generative learning, but trains the generative models in a discriminative manner. The discriminatively trained models perform better during classification tasks as a result of selecting discriminative sets of features. We conclude by proposing a multi-class object recognition system which initially trains object classes in a generative manner, identifies subsets of similar classes with high confusion, and finally trains models for these subsets in a discriminative manner to realize gains in classification performance.
Max-Margin Nonparametric Latent Feature Models for Link Prediction
Abstract

Cited by 21 (9 self)
We present a max-margin nonparametric latent feature relational model, which unites the ideas of max-margin learning and Bayesian nonparametrics to discover discriminative latent features for link prediction and automatically infer the unknown latent social dimension. By minimizing a hinge loss using the linear expectation operator, we can perform posterior inference efficiently without dealing with a highly nonlinear link likelihood function; by using a fully Bayesian formulation, we can avoid tuning regularization constants. Experimental results on real datasets appear to demonstrate the benefits inherited from max-margin learning and fully Bayesian nonparametric inference.
The Latent Maximum Entropy Principle
 In Proc. of ISIT
, 2002
Abstract

Cited by 19 (5 self)
We present an extension to Jaynes' maximum entropy principle that handles latent variables. The principle of latent maximum entropy we propose is different from both Jaynes' maximum entropy principle and maximum likelihood estimation, but often yields better estimates in the presence of hidden variables and limited training data. We first show that solving for a latent maximum entropy model poses a hard nonlinear constrained optimization problem in general. However, we then show that feasible solutions to this problem can be obtained efficiently for the special case of log-linear models, which forms the basis for an efficient approximation to the latent maximum entropy principle. We derive an algorithm that combines expectation-maximization with iterative scaling to produce feasible log-linear solutions. This algorithm can be interpreted as an alternating minimization algorithm in the information divergence, and reveals an intimate connection between the latent maximum entropy and maximum likelihood principles.
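As a sketch of the iterative-scaling component alone (omitting the EM step for latent variables, and using toy features and target expectations that are not from the paper), generalized iterative scaling on a fully observed log-linear model over a small finite domain might look like:

```python
# Generalized iterative scaling (GIS) for a toy log-linear model:
# fit weights so model feature expectations match given targets.
import math

outcomes = [(0, 0), (0, 1), (1, 0), (1, 1)]
features = [lambda x: x[0], lambda x: x[1]]  # two binary indicator features

def model_dist(lams):
    """Distribution p(x) proportional to exp(sum_i lam_i * f_i(x))."""
    w = [math.exp(sum(l * f(x) for l, f in zip(lams, features)))
         for x in outcomes]
    z = sum(w)
    return [v / z for v in w]

def gis(targets, iters=500):
    """Scale each weight by (target / model expectation)^(1/C)."""
    # GIS strictly assumes a constant total feature count; using the
    # maximum count as C is a common relaxation on toy problems.
    C = max(sum(f(x) for f in features) for x in outcomes)
    lams = [0.0, 0.0]
    for _ in range(iters):
        p = model_dist(lams)
        for i, f in enumerate(features):
            model_exp = sum(pi * f(x) for pi, x in zip(p, outcomes))
            lams[i] += math.log(targets[i] / model_exp) / C
    return lams

lams = gis([0.7, 0.4])  # targets: E[f0] = 0.7, E[f1] = 0.4
p = model_dist(lams)
```

The paper's algorithm alternates updates of this kind with an E-step that fills in expectations over the latent variables.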
Max-Margin Min-Entropy Models
 AISTATS
, 2012
Abstract

Cited by 16 (4 self)
We propose a novel family of discriminative LVMs, called max-margin min-entropy (M3E) models, that predict the output by minimizing the Rényi entropy [18] of the corresponding generalized distribution ...
Bayesian inference with posterior regularization and applications to infinite latent SVMs
 In arXiv:1210.1766v2
, 2013
Abstract

Cited by 14 (9 self)
Existing Bayesian models, especially nonparametric Bayesian methods, rely on specially conceived priors to incorporate domain knowledge for discovering improved latent representations. While priors affect posterior distributions through Bayes' rule, imposing posterior regularization is arguably more direct and in some cases more natural and general. In this paper, we present regularized Bayesian inference (RegBayes), a novel computational framework that performs posterior inference with a regularization term on the desired post-data posterior distribution under an information-theoretical formulation. RegBayes is more flexible than the procedure that elicits expert knowledge via priors, and it covers both directed Bayesian networks and undirected Markov networks. When the regularization is induced from a linear operator on the posterior distributions, such as the expectation operator, we present a general convex-analysis theorem to characterize the solution of RegBayes. Furthermore, we present two concrete examples of RegBayes, infinite latent support vector machines (iLSVM) and multi-task infinite latent support vector machines (MT-iLSVM), which explore the large-margin idea in combination with a nonparametric Bayesian model for dis ...
Gibbs max-margin topic models with data augmentation
 Journal of Machine Learning Research (JMLR)
Abstract

Cited by 13 (5 self)
Max-margin learning is a powerful approach to building classifiers and structured output predictors. Recent work on max-margin supervised topic models has successfully integrated it with Bayesian topic models to discover discriminative latent semantic structures and make accurate predictions for unseen testing data. However, the resulting learning problems are usually hard to solve because of the non-smoothness of the margin loss. Existing approaches to building max-margin supervised topic models rely on an iterative procedure to solve multiple latent SVM subproblems with additional mean-field assumptions on the desired posterior distributions. This paper presents an alternative approach by defining a new max-margin loss. Namely, we present Gibbs max-margin supervised topic models, a latent variable Gibbs classifier to discover hidden topic representations for various tasks, including classification, regression and multi-task learning. Gibbs max-margin supervised topic models minimize an expected margin loss, which is an upper bound of the existing margin loss derived from an expected prediction rule. By introducing augmented variables and integrating out the Dirichlet variables analytically by conjugacy, we develop simple ...
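The upper-bound relationship between the two losses follows from the convexity of the hinge loss (Jensen's inequality): the hinge loss of the expected score never exceeds the expected hinge loss. A toy numeric check, with a hypothetical posterior over latent-variable scores:

```python
# Jensen's inequality for the hinge loss:
#   hinge(E[score]) <= E[hinge(score)]
# so minimizing the expected loss also drives down the loss of the
# expected prediction rule. Scores and probabilities are made up.

def hinge(score, label=1):
    """Standard hinge loss max(0, 1 - y * score)."""
    return max(0.0, 1.0 - label * score)

scores = [-0.5, 0.4, 2.0]   # possible scores under the latent posterior
probs = [0.2, 0.5, 0.3]     # their (hypothetical) posterior probabilities

expected_score = sum(p * s for p, s in zip(probs, scores))
loss_of_expectation = hinge(expected_score)                      # left side
expectation_of_loss = sum(p * hinge(s) for p, s in zip(probs, scores))  # right side
```

Here the left side is 0.3 and the right side is 0.6, illustrating the bound on one example.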
Supervised classification with conditional Gaussian networks: Increasing the structure complexity from naive Bayes
 INTERNATIONAL JOURNAL OF APPROXIMATE REASONING
, 2006
Abstract

Cited by 10 (0 self)
Most Bayesian network-based classifiers are only able to handle discrete variables. However, most real-world domains involve continuous variables. A common practice to deal with continuous variables is to discretize them, with a subsequent loss of information. This work shows how discrete classifier induction algorithms can be adapted to the conditional Gaussian network paradigm to deal with continuous variables without discretizing them. In addition, three novel classifier induction algorithms and two new propositions about mutual information are introduced. The classifier induction algorithms presented are ordered and grouped according to their structural complexity: naive Bayes, tree-augmented naive Bayes, k-dependence Bayesian classifiers and semi-naive Bayes. All the classifier induction algorithms are empirically evaluated using predictive accuracy, and they are compared to linear discriminant analysis, as a classic statistical benchmark classifier for continuous data. The accuracies for a set of state-of-the-art classifiers are also included in order to justify the use of linear discriminant analysis as the benchmark algorithm. In order to better understand the behavior of the conditional Gaussian network-based classifiers, the results include a bias-variance decomposition of the expected misclassification rate. The study suggests that semi-naive Bayes structure-based classifiers and, especially, the novel wrapper condensed semi-naive Bayes backward, outperform the rest of the presented classifiers. They also obtain quite competitive results compared to the state-of-the-art algorithms included.
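A minimal sketch of the simplest point on this spectrum, naive Bayes with conditional Gaussians, which handles continuous features without discretization (toy one-dimensional data and the variance-smoothing constant are illustrative assumptions, not from the paper):

```python
# Gaussian naive Bayes: each continuous feature gets a per-class
# univariate Gaussian, so no discretization step is needed.
import math

def fit(X, y):
    """Estimate per-class priors, means and variances from toy data."""
    params = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        means = [sum(col) / len(rows) for col in zip(*rows)]
        varis = [
            sum((v - m) ** 2 for v in col) / len(rows) + 1e-6  # smoothing
            for col, m in zip(zip(*rows), means)
        ]
        params[c] = (len(rows) / len(y), means, varis)
    return params

def predict(params, x):
    """Pick the class maximising log prior + sum of log densities."""
    def log_gauss(v, m, s2):
        return -0.5 * (math.log(2 * math.pi * s2) + (v - m) ** 2 / s2)
    return max(
        params,
        key=lambda c: math.log(params[c][0])
        + sum(log_gauss(v, m, s2)
              for v, m, s2 in zip(x, params[c][1], params[c][2])),
    )

# Toy 1-D example: class 0 clustered near 0, class 1 near 5.
X = [[0.1], [0.3], [-0.2], [4.8], [5.1], [5.3]]
y = [0, 0, 0, 1, 1, 1]
model = fit(X, y)
```

The richer structures in the paper (tree-augmented, k-dependence, semi-naive) relax the independence assumption by adding edges among the feature variables.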
Statistical Imitative Learning from Perceptual Data
 PROC. ICDL 02
, 2002
Abstract

Cited by 9 (0 self)
Imitative learning has recently piqued the interest of various fields including neuroscience, cognitive science and robotics. In computational behavior modeling and development, it promises an accessible framework for rapidly forming behavior models without tedious supervision or reinforcement. Given the availability of low-cost wearable sensors, the robustness of real-time perception algorithms and the feasibility of archiving large amounts of audiovisual data, it is possible to unobtrusively archive the daily activities of a human teacher and his responses to external stimuli. We combine this data acquisition/representation process with statistical learning machinery (hidden Markov models) as well as discriminative estimation algorithms to form a behavioral model of a human teacher directly from the data set. The resulting system learns audiovisual interactive behavior from the human and his environment to produce an interactive autonomous agent. The agent subsequently exhibits simple audiovisual behaviors that appear coupled to real-world test stimuli.