Results 1-10 of 96
Dynamic Bayesian Networks: Representation, Inference and Learning
, 2002
Cited by 770 (3 self)
Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs and KFMs are limited in their "expressive power". Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linear-Gaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data.
In particular, the main novel technical contributions of this thesis are as follows: a way of representing Hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T^3), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of applying Rao-Blackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.
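The point about representing the state space "in factored form" can be illustrated by counting free transition parameters. The sketch below is not taken from the thesis. It assumes that every state variable is discrete with the same arity and that each variable has a fixed number of parents in the previous time slice. Both function names are hypothetical.

```python
def flat_hmm_transition_params(num_vars, arity):
    """Free parameters in the transition matrix of a flat HMM whose single
    state variable enumerates every joint setting of num_vars variables."""
    n = arity ** num_vars          # size of the flattened state space
    return n * (n - 1)             # each of n rows sums to 1, so n - 1 free entries


def factored_dbn_transition_params(num_vars, arity, num_parents):
    """Free parameters when each variable has its own conditional table
    over num_parents parents from the previous time slice (an assumption)."""
    rows_per_table = arity ** num_parents      # one row per parent configuration
    return num_vars * rows_per_table * (arity - 1)


if __name__ == "__main__":
    print(flat_hmm_transition_params(10, 2))
    print(factored_dbn_transition_params(10, 2, 3))
```

With 10 binary variables and 3 parents each, the flat count is 1024 x 1023 = 1,047,552 against 80 for the factored model, which suggests why the factored representation can be far more compact when dependencies are sparse.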
A Unifying Review of Linear Gaussian Models
, 1999
Cited by 351 (18 self)
Factor analysis, principal component analysis, mixtures of gaussian clusters, vector quantization, Kalman filter models, and hidden Markov models can all be unified as variations of unsupervised learning under a single basic generative model. This is achieved by collecting together disparate observations and derivations made by many previous authors and introducing a new way of linking discrete and continuous state models using a simple nonlinearity. Through the use of other nonlinearities, we show how independent component analysis is also a variation of the same basic generative model. We show that factor analysis and mixtures of gaussians can be implemented in autoencoder neural networks and learned using squared error plus the same regularization term. We introduce a new model for static data, known as sensible principal component analysis, as well as a novel concept of spatially adaptive observation noise. We also review some of the literature involving global and local mixtures of the basic models and provide pseudocode for inference and learning for all the basic models.
A Probabilistic Framework for Segment-Based Speech Recognition
 Proc. HLT-NAACL 2003 Workshop on Research Directions in Dialogue Processing.
, 2003
Variational learning for switching state-space models
 Neural Computation
, 1998
Cited by 173 (5 self)
We introduce a new statistical model for time series which iteratively segments data into regimes with approximately linear dynamics and learns the parameters of each of these linear regimes. This model combines and generalizes two of the most widely used stochastic time series models (hidden Markov models and linear dynamical systems) and is closely related to models that are widely used in the control and econometrics literatures. It can also be derived by extending the mixture of experts neural network (Jacobs et al., 1991) to its fully dynamical version, in which both expert and gating networks are recurrent. Inferring the posterior probabilities of the hidden states of this model is computationally intractable, and therefore the exact Expectation Maximization (EM) algorithm cannot be applied. However, we present a variational approximation that maximizes a lower bound on the log likelihood and makes use of both the forward-backward recursions for hidden Markov models and the Kalman filter recursions for linear dynamical systems. We tested the algorithm both on artificial data sets and on a natural data set of respiration force from a patient with sleep apnea. The results suggest that variational approximations are a viable method for inference and learning in switching state-space models.
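The forward-backward recursions mentioned in this abstract can be sketched for a plain discrete HMM as follows. This is only the standard HMM building block, not the paper's variational algorithm. The function name and argument layout are hypothetical, and the sketch assumes that at least one state has nonzero likelihood at every time step.

```python
def forward_backward(init, trans, lik):
    """Smoothed state posteriors for a discrete HMM.

    init[i]     = P(s_1 = i)
    trans[i][j] = P(s_{t+1} = j | s_t = i)
    lik[t][i]   = P(y_t | s_t = i)
    Returns gammas with gammas[t][i] = P(s_t = i | y_1..y_T).
    """
    num_steps, num_states = len(lik), len(init)
    states = range(num_states)

    # Forward pass: filtered distributions, renormalized at every step.
    alphas, prev = [], None
    for t in range(num_steps):
        if t == 0:
            a = [init[i] * lik[0][i] for i in states]
        else:
            a = [lik[t][j] * sum(prev[i] * trans[i][j] for i in states)
                 for j in states]
        z = sum(a)
        prev = [v / z for v in a]
        alphas.append(prev)

    # Backward pass: combine each alpha with the (rescaled) backward message.
    beta = [1.0] * num_states
    gammas = [None] * num_steps
    for t in range(num_steps - 1, -1, -1):
        g = [alphas[t][i] * beta[i] for i in states]
        z = sum(g)
        gammas[t] = [v / z for v in g]
        # Backward message for step t - 1; rescaling does not change gammas.
        b = [sum(trans[i][j] * lik[t][j] * beta[j] for j in states)
             for i in states]
        z = sum(b)
        beta = [v / z for v in b]
    return gammas
```

Renormalizing the messages at each step is a common design choice that avoids numerical underflow on long sequences; it leaves the smoothed posteriors unchanged because they are normalized separately.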
Learning Dynamic Bayesian Networks
 In Adaptive Processing of Sequences and Data Structures, Lecture Notes in Artificial Intelligence
, 1998
Cited by 166 (0 self)
Suppose we wish to build a model of data from a finite sequence of ordered observations, $\{Y_1, Y_2, \ldots, Y_t\}$. In most realistic scenarios, from modeling stock prices to physiological data, the observations are not related deterministically. Furthermore, there is added uncertainty resulting from the limited size of our data set and any mismatch between our model and the true process. Probability theory provides a powerful tool for expressing both randomness and uncertainty in our model [23]. We can express the uncertainty in our prediction of the future outcome $Y_{t+1}$ via a probability density $P(Y_{t+1} \mid Y_1, \ldots, Y_t)$. Such a probability density can then be used to make point predictions, define error bars, or make decisions that are expected to minimize some loss function. This chapter presents a probabilistic framework for learning models of temporal data. We express these models using the Bayesian network formalism (a.k.a. probabilistic graphical models or belief networks), a marriage of probability theory and graph theory in which dependencies between variables are expressed graphically. The graph not only allows the user to understand which variables
A Probabilistic Framework For Feature-Based Speech Recognition
, 1996
Cited by 129 (31 self)
Most current speech recognizers use an observation space which is based on a temporal sequence of "frames" (e.g., Mel-cepstra). There is another class of recognizer which further processes these frames to produce a segment-based network, and represents each segment by fixed-dimensional "features." In such feature-based recognizers the observation space takes the form of a temporal network of feature vectors, so that a single segmentation of an utterance will use a subset of all possible feature vectors. In this work we examine a maximum a posteriori decoding strategy for feature-based recognizers and develop a normalization criterion useful for a segment-based Viterbi or A* search. We report experimental results for the task of phonetic recognition on the TIMIT corpus where we achieved context-independent and context-dependent (using diphones) results on the core test set of 64.1% and 69.5% respectively.
Modeling, clustering, and segmenting video with mixtures of dynamic textures
 PAMI
, 2008
Cited by 67 (14 self)
A dynamic texture is a spatio-temporal generative model for video, which represents video sequences as observations from a linear dynamical system. This work studies the mixture of dynamic textures, a statistical model for an ensemble of video sequences that is sampled from a finite collection of visual processes, each of which is a dynamic texture. An expectation-maximization (EM) algorithm is derived for learning the parameters of the model, and the model is related to previous works in linear systems, machine learning, time-series clustering, control theory, and computer vision. Through experimentation, it is shown that the mixture of dynamic textures is a suitable representation for both the appearance and dynamics of a variety of visual processes that have traditionally been challenging for computer vision (for example, fire, steam, water, vehicle and pedestrian traffic, and so forth). When compared with state-of-the-art methods in motion segmentation, including both temporal texture methods and traditional representations (for example, optical flow or other localized motion representations), the mixture of dynamic textures achieves superior performance in the problems of clustering and segmenting video of such processes.
Switching Kalman Filters
, 1998
Cited by 67 (2 self)
We show how many different variants of Switching Kalman Filter models can be represented in a unified way, leading to a single, general-purpose inference algorithm. We then show how to find approximate Maximum Likelihood Estimates of the parameters using the EM algorithm, extending previous results on learning using EM in the non-switching case [DRO93, GH96a] and in the switching, but fully observed, case [Ham90].
1 Introduction
Dynamical systems are often assumed to be linear and subject to Gaussian noise. This model, called the Linear Dynamical System (LDS) model, can be defined as $x_t = A_t x_{t-1} + v_t$, $y_t = C_t x_t + w_t$, where $x_t$ is the hidden state variable at time $t$, $y_t$ is the observation at time $t$, and $v_t \sim N(0, Q_t)$ and $w_t \sim N(0, R_t)$ are independent Gaussian noise sources. Typically the parameters of the model $\Theta = \{(A_t, C_t, Q_t, R_t)\}$ are assumed to be time-invariant, so that they can be estimated from data using e.g., EM [GH96a]. One of the main adva...
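The LDS definition above can be sketched for the simplest case: a scalar state and observation with time-invariant parameters. The code simulates the model and runs the standard Kalman filter recursions on it. It is an illustration of the non-switching building block only, not the paper's switching algorithm, and all names (`simulate_lds`, `kalman_filter`, `a`, `c`, `q`, `r`) are hypothetical stand-ins for A, C, Q and R.

```python
import random


def simulate_lds(a, c, q, r, x0, steps, seed=0):
    """Sample a scalar LDS: x_t = a*x_{t-1} + v_t, y_t = c*x_t + w_t,
    with v_t ~ N(0, q) and w_t ~ N(0, r) (q and r are variances)."""
    rng = random.Random(seed)
    xs, ys = [], []
    x = x0
    for _ in range(steps):
        x = a * x + rng.gauss(0.0, q ** 0.5)   # state transition plus v_t
        y = c * x + rng.gauss(0.0, r ** 0.5)   # observation plus w_t
        xs.append(x)
        ys.append(y)
    return xs, ys


def kalman_filter(ys, a, c, q, r, mean0, var0):
    """Filtered mean and variance of x_t given y_1..y_t.
    mean0 and var0 describe the Gaussian prior on x_0 (an assumption)."""
    means, variances = [], []
    mean, var = mean0, var0
    for y in ys:
        # Predict: push the previous posterior through the dynamics.
        pred_mean = a * mean
        pred_var = a * a * var + q
        # Update: correct the prediction with the new observation.
        gain = pred_var * c / (c * c * pred_var + r)
        mean = pred_mean + gain * (y - c * pred_mean)
        var = (1.0 - gain * c) * pred_var
        means.append(mean)
        variances.append(var)
    return means, variances


if __name__ == "__main__":
    xs, ys = simulate_lds(a=0.9, c=1.0, q=0.1, r=0.5, x0=0.0, steps=5)
    means, variances = kalman_filter(ys, 0.9, 1.0, 0.1, 0.5, 0.0, 1.0)
    for x, m, v in zip(xs, means, variances):
        print(f"true={x:+.3f} filtered={m:+.3f} var={v:.3f}")
```

In the matrix-valued case the same predict and update steps apply with transposes and matrix inverses in place of the scalar products and divisions.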
Production models as a structural basis for automatic speech recognition.
 Speech Communication
, 1997