Results 1 - 10
of
72
Dynamic Bayesian Networks: Representation, Inference and Learning
, 2002
"... Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and bio-sequence analysis, and KFMs have bee ..."
Abstract
-
Cited by 393 (4 self)
- Add to MetaCart
Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and bio-sequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs
and KFMs are limited in their “expressive power”. Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linear-Gaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data.
In particular, the main novel technical contributions of this thesis are as follows: a way of representing
Hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T 3), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of
applying Rao-Blackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization
and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.
A gentle tutorial on the EM algorithm and its application to parameter estimation for gaussian mixture and hidden markov models
, 1997
"... We describe the maximum-likelihood parameter estimation problem and how the Expectation-form of the EM algorithm as it is often given in the literature. We then develop the EM parameter estimation procedure for two applications: 1) finding the parameters of a mixture of Gaussian densities, and 2) fi ..."
Abstract
-
Cited by 328 (4 self)
- Add to MetaCart
We describe the maximum-likelihood parameter estimation problem and how the Expectation-form of the EM algorithm as it is often given in the literature. We then develop the EM parameter estimation procedure for two applications: 1) finding the parameters of a mixture of Gaussian densities, and 2) finding the parameters of a hidden Markov model (HMM) (i.e., the Baum-Welch algorithm) for both discrete and Gaussian mixture observation models. We derive the update equations in fairly explicit detail but we do not prove any convergence properties. We try to emphasize intuition rather than mathematical rigor. ii 1 Maximum-likelihood Recall the definition of the maximum-likelihood estimation problem. We have a density function ¢¡¤£¦ ¥ §© ¨ that is governed by the set of parameters § (e.g., might be a set of Gaussians and § could be the means and covariances). We also have a data set of size � , supposedly drawn from this distribution, i.e., ���� � £�������������£��© �. That is, we assume that these data vectors are independent and
The Helmholtz Machine
, 1995
"... Discovering the structure inherent in a set of patterns is a fundamental aim of statistical inference or learning. One fruitful approach is to build a parameterized stochastic generative model, independent draws from which are likely to produce the patterns. For all but the simplest generative model ..."
Abstract
-
Cited by 165 (22 self)
- Add to MetaCart
Discovering the structure inherent in a set of patterns is a fundamental aim of statistical inference or learning. One fruitful approach is to build a parameterized stochastic generative model, independent draws from which are likely to produce the patterns. For all but the simplest generative models, each pattern can be generated in exponentially many ways. It is thus intractable to adjust the parameters to maximize the probability of the observed patterns. We describe a way of finessing this combinatorial explosion by maximizing an easily computed lower bound on the probability of the observations. Our method can be viewed as a form of hierarchical self-supervised learning that may relate to the function of bottom-up and top-down cortical processing pathways.
Information Geometry of the EM and em Algorithms for Neural Networks
- Neural Networks
, 1995
"... In order to realize an input-output relation given by noise-contaminated examples, it is effective to use a stochastic model of neural networks. A model network includes hidden units whose activation values are not specified nor observed. It is useful to estimate the hidden variables from the obs ..."
Abstract
-
Cited by 89 (7 self)
- Add to MetaCart
In order to realize an input-output relation given by noise-contaminated examples, it is effective to use a stochastic model of neural networks. A model network includes hidden units whose activation values are not specified nor observed. It is useful to estimate the hidden variables from the observed or specified input-output data based on the stochastic model. Two algorithms, the EM - and em-algorithms, have so far been proposed for this purpose. The EM-algorithm is an iterative statistical technique of using the conditional expectation, and the em-algorithm is a geometrical one given by information geometry. The em-algorithm minimizes iteratively the Kullback-Leibler divergence in the manifold of neural networks. These two algorithms are equivalent in most cases. The present paper gives a unified information geometrical framework for studying stochastic models of neural networks, by forcussing on the EM and em algorithms, and proves a condition which guarantees their equ...
Decision templates for multiple classifier fusion: an experimental comparison
- Pattern Recognition
, 2001
"... Multiple classifier fusion may generate more accurate classification than each of the constituent classifiers. Fusion is often based on fixed combination rules like the product and average. Only under strict probabilistic conditions can these rules be justified. We present here a simple rule for ada ..."
Abstract
-
Cited by 77 (7 self)
- Add to MetaCart
Multiple classifier fusion may generate more accurate classification than each of the constituent classifiers. Fusion is often based on fixed combination rules like the product and average. Only under strict probabilistic conditions can these rules be justified. We present here a simple rule for adapting the class combiner to the application. c decision templates (one per class) are estimated with the same training set that is used for the set of classifiers. These templates are then matched to the decision profile of new incoming objects by some similarity measure. We compare 11 versions of our model with 14 other techniques for classifier fusion on the Satimage and Phoneme datasets from the database ELENA. Our results show that decision templates based on integral type measures of similarity are superior to the other schemes on both data sets.
Nonlinear Gated Experts for Time Series: Discovering Regimes and Avoiding Overfitting
, 1995
"... this paper: ftp://ftp.cs.colorado.edu/pub/Time-Series/MyPapers/experts.ps.Z, ..."
Abstract
-
Cited by 74 (5 self)
- Add to MetaCart
this paper: ftp://ftp.cs.colorado.edu/pub/Time-Series/MyPapers/experts.ps.Z,
Designing Classifier Fusion Systems By Genetic Algorithms
- IEEE Transactions On Evolutionary Computation
, 2000
"... We suggest two simple ways to use a genetic algorithm (GA) to design a multiple classifier system. The first GA version selects disjoint feature subsets to be used by the individual classifiers, whereas the second version selects (possibly) overlapping feature subsets and also the types of the indiv ..."
Abstract
-
Cited by 32 (1 self)
- Add to MetaCart
We suggest two simple ways to use a genetic algorithm (GA) to design a multiple classifier system. The first GA version selects disjoint feature subsets to be used by the individual classifiers, whereas the second version selects (possibly) overlapping feature subsets and also the types of the individual classifiers. The two GAs have been tested with four real data sets: Heart, Satimage, Letters, and Forensic glasses (10-fold cross-validation, except for Satimage where we used only two splits). We used 3-classifier systems and basic types of individual classi ers (the linear and quadratic discriminant classifiers and the logistic classifier). The multiple classifier systems designed with the two GAs were compared against classi ers using: (a) all features; (b) the best feature subset found by the sequential backward selection (SBS) method; and (c), the best feature subset found by a GA (individual classifier!). We found that: (1) the multiple classifier system derived through the GA, Version 2, ...
BYY Harmony Learning, Independent State Space, and Generalized APT Financial Analyses
, 2001
"... First, the relationship between factor analysis (FA) and the well-known arbitrage pricing theory (APT) for financial market has been discussed comparatively, with a number of to-be-improved problems listed. An overview has been made from a unified perspective on the related studies in the literature ..."
Abstract
-
Cited by 22 (19 self)
- Add to MetaCart
First, the relationship between factor analysis (FA) and the well-known arbitrage pricing theory (APT) for financial market has been discussed comparatively, with a number of to-be-improved problems listed. An overview has been made from a unified perspective on the related studies in the literatures of statistics, control theory, signal processing, and neural networks. Second, we introduce the fundamentals of the Bayesian Ying Yang (BYY) system and the harmony learning principle which has been systematically developed in past several years as a unified statistical framework for parameter learning, regularization and model selection, in both nontemporal and temporal stochastic environments. We further show that a specific case of the framework, called BYY independent state space (ISS) system, provides a general guide for systematically tackling various FA related learning tasks and the above to-be-improved problems for the APT analyses. Third, on various specific cases of the BYY ISS s...
Novel Estimation Methods for Unsupervised Discovery of Latent Structure in Natural Language Text
, 2006
"... This thesis is about estimating probabilistic models to uncover useful hidden structure in data; specifically, we address the problem of discovering syntactic structure in natural language text. We present three new parameter estimation techniques that generalize the standard approach, maximum likel ..."
Abstract
-
Cited by 20 (7 self)
- Add to MetaCart
This thesis is about estimating probabilistic models to uncover useful hidden structure in data; specifically, we address the problem of discovering syntactic structure in natural language text. We present three new parameter estimation techniques that generalize the standard approach, maximum likelihood estimation, in different ways. Contrastive estimation maximizes the conditional probability of the observed data given a “neighborhood” of implicit negative examples. Skewed deterministic annealing locally maximizes likelihood using a cautious parameter search strategy that starts with an easier optimization problem than likelihood, and iteratively moves to harder problems, culminating in likelihood. Structural annealing is similar, but starts with a heavy bias toward simple syntactic structures and gradually relaxes the bias. Our estimation methods do not make use of annotated examples. We consider their performance in both an unsupervised model selection setting, where models trained under different initialization and regularization settings are compared by evaluating the training objective on a small set of unseen, unannotated development data, and supervised model selection, where the most accurate model on the development set (now with annotations)

