Results 1-10 of 116
Dynamic Bayesian Networks: Representation, Inference and Learning
2002
Cited by 758 (3 self)
Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs and KFMs are limited in their “expressive power”. Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linear-Gaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data.
In particular, the main novel technical contributions of this thesis are as follows: a way of representing Hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T^3), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of applying Rao-Blackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.
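The abstract treats the HMM as the simplest model that DBNs generalize and states inference costs in terms of the sequence length T. As a rough illustration only (this is the textbook forward algorithm for a discrete HMM, not code from the thesis), the sketch below computes filtered state distributions in time linear in T. The function name `forward_filter` and the toy parameters are hypothetical.

```python
import math

def forward_filter(prior, trans, emit, observations):
    """Textbook HMM forward pass (illustrative sketch).

    prior[i]    : P(X_1 = i)
    trans[i][j] : P(X_t = j | X_{t-1} = i)
    emit[j][y]  : P(Y_t = y | X_t = j)
    Returns (filtered, log_lik), where filtered[t][j] = P(X_t = j | y_1..y_t).
    """
    n = len(prior)
    belief = list(prior)
    filtered = []
    log_lik = 0.0
    for t, y in enumerate(observations):
        if t > 0:
            # Predict: propagate the previous belief through the transition model.
            belief = [sum(belief[i] * trans[i][j] for i in range(n))
                      for j in range(n)]
        # Update: weight by the likelihood of the observation, then normalize.
        unnorm = [belief[j] * emit[j][y] for j in range(n)]
        norm = sum(unnorm)
        belief = [u / norm for u in unnorm]
        log_lik += math.log(norm)  # log P(y_t | y_1..y_{t-1})
        filtered.append(belief)
    return filtered, log_lik

# Toy two-state model (hypothetical numbers chosen for illustration).
prior = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.8, 0.2], [0.2, 0.8]]
filtered, log_lik = forward_filter(prior, trans, emit, [0, 0, 1])
```

Each time step costs O(n^2) for n hidden states, so the whole pass is O(T n^2). A factored (DBN) state space would replace the single state index with several variables, which this sketch does not attempt.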
Survey of clustering algorithms
IEEE Transactions on Neural Networks, 2005
Cited by 483 (4 self)
Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts. Several tightly related topics, proximity measure, and cluster validation, are also discussed.
Context-Based Vision System for Place and Object Recognition
2003
Cited by 314 (8 self)
While navigating in an environment, a vision system has to be able to recognize where it is and what the main objects in the scene are. In this paper we present a context-based vision system for place and object recognition. The goal is to identify familiar locations (e.g., office 610, conference room 941, Main Street), to categorize new environments (office, corridor, street) and to use that information to provide contextual priors for object recognition (e.g., table, chair, car, computer). We present a low-dimensional global image representation that provides relevant information for place recognition and categorization, and how such contextual information introduces strong priors that simplify object recognition. We have trained the system to recognize over 60 locations (indoors and outdoors) and to suggest the presence and locations of more than 20 different object types. The algorithm has been integrated into a mobile system that provides real-time feedback to the user. This work was sponsored by the Air Force under Air Force Contract F1962800C0002. Opinions, interpretations, conclusions, and recommendations are those of the author and are not necessarily endorsed by the U.S. Government.
Structure Learning in Conditional Probability Models via an Entropic Prior and Parameter Extinction
1998
Cited by 79 (0 self)
We introduce an entropic prior for multinomial parameter estimation problems and solve for its maximum...
Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies
2001
Cited by 77 (24 self)
Recurrent networks (cross-reference Chapter 12) can, in principle, use their feedback connections to store representations of recent input events in the form of activations. The most widely used algorithms for learning what to put in short-term memory, however, take too much time to be feasible or do not work well at all, especially when minimal time lags between inputs and corresponding teacher signals are long. Although theoretically fascinating, they do not provide clear practical advantages over, say, backprop in feedforward networks with limited time windows (see cross-reference Chapters 11 and 12). With conventional "algorithms based on the computation of the complete gradient", such as "Back-Propagation Through Time" (BPTT, e.g., [22, 27, 26]) or "Real-Time Recurrent Learning" (RTRL, e.g., [21]) error signals "flowing backwards in time" tend to either (1) blow up or (2) vanish: the temporal evolution of the backpropagated error ex...
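The blow-up-or-vanish behaviour described above can be seen in a toy setting. The sketch below is an illustration under strong simplifying assumptions, not the chapter's analysis: it uses a single recurrent unit with one self-connection weight `w`, and multiplies together the local derivatives that a backpropagated error picks up over a number of time steps. The function name `error_scale` and its parameters are hypothetical.

```python
import math

def error_scale(w, steps, use_tanh=False, h0=0.5):
    """Magnitude of d h_T / d h_0 for a one-unit recurrent net (toy sketch).

    Linear unit: h_t = w * h_{t-1}, local derivative w.
    Tanh unit:   h_t = tanh(w * h_{t-1}), local derivative w * (1 - h_t**2).
    """
    h = h0
    scale = 1.0
    for _ in range(steps):
        if use_tanh:
            h = math.tanh(w * h)
            scale *= w * (1.0 - h * h)
        else:
            h = w * h
            scale *= w
    return abs(scale)

# The error shrinks when |w| < 1 and grows when |w| > 1 (linear unit).
vanishing = error_scale(0.9, 50)
exploding = error_scale(1.1, 50)
```

In the linear case the factor is exactly |w| raised to the number of steps, so the scaling is exponential in the time lag. With a tanh unit each local derivative is at most |w| in magnitude, so a small weight makes the error vanish at least as fast.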
Graphical models and automatic speech recognition
Mathematical Foundations of Speech and Language Processing, 2003
Cited by 77 (15 self)
Graphical models provide a promising paradigm to study both existing and novel techniques for automatic speech recognition. This paper first provides a brief overview of graphical models and their uses as statistical models. It is then shown that the statistical assumptions behind many pattern recognition techniques commonly used as part of a speech recognition system can be described by a graph; this includes Gaussian distributions, mixture models, decision trees, factor analysis, principal component analysis, linear discriminant analysis, and hidden Markov models. Moreover, this paper shows that many advanced models for speech recognition and language processing can also be simply described by a graph, including many at the acoustic, pronunciation, and language-modeling levels. A number of speech recognition techniques born directly out of the graphical-models paradigm are also surveyed. Additionally, this paper includes a novel graphical analysis regarding why derivative (or delta) features improve hidden Markov model-based speech recognition by improving structural discriminability. It also includes an example where a graph can be used to represent language model smoothing constraints. As will be seen, the space of models describable by a graph is quite large. A thorough exploration of this space should yield techniques that ultimately will supersede the hidden Markov model.
A Novel Connectionist System for Unconstrained Handwriting Recognition
2008
Cited by 61 (4 self)
Recognising lines of unconstrained handwritten text is a challenging task. The difficulty of segmenting cursive or overlapping characters, combined with the need to exploit surrounding context, has led to low recognition rates for even the best current recognisers. Most recent progress in the field has been made either through improved preprocessing, or through advances in language modelling. Relatively little work has been done on the basic recognition algorithms. Indeed, most systems rely on the same hidden Markov models that have been used for decades in speech and handwriting recognition, despite their well-known shortcomings. This paper proposes an alternative approach based on a novel type of recurrent neural network, specifically designed for sequence labelling tasks where the data is hard to segment and contains long-range, bidirectional interdependencies. In experiments on two large unconstrained handwriting databases, our approach achieves word recognition accuracies of 79.7% on online data and 74.1% on offline data, significantly outperforming a state-of-the-art HMM-based system. In addition, we demonstrate the network’s robustness to lexicon size, measure the individual influence of its hidden layers, and analyse its use of context. Lastly we provide an in-depth discussion of the differences between the network and HMMs, suggesting reasons for the network’s superior performance.
Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks
In Proceedings of the International Conference on Machine Learning (ICML), 2006
Cited by 53 (20 self)
Many real-world sequence learning tasks require the prediction of sequences of labels from noisy, unsegmented input data. In speech recognition, for example, an acoustic signal is transcribed into words or sub-word units. Recurrent neural networks (RNNs) are powerful sequence learners that would seem well suited to such tasks. However, because they require pre-segmented training data, and post-processing to transform their outputs into label sequences, their applicability has so far been limited. This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems. An experiment on the TIMIT speech corpus demonstrates its advantages over both a baseline HMM and a hybrid HMM-RNN.
Unsupervised Language Acquisition: Theory and Practice
2001
Cited by 41 (0 self)
In this thesis I present various algorithms for the unsupervised machine learning of aspects of natural languages using a variety of statistical models. The scientific object of the work is to examine the validity of the so-called Argument from the Poverty of the Stimulus advanced in favour of the proposition that humans have language-specific innate knowledge. I start by examining an a priori argument based on Gold's theorem, that purports to prove that natural languages cannot be learned, and some formal issues related to the choice of statistical grammars rather than symbolic grammars. I present three novel algorithms for learning various parts of natural languages: first, an algorithm for the induction of syntactic categories from unlabelled text using distributional information, that can deal with ambiguous and rare words; secondly, a set of algorithms for learning morphological processes in a variety of languages, including languages such as Arabic with non-concatenative morphology; thirdly, an algorithm for the unsupervised induction of a context-free grammar from tagged text. I carefully examine the interaction between the various components, and show how these algorithms can form the basis for an empiricist model of language acquisition. I therefore conclude that the Argument from the Poverty of the Stimulus is unsupported by the evidence.
Hatching by Example: a Statistical Approach
2002
Cited by 40 (0 self)
We present a new approach to synthetic (computer-aided) drawing with patches of strokes. Grouped strokes convey the local intensity level that is desired in drawing. The key point of our approach is learning by example: the system does not know a priori the distribution of the strokes. Instead, by analyzing a sample (training) patch of strokes, our system is able to synthesize freely an arbitrary sequence of strokes that "looks like" the given sample. Strokes are considered as parametrical curves represented by a vector of random variables following a Markovian distribution. Our method is based on Shannon's N-gram approach and is a direct extension of Efros's texture synthesis models [EL99; EF01]. Nevertheless, one major difference between our method and traditional texture synthesis is the use of such curves as a basic element instead of pixels. We define a statistical metric for comparison between different patches containing various layouts of strokes. We hope that our method performs a first step towards capturing a very difficult notion of style in drawing (hatching style in our case). We illustrate our method by varied examples, ranging from typical hatching in traditional drawing to highly heterogeneous sets of strokes.