Results 1-10 of 19
The Infinite Hidden Markov Model
 Machine Learning
, 2002
Abstract

Cited by 637 (41 self)
We show that it is possible to extend hidden Markov models to have a countably infinite number of hidden states. By using the theory of Dirichlet processes we can implicitly integrate out the infinitely many transition parameters, leaving only three hyperparameters which can be learned from data. These three hyperparameters define a hierarchical Dirichlet process capable of capturing a rich set of transition dynamics. The three hyperparameters control the time scale of the dynamics, the sparsity of the underlying state-transition matrix, and the expected number of distinct hidden states in a finite sequence. In this framework it is also natural to allow the alphabet of emitted symbols to be infinite: consider, for example, symbols being possible words appearing in English text.
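The hierarchical construction the abstract describes can be sketched with a truncated stick-breaking approximation. The hyperparameter names, values, and truncation level below are assumptions of this sketch, not the paper's notation:

```python
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking(concentration, truncation):
    """Truncated stick-breaking (GEM) draw from a Dirichlet process."""
    v = rng.beta(1.0, concentration, size=truncation)
    stick_left = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    weights = v * stick_left
    return weights / weights.sum()  # renormalise after truncation

# Illustrative hyperparameters (values chosen for the sketch):
gamma = 3.0  # governs the expected number of distinct hidden states
alpha = 2.0  # governs the sparsity of each transition row

K = 10  # truncation level for the sketch
beta = stick_breaking(gamma, K)  # shared base measure over states

# Coupling every row's Dirichlet prior through beta is what makes the
# process hierarchical: hidden states are shared across all rows.
transitions = np.vstack([rng.dirichlet(alpha * beta) for _ in range(K)])
print(transitions.shape)  # (10, 10); each row is a transition distribution
```

The single base measure `beta` ties the rows together, so a state that is probable from one predecessor tends to be reachable from others as well.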
Learning Dynamic Bayesian Networks
 In Adaptive Processing of Sequences and Data Structures, Lecture Notes in Artificial Intelligence
, 1998
Abstract

Cited by 166 (0 self)
Suppose we wish to build a model of data from a finite sequence of ordered observations, {Y1, Y2,..., Yt}. In most realistic scenarios, from modeling stock prices to physiological data, the observations are not related deterministically. Furthermore, there is added uncertainty resulting from the limited size of our data set and any mismatch between our model and the true process. Probability theory provides a powerful tool for expressing both randomness and uncertainty in our model [23]. We can express the uncertainty in our prediction of the future outcome Yt+1 via a probability density P(Yt+1 | Y1,..., Yt). Such a probability density can then be used to make point predictions, define error bars, or make decisions that are expected to minimize some loss function. This chapter presents a probabilistic framework for learning models of temporal data. We express these models using the Bayesian network formalism (a.k.a. probabilistic graphical models or belief networks), a marriage of probability theory and graph theory in which dependencies between variables are expressed graphically. The graph not only allows the user to understand which variables ...
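As a toy illustration of such a predictive density, a first-order Markov model with additive smoothing gives a concrete P(Yt+1 | Y1,..., Yt). The function and the two-symbol alphabet are invented for this sketch:

```python
from collections import Counter, defaultdict

def predictive_distribution(history, alphabet, smoothing=1.0):
    """Estimate P(Y_{t+1} | Y_1, ..., Y_t) under a first-order Markov
    assumption, with additive smoothing so unseen symbols keep mass."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(history, history[1:]):
        counts[prev][nxt] += 1
    last = history[-1]
    total = sum(counts[last].values()) + smoothing * len(alphabet)
    return {y: (counts[last][y] + smoothing) / total for y in alphabet}

dist = predictive_distribution("ababba", "ab")
print(dist)  # {'a': 0.25, 'b': 0.75}: 'a' was followed by 'b' twice
```

The returned dictionary is a proper distribution, so it can feed point predictions, error bars, or expected-loss decisions as the abstract describes.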
Sensing and Modeling Human Networks
 Ph. D. Thesis, Program in Media Arts and Sciences, Massachusetts Institute of Technology
, 2003
Augmented Statistical Models for Classifying Sequence Data
, 2006
Abstract

Cited by 21 (0 self)
Declaration This dissertation is the result of my own work and includes nothing that is the outcome of work done in collaboration. It has not been submitted in whole or in part for a degree at any other university. Some of the work has been published previously in conference proceedings [66,69], two journal articles [36,68], two workshop papers [35,67] and a technical report [65]. The length of this thesis including appendices, bibliography, footnotes, tables and equations is approximately 60,000 words. This thesis contains 27 figures and 20 tables.
Product of Gaussians for speech recognition
 Computer Speech & Language
, 2003
Abstract

Cited by 13 (2 self)
1 Introduction Mixtures of Gaussians (MoG) are commonly used as the state representation in hidden Markov model (HMM) based speech recognition. These Gaussian mixture models are easy to train using expectation maximisation (EM) techniques [4] and are able to approximate any distribution given a sufficient number of components. However, only a limited number of parameters can be effectively trained given a finite quantity of training data. This limitation restricts the ability of MoG systems to model highly complex distributions. A range of distributed representations have been developed to overcome this problem. These distributed representations may be split into two basic forms. The first assumes that the sources are asynchronous. The second assumes that the sources are synchronous. [Figure: HMM graphical model with observations o_{t-1}, o_t, o_{t+1} and hidden states q_{t-1}, q_t, q_{t+1}]
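A minimal EM fit for a one-dimensional two-component MoG, on invented data, shows the training loop the abstract refers to. The deterministic quantile initialisation is an assumption of this sketch, not the paper's recipe:

```python
import numpy as np

def em_mog_1d(x, k=2, iters=50):
    """Fit a 1-D mixture of Gaussians by expectation maximisation."""
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))  # deterministic init
    var = np.full(k, np.var(x))
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities r[n, j] proportional to w_j N(x_n; mu_j, var_j)
        d = x[:, None] - mu[None, :]
        log_r = -0.5 * (d ** 2 / var + np.log(2 * np.pi * var)) + np.log(w)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return w, mu, var

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-5, 1, 200), rng.normal(5, 1, 200)])
w, mu, var = em_mog_1d(x)
print(np.sort(mu))  # component means recover roughly -5 and 5
```

The fixed parameter budget is visible here: `k` components cost `3k - 1` free parameters, which is the limitation the abstract says motivates distributed representations.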
Temporal Pattern Recognition in Noisy Nonstationary Time Series Based on Quantization into Symbolic Streams: Lessons Learned from Financial Volatility Trading
 URL http://citeseer.nj.nec.com/tino00temporal.html. (URL accessed on March 30
, 2000
Abstract

Cited by 9 (1 self)
In this paper we investigate the potential of the analysis of noisy, nonstationary time series by quantizing them into streams of discrete symbols and applying finite-memory symbolic predictors. The main argument is that careful quantization can reduce the noise in the time series to make model estimation more amenable given limited numbers of samples that can be drawn due to the nonstationarity in the time series. As a main application area we study the use of such an analysis in a realistic setting involving financial forecasting and trading. In particular, using historical data, we simulate the trading of straddles on the financial indexes DAX and FTSE 100 on a daily basis, based on predictions of the daily volatility differences in the underlying indexes. We propose a parametric, data-driven quantization scheme which transforms temporal patterns in the series of daily volatility changes into grammatical and statistical patterns in the corresponding symbolic streams. As sy...
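The quantize-then-predict pipeline can be sketched as follows. The quantile cut selection and the synthetic data are assumptions of this sketch, not the paper's data-driven scheme:

```python
import numpy as np
from collections import Counter, defaultdict

def quantize(changes, n_symbols=2):
    """Map real-valued changes to symbols 0..n_symbols-1 via quantile cuts."""
    cuts = np.quantile(changes, np.linspace(0, 1, n_symbols + 1)[1:-1])
    return np.digitize(changes, cuts)

def fit_finite_memory(symbols, order=2):
    """Finite-memory symbolic predictor: next-symbol counts per context."""
    model = defaultdict(Counter)
    for i in range(order, len(symbols)):
        model[tuple(symbols[i - order:i])][symbols[i]] += 1
    return model

rng = np.random.default_rng(0)
changes = rng.normal(size=500)       # stand-in for daily volatility changes
symbols = quantize(changes).tolist()
model = fit_finite_memory(symbols)
context = tuple(symbols[-2:])
prediction = model[context].most_common(1)[0][0]
print(prediction)  # most frequent successor of the latest context
```

Moving real-valued noise into a small symbol alphabet trades resolution for estimation stability, which is the trade-off the abstract argues pays off under nonstationarity.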
Techniques for modelling Phonological Processes in Automatic Speech Recognition
, 2001
Abstract

Cited by 7 (0 self)
Declaration This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration, except where stated. It has not been submitted in whole or in part for a degree at any other university. The length of this thesis including footnotes and appendices does not exceed 29,500 words and includes no more than 40 figures. 1 Systems which automatically transcribe carefully dictated speech are now commercially available, but their performance degrades dramatically when the speaking style of users becomes more relaxed or conversational. This dissertation focuses on techniques that aim to improve the robustness of statistical speech transcription systems to conversational speaking styles. The dissertation shows first that the performance degradation occurring as speech becomes more conversational is severe and is partially attributable to differences in the acoustic realizations of sentences. Hypothesizing that the quantifiably wider range of ...
A Symbolic Dynamics Approach to Volatility Prediction
 In Computational Finance (Proceedings of the Sixth International Conference on Computational Finance), Leonard N. Stern School of Business
, 1999
Abstract

Cited by 6 (3 self)
We consider the problem of predicting the direction of daily volatility changes in the Dow Jones Industrial Average (DJIA). This is accomplished by quantizing a series of historic volatility changes into a symbolic stream over 2 or 4 symbols. We compare predictive performance of the classical fixed-order Markov models with that of a novel approach to variable memory length prediction (called prediction fractal machine, or PFM) which is able to select very specific deep prediction contexts (whenever there is sufficient support for such contexts in the training data). We learn that daily volatility changes of the DJIA only exhibit rather shallow finite memory structure. On the other hand, a careful selection of quantization cut values can strongly enhance predictive power of symbolic schemes. Results on 12 non-overlapping epochs of the DJIA strongly suggest that PFMs can outperform both traditional Markov models and (continuous-valued) GARCH models in the task of predicting volatility o...
Mixed-Memory Markov Models for Automatic Language Identification
 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing
, 2000
Abstract

Cited by 3 (1 self)
Automatic language identification (LID) continues to play an integral part in many multilingual speech applications. The most widespread approach to LID is the phonotactic approach, which performs language classification based on the probabilities of phone sequences extracted from the test signal. These probabilities are typically computed using statistical phone n-gram models. In this paper we investigate the approximation of these standard n-gram models by mixed-memory Markov models with application to both a phone-based and an articulatory feature-based LID system. We demonstrate significant improvements in accuracy with a substantially reduced set of parameters on a 10-way language identification task.
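The mixed-memory idea, approximating a higher-order conditional by a convex combination of pairwise conditionals, can be illustrated with made-up tables over a three-phone alphabet:

```python
import numpy as np

def mixed_memory_prob(context, lam, tables):
    """Approximate P(s_t | s_{t-1}, s_{t-2}) as
    lam[0] * P1(s_t | s_{t-1}) + lam[1] * P2(s_t | s_{t-2})."""
    s_prev1, s_prev2 = context
    return lam[0] * tables[0][s_prev1] + lam[1] * tables[1][s_prev2]

# Invented conditional tables: row i holds the distribution given symbol i.
P1 = np.array([[0.7, 0.2, 0.1],
               [0.1, 0.8, 0.1],
               [0.3, 0.3, 0.4]])    # conditions on s_{t-1}
P2 = np.array([[0.5, 0.25, 0.25],
               [0.2, 0.6, 0.2],
               [0.25, 0.25, 0.5]])  # conditions on s_{t-2}
lam = np.array([0.6, 0.4])          # mixture weights, sum to 1

p = mixed_memory_prob((0, 2), lam, (P1, P2))
print(p)  # [0.52 0.22 0.26], a proper distribution over the 3 phones
```

Each pairwise table has A^2 entries versus A^3 for a full trigram over an alphabet of size A, which is the parameter reduction the abstract reports.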