An Input Output HMM Architecture
Advances in Neural Information Processing Systems, 1995
Cited by 97 (14 self)
We introduce a recurrent architecture with a modular structure and formulate a training procedure based on the EM algorithm. The resulting model resembles hidden Markov models, but it supports a recurrent-network style of processing and can exploit the supervised learning paradigm while using maximum likelihood estimation.
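As a rough illustration of how an input/output HMM assigns a likelihood to an output sequence given an input sequence, the forward recursion can be sketched as follows. This is a minimal sketch under simplifying assumptions that are not taken from the abstract: inputs and outputs are discrete symbols, the transition and emission distributions are lookup tables indexed by the current input (the paper's architecture uses networks in that role), and `init` is a distribution over the first state. The names `iohmm_likelihood`, `init`, `trans`, and `emit` are hypothetical.

```python
from itertools import product


def iohmm_likelihood(inputs, outputs, init, trans, emit):
    """Return P(outputs | inputs) for a table-based input/output HMM.

    init[i]        : probability of starting in state i (assumed convention)
    trans[u][j][i] : P(next state = i | previous state = j, input = u)
    emit[u][i][y]  : P(output = y | state = i, input = u)
    """
    n = len(init)
    # alpha[i] = P(outputs so far, current state = i | inputs so far)
    alpha = [init[i] * emit[inputs[0]][i][outputs[0]] for i in range(n)]
    for u, y in zip(inputs[1:], outputs[1:]):
        alpha = [
            emit[u][i][y] * sum(alpha[j] * trans[u][j][i] for j in range(n))
            for i in range(n)
        ]
    return sum(alpha)


# Illustrative two-state model with binary inputs and outputs (made-up numbers).
init = [0.5, 0.5]
trans = {0: [[0.9, 0.1], [0.2, 0.8]], 1: [[0.3, 0.7], [0.6, 0.4]]}
emit = {0: [[0.8, 0.2], [0.1, 0.9]], 1: [[0.5, 0.5], [0.3, 0.7]]}

inputs = [0, 1, 1]
# Sanity check: the likelihoods of all possible output sequences sum to one.
total = sum(
    iohmm_likelihood(inputs, list(ys), init, trans, emit)
    for ys in product([0, 1], repeat=len(inputs))
)
```

Because both the transition and the emission tables depend on the input symbol, the same recursion conditions on the input sequence at every step, which is the main difference from the forward pass of a standard HMM.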
Hierarchical Recurrent Neural Networks for Long-Term Dependencies
1996
Cited by 17 (2 self)
We have already shown that extracting long-term dependencies from sequential data is difficult, both for deterministic dynamical systems such as recurrent networks, and probabilistic models such as hidden Markov models (HMMs) or input/output hidden Markov models (IOHMMs). In practice, to avoid this problem, researchers have used domain specific a-priori knowledge to give meaning to the hidden or state variables representing past context. In this paper, we propose to use a more general type of a-priori knowledge, namely that the temporal dependencies are structured hierarchically. This implies that long-term dependencies are represented by variables with a long time scale. This principle is applied to a recurrent network which includes delays and multiple time scales. Experiments confirm the advantages of such structures. A similar approach is proposed for HMMs and IOHMMs.
1 Introduction
Learning from examples basically amounts to identifying the relations between random v...
Diffusion of Credit in Markovian Models
1995
Cited by 9 (3 self)
This paper studies the problem of diffusion in Markovian models, such as hidden Markov models (HMMs), and how it makes learning long-term dependencies in sequences very difficult. Using results from Markov chain theory, we show that the problem of diffusion is reduced if the transition probabilities approach 0 or 1. Under this condition, standard HMMs have very limited modeling capabilities, but input/output HMMs can still perform interesting computations.
1 Introduction
This paper presents an important new element in our research on the problem of learning long-term dependencies in sequences. In our previous work [4] we found theoretical reasons for the difficulty in training recurrent networks (or more generally parametric non-linear dynamical systems) to learn long-term dependencies. The main result stated that either long-term storing or gradient propagation would be harmed, depending on whether the norm of the Jacobian of the state to state function was grea...
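The abstract's claim that diffusion is reduced when transition probabilities approach 0 or 1 can be illustrated numerically. The sketch below is not the paper's analysis; it only multiplies a fixed transition matrix by itself and measures how much the rows of the product still differ, that is, how much the state distribution after many steps still depends on the starting state. The helper names (`matmul`, `row_spread`, `spread_after`) and both example matrices are assumptions made for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def row_spread(m):
    """Largest gap, within any column, between the entries of different rows.

    A spread near 0 means every starting state leads to almost the same
    distribution, so information about the distant past has diffused away.
    """
    return max(max(col) - min(col) for col in zip(*m))


def spread_after(transition, steps):
    """Row spread of the `steps`-step transition matrix."""
    prod = transition
    for _ in range(steps - 1):
        prod = matmul(prod, transition)
    return row_spread(prod)


# Transition probabilities far from 0 and 1: the past is forgotten quickly.
soft = [[0.6, 0.4], [0.3, 0.7]]
# Transition probabilities close to 0 and 1: the starting state still matters.
sharp = [[0.99, 0.01], [0.01, 0.99]]

soft_spread = spread_after(soft, 20)
sharp_spread = spread_after(sharp, 20)
```

For a two-state chain the spread shrinks geometrically with the second eigenvalue of the transition matrix (0.3 for `soft`, 0.98 for `sharp`), which is why the near-deterministic matrix retains far more information about the initial state after 20 steps.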
An EM Approach to Grammatical Inference: Input/Output HMMs
Proceedings of the 12th IAPR Intl. Conf. on Pattern Recognition, IEEE Computer, 1994
Cited by 6 (1 self)
We propose a modular recurrent connectionist architecture for adaptive temporal processing. The model is given a probabilistic interpretation and is trained using the EM algorithm. This model can also be seen as an Input/Output Hidden Markov Model. The focus of this paper is on sequence classification tasks. We demonstrate that EM supervised learning is well suited for solving grammatical inference problems. Experimental benchmark results are presented for the seven Tomita grammars, showing that these adaptive models can attain excellent generalization.
1 Introduction
Challenging learning tasks, such as those related to language, can in principle be approached with recurrent neural networks. Recurrent networks can store and retrieve information in a flexible way. In particular, dynamical attractors can be used to implement reliable long-term memories. However, practical difficulties have been reported in training recurrent networks to perform tasks involving long-term dependencies. A ...
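To make the benchmark in this abstract concrete, the sketch below labels binary strings according to four of the Tomita grammars, which is the kind of sequence classification target such a model would be trained on. The listing itself does not define the grammars; the definitions used here are the ones usually quoted in the grammatical inference literature and should be read as an assumption, as should the names `TOMITA` and `accepts`.

```python
import re

# Commonly quoted definitions over the alphabet {0, 1} (assumed, not from the listing):
#   1: strings of the form 1*
#   2: strings of the form (10)*
#   4: strings that do not contain "000"
#   7: strings of the form 0*1*0*1*
TOMITA = {
    1: lambda s: re.fullmatch(r"1*", s) is not None,
    2: lambda s: re.fullmatch(r"(10)*", s) is not None,
    4: lambda s: "000" not in s,
    7: lambda s: re.fullmatch(r"0*1*0*1*", s) is not None,
}


def accepts(grammar, string):
    """Return True if `string` belongs to the given Tomita grammar."""
    if set(string) - {"0", "1"}:
        raise ValueError("strings must be over the alphabet {0, 1}")
    return TOMITA[grammar](string)


# Build a small labelled data set for one grammar.
examples = ["", "1", "10", "1010", "101", "0110"]
labelled = [(s, accepts(2, s)) for s in examples]
```

A labelled set like `labelled` pairs each input sequence with a single accept/reject target, which matches the sequence classification setting the abstract describes.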
A Neural Model for Multi-Expert Architectures
2002
Cited by 6 (2 self)
We present a generalization of conventional artificial neural networks that allows for a functional equivalence to multi-expert systems. The new model provides an architectural freedom going beyond existing multi-expert models and an integrative formalism to compare and combine various techniques of learning. (We consider gradient, EM, reinforcement, and unsupervised learning.) Its uniform representation aims at a simple genetic encoding and evolutionary structure optimization of multi-expert systems. This paper contains a detailed description of the model and learning rules, empirically validates its functionality, and discusses future perspectives.
Time Series Analysis And Prediction Using Recurrent Gated Experts
1996
Cited by 2 (0 self)
A recurrent version of the Gated Experts (GE) architecture, as defined in [Weigend et al., 1995], which uses recurrent Artificial Neural Networks inside both the gate and expert networks, is described in this thesis. The background in time series analysis and prediction and in Artificial Neural Networks is presented, and an overview of related architectures is given. The architecture is evaluated on a computer-generated time series produced by a structured dynamical system and compared with the non-recurrent version. The results show that the prediction accuracy of recurrent and non-recurrent GEs is similar and that the recurrent architecture could find a significantly smaller representation for the given time series.
unknown title
2002
We present a generalization of conventional artificial neural networks that allows for a functional equivalence to multi-expert systems. The new model provides an architectural freedom going beyond existing multi-expert models and an integrative formalism to compare and combine various techniques of learning. (We consider gradient, EM, reinforcement, and unsupervised learning.) Its uniform representation aims at a simple genetic encoding and evolutionary structure optimization of multi-expert systems. This paper contains a detailed description of the model and learning rules, empirically validates its functionality, and discusses future perspectives.

