Y. Bengio and P. Frasconi, "Input-Output HMMs for sequence processing," IEEE Trans. Neural Networks, 7(5), pp. 1231-1249, September 1996.
....act like finite memory machines. This argument implies that, through their initialization, recurrent networks have an architectural bias towards finite memory machines. We add a remark on recurrent neural networks used for the approximation of probability distributions, as proposed for example in [5]. Definition 3.10: A probabilistic recurrent network computes a function of the form $h \circ \tilde{f}$, where $h$ maps into $\{x \in \mathbb{R}^{n_3} \mid \sum_{i=1}^{n_3} x_i = 1,\ x_i \ge 0\}$, and $f$ is of the form $f(x; u) = \sigma(W_1 x + W_2 u + b)$, where $W_1$ and $W_2$ are real matrices, $b$ is a real vector, and $\sigma$ denotes the ....
....a distribution for the next symbol given a sequence s if the output components of the network are interpreted as a probability distribution over the alphabet. Usually, h consists of a linear function, possibly combined with a component-wise nonlinear transformation, followed by normalization. In [5], the outputs of f are normalized, too, so that the intermediate values can be interpreted as a probability distribution on a finite set of hidden states, and training can be performed, for example, with a generalized EM algorithm [27]. Note that the above approximation results can be transferred ....
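As a rough illustration of Definition 3.10, the Python sketch below runs a state map f(x; u) = σ(W1 x + W2 u + b) over a sequence of one-hot inputs and then applies an output map h made of a linear function followed by softmax normalization. The choice of tanh for σ, softmax for the normalization, and all function names and toy weights are assumptions made for illustration; they are not taken from [5] or from the citing paper.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a matrix, given as a list of rows, by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(z: Sequence[float]) -> Vector:
    """Map real scores to a probability vector (max-shifted for stability)."""
    top = max(z)
    exps = [math.exp(zi - top) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]


def f(x: Vector, u: Sequence[float], w1: Matrix, w2: Matrix, b: Vector) -> Vector:
    """State update f(x; u) = sigma(W1 x + W2 u + b); tanh is assumed for sigma."""
    pre = [p + q + r for p, q, r in zip(matvec(w1, x), matvec(w2, u), b)]
    return [math.tanh(z) for z in pre]


def h(x: Vector, v: Matrix, c: Vector) -> Vector:
    """Output map: a linear function followed by softmax normalization (assumed)."""
    return softmax([s + ci for s, ci in zip(matvec(v, x), c)])


def next_symbol_distribution(seq, x0, w1, w2, b, v, c) -> Vector:
    """Apply f recursively over the input sequence, then read out with h."""
    x = list(x0)
    for u in seq:
        x = f(x, u, w1, w2, b)
    return h(x, v, c)


# Hypothetical toy parameters: 2-dimensional state, 3-symbol alphabet, one-hot inputs.
W1 = [[0.5, -0.2], [0.1, 0.3]]
W2 = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.5]]
B = [0.0, 0.1]
V = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]
C = [0.0, 0.0, 0.0]
SEQ = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]

p_next = next_symbol_distribution(SEQ, [0.0, 0.0], W1, W2, B, V, C)
print(p_next)  # three non-negative values summing to 1
```

Whatever weights are used, the softmax read-out guarantees that the result lies in the probability simplex required of h.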
Y. Bengio and P. Frasconi. Input/output HMMs for sequence processing. IEEE Transactions on Neural Networks, 7(5):1231-1249, 1996.
Y. Bengio and P. Frasconi. Input/output HMMs for sequence processing. IEEE Transactions on Neural Networks, 7(5):1231-1249, 1996.
....the likelihood of this model. Cacciatore and Nowlan show that this model is capable of controlling plants which contain switching or jump effects more effectively than a standard mixtures of experts model. The input-output HMM (IOHMM) of Bengio and Frasconi [10] is a further generalisation of the mixtures of controllers model. The IOHMM consists of I experts which predict the target y(n) given the current input x(n), and whose outputs are gated by the outputs of a gating network. The difference with the standard mixtures of experts model lies in the gating ....
....mixture of experts. The aim of this architecture is to factor learning into a temporal component, which is learnt by the gate, and a static component, which is learnt by the experts. Modularity is encouraged in both components by the use of the mixtures of experts framework. Bengio and Frasconi [10] described an application of the EM algorithm to learn the parameters of IOHMM models. This has intuitive connections with both the EM algorithm for mixtures of experts models and the Baum-Welch algorithm [8] for hidden Markov models. Other applications of recurrent mixtures of experts models are in ....
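To make the split between the temporal and static components concrete, here is a minimal Python sketch of how an IOHMM-style model might produce predictions: the state distribution is propagated through input-dependent transition probabilities (the part learnt by the gate), and the experts' outputs are mixed using that distribution (the part learnt by the experts). The transition function, the linear experts, and every name and number below are hypothetical, and the sketch covers prediction only, not the EM training described by Bengio and Frasconi.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(z: Sequence[float]) -> Vector:
    """Map real scores to a probability vector (max-shifted for stability)."""
    top = max(z)
    exps = [math.exp(zi - top) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]


def iohmm_predict(
    inputs: Sequence[float],
    zeta0: Vector,
    transition: Callable[[int, float], Vector],
    experts: Sequence[Callable[[float], float]],
) -> Tuple[List[float], Vector]:
    """Propagate the state distribution and gate the expert outputs at each step.

    transition(j, u) returns P(S_t = i | S_{t-1} = j, u) for every state i.
    experts[i](u) is the prediction of the expert attached to state i.
    """
    zeta = list(zeta0)
    n = len(zeta)
    preds = []
    for u in inputs:
        rows = [transition(j, u) for j in range(n)]
        # zeta_t(i) = sum_j zeta_{t-1}(j) * P(S_t = i | S_{t-1} = j, u_t)
        zeta = [sum(zeta[j] * rows[j][i] for j in range(n)) for i in range(n)]
        # Gated output: expectation of the expert predictions under zeta_t.
        preds.append(sum(zeta[i] * experts[i](u) for i in range(n)))
    return preds, zeta


# Hypothetical 2-state example with scalar inputs.
SCALES = [[2.0, -2.0], [-1.0, 1.0]]  # input weight for each (previous j, next i) pair


def toy_transition(j: int, u: float) -> Vector:
    return softmax([SCALES[j][i] * u for i in range(2)])


toy_experts = [lambda u: 2.0 * u, lambda u: -u]
predictions, final_state = iohmm_predict([0.5, -1.0, 2.0], [0.5, 0.5], toy_transition, toy_experts)
```

Because every row returned by the transition function sums to one, the propagated state distribution stays normalized without any extra rescaling.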
Bengio, Y. and Frasconi, P. [1996], `Input-output HMMs for sequence processing', IEEE Transactions
....relatively simple estimate of the probability of the acoustics conditioned upon the overall model. The potential inclusion of a label network at each state, estimating a probability distribution over sound classes, highlights similarities between the HNN and the input-output HMM [15]. Another discriminative training criterion is the maximum mutual information (MMI) criterion [8], well described in [138]. The objective behind this criterion is to maximise the mutual information between a word sequence model and an acoustic observation sequence associated with it: MMI = ....
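Because the MMI formula is cut off above, the sketch below assumes the commonly used form in which, for the correct word sequence w with model M_w, one maximises log P(X | M_w) minus the log of the sum over competing word sequences w' of P(X | M_w') P(w'). The helper also adds the fixed log prior log P(w), which turns the value into the log posterior of the correct hypothesis; with fixed priors this only shifts the objective by a constant. Function names and the toy scores are illustrative assumptions.

```python
import math
from typing import Dict, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def mmi_objective(log_lik: Dict[str, float], log_prior: Dict[str, float], correct: str) -> float:
    """Log posterior of the correct hypothesis:
    log P(X|M_w) + log P(w) - log sum_{w'} P(X|M_{w'}) P(w').
    """
    joint = {w: log_lik[w] + log_prior[w] for w in log_lik}
    return joint[correct] - logsumexp(list(joint.values()))


# Hypothetical acoustic log-likelihoods for three competing word sequences.
LOG_LIK = {"yes": -120.0, "no": -123.5, "maybe": -125.0}
LOG_PRIOR = {w: math.log(1.0 / 3.0) for w in LOG_LIK}
score = mmi_objective(LOG_LIK, LOG_PRIOR, "yes")
```

Unlike maximum likelihood, raising this value requires lowering the scores of the competing hypotheses as well, which is what makes the criterion discriminative.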
Y. Bengio and P. Frasconi. Input/output HMMs for sequence processing. IEEE Transactions on Neural Networks, 7(5):1231--1249, 1996.
....can only be handled by explicit construction of context-dependent models. Some of these assumptions can be relaxed by introducing neural networks to estimate the probability parameters in the HMM. Recently, several approaches for combining HMMs and neural networks have been proposed; see, e.g., [2, 3, 4, 6, 11, 12, 13]. Here we present a new hybrid called Hidden Neural Networks, where the usual HMM probabilities are estimated by small neural networks. In the HNN it is possible to assign up to two networks to each state: 1) a match network estimating the probability that the current observation matches a ....
....and suggestions to this work. The Sanger Centre is supported by the Wellcome Trust. REFERENCES [1] L. Bahl, P. Brown, P. de Souza, and R. Mercer, Maximum mutual information estimation of hidden Markov model parameters for speech recognition, in Proceedings of ICASSP 86, pp. 49-52, 1986. [2] P. Baldi and Y. Chauvin, Protein modeling with hybrid hidden Markov model/neural network architectures, in Proceedings of the 3rd ISMB 95, pp. 39-47, 1995. [3] Y. Bengio, R. De Mori, G. Flammia, and R. Kompe, Global optimization of a neural network-hidden Markov model hybrid, IEEE ....
BENGIO, Y., AND FRASCONI, P. Input/output HMMs for sequence processing. IEEE Transactions on Neural Networks 7, 5 (1996), 1231--1249.
....probabilities that are conditional on the input, $P(S_t \mid S_{t-1}, x_t)$. In this dynamical generalization of the ME (mixture of experts) network, the dynamics of the system (specified by the transition probabilities) are adapted in time, resulting in an inhomogeneous Markov chain. The approach taken by Bengio and Frasconi (1996) in their input-output HMM (IOHMM) suggests modelling $P(S_t \mid S_{t-1}, x_t)$ as K separate neural networks, one for each setting of S_t. To guarantee that a valid probability transition matrix is defined at each point in the input space, a sum-to-one constraint (softmax transfer function) is ....
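The softmax constraint can be sketched as follows: each state owns a small network (here a hypothetical linear layer) whose scores over the next state are passed through softmax, so every row of the input-dependent matrix A(x) is a probability distribution. Indexing the networks by the previous state j (one row of A(x) per network) is an assumption of this sketch, as are the names and weights.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def softmax(z: Sequence[float]) -> Vector:
    """Map real scores to a probability vector (max-shifted for stability)."""
    top = max(z)
    exps = [math.exp(zi - top) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]


def transition_matrix(x: Sequence[float], weights: List[Matrix], biases: List[Vector]) -> Matrix:
    """Return A(x) with A[j][i] = P(S_t = i | S_{t-1} = j, x).

    weights[j] is a (K x d) matrix and biases[j] a length-K vector for the
    hypothetical network attached to previous state j.
    """
    rows = []
    for w_j, b_j in zip(weights, biases):
        scores = [sum(wk * xk for wk, xk in zip(row, x)) + bk for row, bk in zip(w_j, b_j)]
        rows.append(softmax(scores))  # sum-to-one constraint on each row
    return rows


# Hypothetical K = 2 states and d = 2 input features.
WEIGHTS = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[-0.5, 0.5], [0.5, -0.5]],
]
BIASES = [[0.0, 0.0], [0.2, -0.2]]
A_near = transition_matrix([0.0, 0.0], WEIGHTS, BIASES)
A_far = transition_matrix([3.0, -1.0], WEIGHTS, BIASES)
```

The two matrices differ because the transition probabilities depend on the input, which is exactly what makes the resulting Markov chain inhomogeneous.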
....is ultimately unsatisfactory in that it introduces domain knowledge to overcome deficiencies in the training scheme rather than as prior probabilities in a proper Bayesian treatment. For the more complex IOHMM, the training process becomes harder due to the intractability of the M step. Bengio and Frasconi (1996) addressed this problem by applying heuristic schemes such as stochastic gradient ascent and learning-rate adaptation. However, these methods were only applied to small synthetic problems, and the authors concluded that the effectiveness of the model on tasks involving large state spaces needs to ....
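One way to picture this kind of workaround is a generalized EM (GEM) M-step: instead of maximising the expected complete-data log-likelihood exactly, one only takes gradient steps that increase it. The sketch below does this for a single softmax state network with fixed E-step posterior weights, using a "bold driver" learning-rate rule (grow the rate after an improvement, halve it and reject the step otherwise). The bold-driver rule, the batch (rather than stochastic) gradient, and all names and numbers are assumptions chosen for illustration, not the specific schemes used by Bengio and Frasconi.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def softmax(z: Sequence[float]) -> Vector:
    """Map real scores to a probability vector (max-shifted for stability)."""
    top = max(z)
    exps = [math.exp(zi - top) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]


def probs(w: Matrix, x: Sequence[float]) -> Vector:
    return softmax([sum(wk * xk for wk, xk in zip(row, x)) for row in w])


def q_value(w: Matrix, xs, hs) -> float:
    """Weighted log-likelihood sum_t sum_i h[t][i] * log p_i(x_t)."""
    return sum(hi * math.log(pi) for x, h in zip(xs, hs) for hi, pi in zip(h, probs(w, x)))


def q_gradient(w: Matrix, xs, hs) -> Matrix:
    """dQ/dw[i][k] = sum_t (h[t][i] - sum(h[t]) * p_i(x_t)) * x_t[k]."""
    grad = [[0.0] * len(w[0]) for _ in w]
    for x, h in zip(xs, hs):
        p = probs(w, x)
        mass = sum(h)
        for i in range(len(w)):
            coeff = h[i] - mass * p[i]
            for k, xk in enumerate(x):
                grad[i][k] += coeff * xk
    return grad


def gem_m_step(w: Matrix, xs, hs, lr: float = 0.5, steps: int = 25) -> Tuple[Matrix, float]:
    """Gradient ascent that only accepts steps increasing Q (the GEM condition)."""
    q = q_value(w, xs, hs)
    for _ in range(steps):
        g = q_gradient(w, xs, hs)
        cand = [[wk + lr * gk for wk, gk in zip(row, grow)] for row, grow in zip(w, g)]
        q_cand = q_value(cand, xs, hs)
        if q_cand > q:
            w, q = cand, q_cand
            lr *= 1.1  # bold driver: speed up after an improvement
        else:
            lr *= 0.5  # bold driver: reject the step and slow down
    return w, q


# Hypothetical inputs (first feature is a bias term) and E-step posteriors.
XS = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]]
HS = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
W0 = [[0.0, 0.0], [0.0, 0.0]]
q_start = q_value(W0, XS, HS)
W_new, q_end = gem_m_step(W0, XS, HS)
```

Because a step is only accepted when Q increases, the surrounding EM loop keeps the monotone-likelihood property of EM even though the M-step is never solved exactly.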
Bengio, Y. and Frasconi, P. (1996). Input-Output HMMs for Sequence Processing. IEEE Transactions on Neural Networks, 7(5).
....connected by edges, which can be described in the more general framework of probabilistic graphical models [5]. The forward-backward algorithm for training HMMs [10] shows a striking resemblance to the backpropagation algorithm for NNs. Finally, many applications, especially in signal processing [2] and bioinformatics [1], employ hybrid schemes which combine HMMs and NNs. This article focuses on a problem common to both models. For sparse data, the classical training algorithms mentioned above, which derive from a maximum likelihood (ML) approach, are sub-optimal due to overfitting. ....
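For readers comparing the two algorithms, the sketch below is a standard scaled forward-backward pass for a discrete-observation HMM: a forward sweep of messages followed by a backward sweep, much as backpropagation runs a forward pass and then propagates quantities in reverse. The parameter names (pi, a, b) and the toy numbers are illustrative; per-step rescaling of the forward messages is one common way to avoid numerical underflow.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def forward_backward(pi: Vector, a: Matrix, b: Matrix, obs: Sequence[int]) -> Tuple[Matrix, float]:
    """Return state posteriors gamma[t][i] = P(S_t = i | obs) and log P(obs).

    pi[i]: initial probabilities, a[j][i]: P(S_t = i | S_{t-1} = j),
    b[i][o]: P(o | S_t = i). Forward messages are rescaled to sum to one.
    """
    n, steps = len(pi), len(obs)
    alphas: Matrix = []
    scales: Vector = []
    for t in range(steps):
        if t == 0:
            raw = [pi[i] * b[i][obs[0]] for i in range(n)]
        else:
            prev = alphas[-1]
            raw = [b[i][obs[t]] * sum(prev[j] * a[j][i] for j in range(n)) for i in range(n)]
        c = sum(raw)
        scales.append(c)
        alphas.append([r / c for r in raw])

    # Backward sweep, reusing the forward scale factors.
    betas: Matrix = [[1.0] * n for _ in range(steps)]
    for t in range(steps - 2, -1, -1):
        nxt = betas[t + 1]
        betas[t] = [
            sum(a[i][j] * b[j][obs[t + 1]] * nxt[j] for j in range(n)) / scales[t + 1]
            for i in range(n)
        ]

    gammas = []
    for alpha, beta in zip(alphas, betas):
        joint = [x * y for x, y in zip(alpha, beta)]
        total = sum(joint)
        gammas.append([v / total for v in joint])
    # The product of the scale factors equals P(obs).
    return gammas, sum(math.log(c) for c in scales)


# Hypothetical two-state HMM over a three-symbol alphabet.
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
OBS = [0, 1, 2]
GAMMA, LOGLIK = forward_backward(PI, A, B, OBS)
```

The posteriors GAMMA are the E-step quantities that Baum-Welch re-estimation (and, with input-dependent transitions, IOHMM training) builds on.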
Y. Bengio and P. Frasconi. Input-Output HMMs for Sequence Processing. IEEE Transactions on Neural Networks, 7(5), 1996.
....successfully applied to problems in speech and OCR, for example. It might be argued that HMMs tend to employ less distributed representations than RNNs, but even if this is true, is it of practical significance? There has been some interesting work exploring the links between HMMs and RNNs [22, 18]. Also related to the discussion is the Syntactic Neural Network (SNN), an architecture I developed in my PhD thesis [120, 118]. The SNN is a modular architecture that is able to parse and (in some cases) infer context-free (and therefore also regular, linear, etc.) grammars. The architecture ....
Y. Bengio and P. Frasconi. Input-output HMMs for sequence processing. IEEE Transactions on Neural Networks, 7(5):1231--1249, 1996.
....the likelihood of this model. Cacciatore and Nowlan show that this model is capable of controlling plants which contain switching or jump effects more effectively than a standard mixtures of experts model. The input-output HMM (IOHMM) of Bengio and Frasconi [10] is a further generalisation of the mixtures of controllers model. The IOHMM consists of I experts which predict the target y(n) given the current input x(n), and whose outputs are gated by the outputs of a gating network. The difference with the standard mixtures of experts model lies in ....
Bengio, Y. and Frasconi, P. [1996], `Input-output HMMs for sequence processing', IEEE Transactions on Neural Networks 7(5), 1231--1249.
....train than recurrent networks, and are less sensitive to initial conditions. Also, recurrent networks are susceptible to the long-term dependency problem when a gradient-descent-based training algorithm is used [21], though we note that certain recent results somewhat alleviate this problem [22], [23]. The approximation results on two-stage networks are important because, when attempting to model an unknown system, often only a general knowledge of the system's characteristics (causal, time-invariant, etc.) is available. Based upon these characteristics, one must choose a structure that is ....
Y. Bengio and P. Frasconi. Input-output HMMs for sequence processing. IEEE Transactions on Neural Networks, 7(5):1231--1249, September 1996.
....to many problems, the methods used to train them often suffer from stability problems and sensitivity to initial conditions. They also suffer from fundamental limitations when trained by a gradient descent algorithm [3], though some promising alternatives may alleviate such problems [14], [2]. The second direction of research involves two-stage structures consisting of a memory stage followed by a static neural network (see Figure 1). Several theoretical results have been derived about the capability of these structures, and their stability is more easily ascertained [20], [21] ....
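A two-stage structure of this kind can be sketched as a tapped delay line (the memory stage) feeding a one-hidden-layer network (the static stage). The delay-line order, the tanh hidden layer, and all names and weights are illustrative assumptions; the point of the sketch is that the output depends only on the most recent `order` inputs, i.e. the structure has a finite memory by construction.

```python
import math
from collections import deque
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


class TappedDelayLine:
    """Memory stage: keeps the last `order` inputs, newest first."""

    def __init__(self, order: int) -> None:
        self.buffer = deque([0.0] * order, maxlen=order)

    def push(self, u: float) -> Vector:
        self.buffer.appendleft(u)  # the oldest input drops off the right end
        return list(self.buffer)


def static_network(window: Sequence[float], w_hidden: Matrix, b_hidden: Vector,
                   w_out: Vector, b_out: float) -> float:
    """Static stage: one tanh hidden layer and a linear output unit."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, window)) + bh)
              for row, bh in zip(w_hidden, b_hidden)]
    return sum(w * hv for w, hv in zip(w_out, hidden)) + b_out


def two_stage(inputs: Sequence[float], order: int, params: tuple) -> Vector:
    """Feed each new input through the delay line, then through the static network."""
    memory = TappedDelayLine(order)
    return [static_network(memory.push(u), *params) for u in inputs]


# Hypothetical parameters for a delay line of order 3 and 2 hidden units.
PARAMS = ([[0.5, -0.3, 0.2], [0.1, 0.4, -0.6]], [0.0, 0.1], [1.0, -1.5], 0.05)
out_a = two_stage([5.0, 1.0, 2.0, 3.0], 3, PARAMS)
out_b = two_stage([-7.0, 1.0, 2.0, 3.0], 3, PARAMS)
# Inputs older than `order` steps no longer affect the output:
print(out_a[-1] == out_b[-1])  # True
```

Since no signal is fed back, the static stage cannot amplify its own past outputs, in contrast with a fully recurrent network.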
....Set 1 (parameter values per class; "?" marks a symbol lost in extraction):

Parameter  Class A  Class B  Class C  Class D  Class E  Class F
x0[1]      7.5      7.5      15.0     7.5      15.0     15.0
t0[1]      7.5      7.5      7.5      7.5      7.5      7.5
?[1]       0.785    0.785    0.0      0.785    0.0      0.0
h_x[1]     1.0      1.0      1.0      1.0      1.0      1.0
h_t[1]     1.0      1.0      1.0      1.0      1.0      1.0
?x[1]      2.0      2.0      2.0      2.0      2.0      2.0
?t[1]      7.6      7.6      6.6      7.6      6.6      6.6
x0[2]      15.0     15.0     15.0     15.0     15.0     15.0
t0[2]      27.5     27.5     27.5     27.5     27.5     27.5
?[2]       0.0      0.0      0.0      0.0      0.0      0.0
h_x[2]     1.0      1.0      1.0      1.0      1.0      1.0
h_t[2]     1.0      1.0      1.0      1.0      1.0      1.0
?x[2]      2.0      2.0      2.0      2.0      2.0      2.0
?t[2]      9.2      9.2      9.2      9.2      9.2      9.2

For component [3], only four values per row survive: x0[3] = 7.5, 7.5, 22.5, 22.5; t0[3] = 27.5, 27.5, 27.5, 27.5; ?[3] = 0.0 ....
Y. Bengio and P. Frasconi. Input-output HMMs for sequence processing. IEEE Transactions on Neural Networks, 7(5):1231--1249, September 1996.
No context found.
Y. Bengio and P. Frasconi, "Input-Output HMMs for sequence processing," IEEE Trans. Neural Networks, 7(5), pp. 1231-1249, September 1996.
No context found.
Bengio, Y., and Frasconi, P. 1996. Input/Output HMMs for sequence processing. IEEE Transactions on Neural Networks 7(5):1231-1249.
No context found.
Y. Bengio and P. Frasconi. Input-output HMMs for sequence processing. IEEE Transactions on Neural Networks, 7(5):1231--1249, September 1996.
No context found.
Bengio, Y., and Frasconi, P. 1996. Input-output HMMs for sequence processing. IEEE Trans. on Neural Networks 7(5):1231--1249.
No context found.
Bengio, Y., and Frasconi, P. (1996). Input-Output HMMs for Sequence Processing. IEEE Trans. Neural Networks, 7(5), pp. 1231-1249.
No context found.
Y. Bengio and P. Frasconi. Input-output HMMs for sequence processing. IEEE Transactions on Neural Networks, 7(5):1231--1249, September 1996.
No context found.
Y. Bengio and P. Frasconi. Input-output HMMs for sequence processing. IEEE Transactions on Neural Networks, 7(5):1231--1249, 1996.
No context found.
Y. Bengio and P. Frasconi, "Input-Output HMMs for sequence processing," IEEE Trans. Neural Networks, vol. 7, no. 5, pp. 1231--1249, September 1996.
No context found.
Bengio, Y. and Frasconi, P. [1996], `Input-output HMMs for sequence processing', IEEE Transactions on Neural Networks 7(5), 1231--1249.
CiteSeer.IST - Copyright Penn State and NEC