10 citations found.
Bengio Y, Simard P. and Frasconi P. (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks 5, 157--166.

This paper is cited in the following contexts:
Building Predictive Models on Complex - Symbolic Sequences Via

....rather deep memory structures. Surprisingly, the BCM based model (RBCMNs are trained in an unsupervised mode) has a comparable performance to its RNN based counterpart. This can be explained by the familiar information latching problem in recurrent networks when longer time spans are to be latched [3, 4]. We argue that BCM based models correspond to variable memory length Markov models. 1 Introduction In our recent study of hybrid symbolic neural time series modeling [1] we trained the second order RNNs on complex symbolic sequences and then extracted finite state neural based predictive ....

....organization. In particular, we use a recurrent version of the BCM with lateral inhibition [2] to implement the transformation of histories of seen symbols into activations in the recurrent layer. The study was initiated with an intuition that, due to the familiar information latching problem [4], the corresponding transformations in classical RNN may not be optimal for sequences exhibiting a deep memory structure. Non recurrent BCM networks with the so called feed forward lateral inhibition are known to perform projection pursuit of the input space into the space of BCM activations [5, ....

[Article contains additional citation context not shown here]

Bengio Y, Simard P. and Frasconi P. (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks 5, 157--166.


Building predictive models on complex symbolic.. - Tino.. (2000)

....symbolic sequences with rather deep memory structures. Surprisingly, the BCM based model has a comparable or better performance than its RNN based counterpart. This can be explained by the familiar information latching problem in recurrent networks when longer time spans are to be latched [3, 4]. 1 Introduction In our recent study of hybrid symbolic neural time series modeling [1] we trained the second order RNNs on complex symbolic sequences and then extracted finite state neural based predictive models by using only recurrent part of the trained networks and discarding the ....

....unpredictable events requiring a deeper memory. The laser sequence (obtained from http://www.cs.colorado.edu/~andreas/Time-Series/SantaFe.html) of $10^4$ differences $\Delta_t$ between successive activations was quantized into a symbolic sequence $S = \{s_t\}$ over the alphabet $A = \{a, b, c, d\}$, with $|A| = 4$ symbols. The symbols correspond to low and high positive/negative activity changes such that
$$
s_t = \begin{cases}
a \text{ (normal up)} & \text{if } 0 \le \Delta_t < \rho_2 \\
b \text{ (high up)} & \text{if } \rho_2 \le \Delta_t \\
c \text{ (normal down)} & \text{if } \rho_1 < \Delta_t < 0 \\
d \text{ (deep down)} & \text{if } \Delta_t \le \rho_1
\end{cases}
\qquad (10)
$$
where the parameters $\rho_1$ and ....
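
The quantization rule in the excerpt above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' code: the function name `quantize`, the threshold values -2.0 and 2.0, the toy input, and the treatment of values that fall exactly on a boundary are all hypothetical choices.

```python
def quantize(deltas, rho1, rho2):
    """Map successive differences to the symbols a, b, c, d.

    Assumes rho1 < 0 < rho2. Which symbol a value exactly equal to
    0, rho1 or rho2 receives is an assumption made for this sketch.
    """
    if not rho1 < 0 < rho2:
        raise ValueError("expected rho1 < 0 < rho2")
    symbols = []
    for d in deltas:
        if d >= rho2:
            symbols.append("b")  # high up
        elif d >= 0:
            symbols.append("a")  # normal up
        elif d > rho1:
            symbols.append("c")  # normal down
        else:
            symbols.append("d")  # deep down
    return "".join(symbols)


# Toy activations (hypothetical values, not the Santa Fe laser data).
activations = [10, 11, 15, 14, 9, 9]
deltas = [later - earlier for earlier, later in zip(activations, activations[1:])]
sequence = quantize(deltas, rho1=-2.0, rho2=2.0)
```

Here `sequence` holds one symbol per difference, so a series of N activations yields N - 1 symbols.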

[Article contains additional citation context not shown here]

Bengio Y, Simard P. and Frasconi P. (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks 5, 157--166.


Building predictive models on complex symbolic.. - Tino..

....rather deep memory structures. Surprisingly, the BCM based model (RBCMNs are trained in an unsupervised mode) has a comparable performance to its RNN based counterpart. This can be explained by the familiar information latching problem in recurrent networks when longer time spans are to be latched [3, 4]. We argue that BCM based models correspond to variable memory length Markov models. 1 Introduction In our recent study of hybrid symbolic neural time series modeling [1] we trained the second order RNNs on complex symbolic sequences and then extracted finite state neural based predictive ....

....organization. In particular, we use a recurrent version of the BCM with lateral inhibition [2] to implement the transformation of histories of seen symbols into activations in the recurrent layer. The study was initiated with an intuition that, due to the familiar information latching problem [4], the corresponding transformations in classical RNN may not be optimal for sequences exhibiting a deep memory structure. Non recurrent BCM networks with the so called feed forward lateral inhibition are known to perform projection pursuit of the input space into the space of BCM activations [5, ....

[Article contains additional citation context not shown here]

Bengio Y, Simard P. and Frasconi P. (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks 5, 157--166.


Using a Sequential SOM to Parse Long-term Dependencies - Mayberry, III, Miikkulainen (1999)   (1 citation)

....that can better deal with such long term dependencies. In NARX networks, previous sequence constituents are explicitly represented in a predetermined number of output delays, thus reducing the effects of vanishing gradients, which is the primary source of memory degradation in recurrent networks (Bengio et al. 1994). The performance of these networks is strongly dependent on the number of output delays, and there are no guidelines on how many are needed. This paper describes a method of extending recurrent networks such as the SRN and NARX with SARDNET (James and Miikkulainen, 1995) a self organizing map ....

Bengio, Y., Simard, P., and Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157--166.


Remembering the Past: The Role of Embedded Memory in.. - Giles, Lin, Horne (1997)

....VII, Eds. J. Principe, L. Giles, N. Morgan, E. Wilson. p. 34--43. IEEE Press, 1997. Copyright IEEE. and computationally quite powerful [25] can sometimes have difficulty learning even simple temporal behavior. Part of this difficulty has been attributed to the problem of long term dependencies [2, 18], i.e. those problems for which the desired output of a system at time T depends on inputs presented at times $t \ll T$. In particular Bengio et al. [2] showed that if a system is to latch information robustly, then the fraction of the gradient in a gradient based training algorithm due to ....

....have difficulty learning even simple temporal behavior. Part of this difficulty has been attributed to the problem of long term dependencies [2, 18], i.e. those problems for which the desired output of a system at time T depends on inputs presented at times $t \ll T$. In particular Bengio et al. [2] showed that if a system is to latch information robustly, then the fraction of the gradient in a gradient based training algorithm due to information n time steps in the past approaches zero as n becomes large. This effect is called the problem of vanishing gradient. Bengio et al. claimed that ....
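
The effect described in the excerpt above can be illustrated numerically. The sketch below is a toy example, not the construction used by Bengio et al.: it assumes a single tanh unit with no input, $h_t = \tanh(w h_{t-1})$, and the values of `w`, `h0` and `steps` are arbitrary choices. With `w = 2.0` the unit settles near a stable nonzero state (it "latches"), and each step's chain rule factor there is smaller than 1, so the sensitivity of $h_t$ to $h_0$ shrinks geometrically.

```python
import math


def gradient_through_time(w, h0, steps):
    """Return |d h_t / d h_0| for t = 1..steps in h_t = tanh(w * h_{t-1})."""
    h, grad, history = h0, 1.0, []
    for _ in range(steps):
        h = math.tanh(w * h)
        # Chain rule factor of this step: d tanh(w * h_prev) / d h_prev.
        grad *= w * (1.0 - h * h)
        history.append(abs(grad))
    return history


# Hypothetical settings chosen so that the unit latches near a fixed point.
grads = gradient_through_time(w=2.0, h0=0.5, steps=30)
```

Printing `grads` shows the magnitude falling at every step, which is the scalar analogue of the gradient contribution from n steps in the past approaching zero as n grows.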

[Article contains additional citation context not shown here]

Y. Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157--166, 1994.


Learning long-term dependencies in NARX recurrent neural.. - Lin, Horne, Tino, Giles (1996)   (23 citations)

....of a system at time T depends on inputs presented at times $t \ll T$. This was noted by Mozer who reported that RNNs were able to learn short term musical structure using gradient based methods [21] but had difficulty capturing global behavior. These ideas were recently formalized by Bengio et al. [2], who showed that if a system is to latch information robustly, then the fraction of the gradient due to information n time steps in the past approaches zero as n becomes large. Several approaches have been suggested to circumvent the problem of vanishing gradients. For example, gradient based ....

....of the gradient due to information n time steps in the past approaches zero as n becomes large. Several approaches have been suggested to circumvent the problem of vanishing gradients. For example, gradient based methods can be abandoned completely in favor of alternative optimization methods [2, 25]. However, the algorithms investigated so far either perform just as poorly on problems involving long term dependencies, or, when they are better, require far more computational resources [2]. Another possibility is to modify conventional gradient descent by more heavily weighing the fraction of ....

[Article contains additional citation context not shown here]

Y. Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157--166, 1994.


How Embedded Memory in Recurrent Neural Network Architectures.. - Tsungnan Lin (1996)   (3 citations)

....[25] However, various empirical studies suggest that sometimes learning even simple behavior can be quite difficult when using gradient descent learning algorithms. Recently, it has been demonstrated that at least part of this difficulty can be attributed to the problem of long term dependencies [2, 18], i.e. those problems for which the desired output of a system at time T depends on inputs presented at times $t \ll T$. In particular Bengio et al. [2] showed that if a system is to latch information robustly, then the fraction of the gradient in a gradient based training algorithm due to ....

....algorithms. Recently, it has been demonstrated that at least part of this difficulty can be attributed to the problem of long term dependencies [2, 18], i.e. those problems for which the desired output of a system at time T depends on inputs presented at times $t \ll T$. In particular Bengio et al. [2] showed that if a system is to latch information robustly, then the fraction of the gradient in a gradient based training algorithm due to information n time steps in the past approaches zero as n becomes large. This effect is called the problem of vanishing gradient. Bengio et al. claimed that ....

[Article contains additional citation context not shown here]

Y. Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157--166, 1994.


Learning long-term dependencies is not as difficult with.. - Lin, Horne, Tino, Giles (1996)   (2 citations)

....output at time T depends on inputs presented at times $t \ll T$. This was noted by Mozer who reported that RNNs were able to learn short term musical structure using gradient based methods [17] but had difficulty capturing global behavior. These ideas were recently formalized by Bengio et al. [2], who showed that if a system is to robustly latch information, then the fraction of the gradient due to information n time steps in the past approaches zero as n becomes large. Several approaches have been suggested to circumvent the problem of vanishing gradients. For example, gradient based ....

....of the gradient due to information n time steps in the past approaches zero as n becomes large. Several approaches have been suggested to circumvent the problem of vanishing gradients. For example, gradient based methods can be abandoned completely in favor of alternative optimization methods [2, 21]. However, the algorithms investigated so far either perform just as poorly on problems involving long term dependencies, or, when they are better, require far more computational resources [2]. Another possibility is to modify conventional gradient descent by more heavily weighing the fraction of ....

[Article contains additional citation context not shown here]

Y. Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157--166, 1994.


How Embedded Memory in Recurrent Neural Network.. - Lin, Horne, Giles (1996)   (3 citations)

....[26] However, various empirical studies suggest that sometimes learning even simple behavior can be quite difficult when using gradient descent learning algorithms. Recently, it has been demonstrated that at least part of this difficulty can be attributed to the problem of long term dependencies [2, 19], i.e. those problems for which the desired output of a system at time T depends on inputs presented at times $t \ll T$. In particular Bengio et al. [2] showed that if a system is to latch information robustly, then the fraction of the gradient in a gradient based training algorithm due to ....

....algorithms. Recently, it has been demonstrated that at least part of this difficulty can be attributed to the problem of long term dependencies [2, 19], i.e. those problems for which the desired output of a system at time T depends on inputs presented at times $t \ll T$. In particular Bengio et al. [2] showed that if a system is to latch information robustly, then the fraction of the gradient in a gradient based training algorithm due to information n time steps in the past approaches zero as n becomes large. This effect is called the problem of vanishing gradient. Bengio et al. claimed that ....

[Article contains additional citation context not shown here]

Y. Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157--166, 1994.


Cognitive Linguistics and Connectionist Models of Language.. - Smith

No context found.

Bengio, Y., Simard, P. and Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5 (2), p157-166.

CiteSeer.IST - Copyright Penn State and NEC