30 citations found.
F. A. Gers, J. Schmidhuber, and F. Cummins. Learning to forget: Continual prediction with LSTM. Neural Computation, 12(10):2451--2471, 2000.


This paper is cited in the following contexts:


Reinforcement Learning with LSTM in Non-Markovian Tasks with.. - Bakker   (Correct)

.... long term dependencies in timeseries data (Bengio, Simard, & Frasconi, 1994). This paper proposes the use of one particular solution from the neural networks literature that has worked well in supervised timeseries learning tasks: Long Short Term Memory (LSTM) (Hochreiter & Schmidhuber, 1997; Gers, Schmidhuber, & Cummins, 2000). In this paper an LSTM recurrent neural network is used in conjunction with model free RL, in the same spirit as the model free recurrent neural network approaches described above (Lin & Mitchell, 1993; Bakker & van der Voort van der Kleij, 2000). 1.3 Outline. The next section describes LSTM. ....

....results on non Markovian RL tasks with long term dependencies. Section 5, finally, presents the general conclusions. 2 LSTM. 2.1 Memory cells. LSTM is a recently proposed recurrent neural network architecture, originally designed for supervised timeseries learning (Hochreiter & Schmidhuber, 1997; Gers et al., 2000). It is based on an analysis of the problems that conventional recurrent neural network learning algorithms (e.g. backpropagation through time and real time recurrent learning) have when learning timeseries with long term dependencies. These problems boil down to the problem that errors propagated ....

[Article contains additional citation context not shown here]

Gers, F., Schmidhuber, J., & Cummins, F. (2000). Learning to forget: Continual prediction with LSTM. Neural Computation, 12 (10), 2451--2471.


Evolino for Recurrent Support Vector Machines - Schmidhuber et al. (2005)   Self-citation (Schmidhuber)   (Correct)

No context found.

F. A. Gers, J. Schmidhuber, and F. Cummins. Learning to forget: Continual prediction with LSTM. Neural Computation, 12(10):2451--2471, 2000.


The Work of Schmidhuber 1987-2002 - Hufnagel (2002)   Self-citation (Schmidhuber)   (Correct)

No context found.

F. A. Gers, J. Schmidhuber, and F. Cummins. Learning to forget: Continual prediction with LSTM. Neural Computation, 12(10):2451-2471, 2000.


The Work of Schmidhuber 1987-2002 - Hufnagel (2002)   Self-citation (Schmidhuber)   (Correct)

No context found.

F. A. Gers, J. Schmidhuber, and F. Cummins. Learning to forget: Continual prediction with LSTM. In Proc. ICANN'99, Int. Conf. on Artificial Neural Networks, pages 850-855, Edinburgh, Scotland, 1999. IEE, London.


The Work of Schmidhuber 1987-2002 - Hufnagel (2002)   Self-citation (Schmidhuber)   (Correct)

No context found.

F. A. Gers, J. Schmidhuber, and F. Cummins. Learning to forget: Continual prediction with LSTM. Technical Report IDSIA-01-99, IDSIA, Lugano, CH, 1999.


Finding Temporal Structure in Music: Blues Improvisation.. - Eck, Schmidhuber (2002)   Self-citation (Schmidhuber)   (Correct)

No context found.

F. A. Gers, J. Schmidhuber and F. Cummins, "Learning to Forget: Continual Prediction with LSTM," Neural Computation, vol. 12, no. 10, pp. 2451--2471, 2000.


Learning the Long-Term Structure of the Blues - Eck, Schmidhuber (2002)   Self-citation (Schmidhuber)   (Correct)

No context found.

F. A. Gers, J. Schmidhuber, and F. Cummins. Learning to forget: Continual prediction with LSTM. Neural Computation, 12(10):2451--2471, 2000.


Improving Long-Term Online Prediction with.. - Pérez-Ortiz.. (2002)   Self-citation (Gers Schmidhuber)   (Correct)

....However, RNNs in general [1,7,9] are hampered by vanishing gradients [5] that make networks unable to deal correctly with long term dependencies. A recent novel RNN called Long Short Term Memory (LSTM) [6] overcomes this problem and learns previously unlearnable solutions to numerous tasks [6,2,3], including tasks that require to store relevant events for more than 1000 subsequent discrete time steps without the help of any short training sequences. In this study we use LSTM with forget gates [2] to predict subsequent symbols of a continual input stream (not segmented a priori into ....

....[6] overcomes this problem and learns previously unlearnable solutions to numerous tasks [6,2,3] including tasks that require to store relevant events for more than 1000 subsequent discrete time steps without the help of any short training sequences. In this study we use LSTM with forget gates [2] to predict subsequent symbols of a continual input stream (not segmented a priori into subsequences with Work supported by the Generalitat Valenciana through grant FPI 99 14 268, by the Spanish Comisión Interministerial de Ciencia y Tecnología through grant TIC20001599 C02 02, and by the ....

[Article contains additional citation context not shown here]

Gers, F. A., Schmidhuber, J., Cummins, F.: Learning to forget: continual prediction with LSTM. Neural Computation 12, 10 (2000) 2451-2471.


Learning Context Sensitive Languages with LSTM.. - Gers.. (2002)   Self-citation (Gers Schmidhuber)   (Correct)

....grant 2100 49 144.96, Spanish Comisión Interministerial de Ciencia y Tecnología grant TIC2000 1599 C02 02, and Generalitat Valenciana grant FPI 99 14 268. 2 Architecture and learning algorithms. LSTM Model. Unfortunately, lack of space prohibits a complete and self-contained description of LSTM [4,3,6]. In what follows, we will limit ourselves to a brief overview. The basic unit of an LSTM network is the memory block containing one or more memory cells and three adaptive, multiplicative gating units shared by all cells in the block. Each memory cell has at its core a recurrently self connected ....

....the general rule of the context sensitive language a and to generalize correctly for all sequences up to a and beyond. We also verified that DEKF LSTM is not outperformed by LSTM on other traditional benchmarks involving continuous data, where LSTM outperformed traditional RNNs [6,3]. So DEKF LSTM is not overspecialized on CSLs but represents a general advance. The update complexity per training example, however, is worse than LSTM's, which is local in time and space. ....

Gers, F. A., Schmidhuber, J., Cummins, F.: Learning to forget: continual prediction with LSTM. Neural Computation 12, 10 (2000) 2451-2471.


Dekf-Lstm - Gers, Pérez-Ortiz, Eck (2002)   Self-citation (Gers Schmidhuber)   (Correct)

....Section 3) We apply both algorithms to the only CSL ever tried with RNNs, namely, a . Section 4 presents experiments; the results are discussed in Section 5. 2 LSTM overview Unfortunately, lack of space prohibits a complete and self contained description of LSTM. We refer the reader to [3, 4] for details. In what follows, we will limit ourselves to a brief overview. The basic unit of an LSTM network is the memory block containing one or more memory cells and three adaptive, multiplicative gating units shared by all cells in the block. Each memory cell has at its core a recurrently ....

....gated input to the state at the previous time step s_c(t-1), which is multiplied by the forget gate activation. The cell output y is calculated by multiplying (gating) s_c(t) by the output gate activation. Gradient based backward pass. Essentially, LSTM's backward pass (for details see [6, 4]) is an efficient fusion of slightly modified, truncated BPTT and a customized version of RTRL. We are using iterative gradient descent, minimizing an objective function E(t), here the usual mean squared error function. Unlike BPTT and RTRL, LSTM's learning algorithm is local in space and time: the ....

[Article contains additional citation context not shown here]

F. A. Gers, J. Schmidhuber, and F. Cummins. Learning to forget: Continual prediction with LSTM. Neural Computation, 12(10):2451-2471, 2000.


Improving Long-Term Online Prediction with.. - Pérez-Ortiz.. (2002)   Self-citation (Gers Schmidhuber)   (Correct)

....However, RNNs in general [1,7,9] are hampered by vanishing gradients [5] that make networks unable to deal correctly with long term dependencies. A recent novel RNN called Long Short Term Memory (LSTM) [6] overcomes this problem and learns previously unlearnable solutions to numerous tasks [6,2,3], including tasks that require storing relevant events for more than 1000 subsequent discrete time steps without the help of any short training sequences. In this study we use LSTM with forget gates [2] to predict subsequent symbols of a continual input stream (not segmented a priori into ....

....[6] overcomes this problem and learns previously unlearnable solutions to numerous tasks [6,2,3] including tasks that require storing relevant events for more than 1000 subsequent discrete time steps without the help of any short training sequences. In this study we use LSTM with forget gates [2] to predict subsequent symbols of a continual input stream (not segmented a priori into subsequences with Work supported by the Generalitat Valenciana through grant FPI 99 14 268, by the Spanish Comisión Interministerial de Ciencia y Tecnología through grant TIC20001599 C02 02, and by the ....

[Article contains additional citation context not shown here]

Gers, F. A., Schmidhuber, J., Cummins, F.: Learning to forget: continual prediction with LSTM. Neural Computation 12, 10 (2000) 2451-2471.


Learning Context Sensitive Languages with LSTM.. - Gers.. (2002)   Self-citation (Gers Schmidhuber)   (Correct)

....namely, a . Work supported by SNF grant 2100 49 144.96, Spanish Comisión Interministerial de Ciencia y Tecnología grant TIC2000 1599 C02 02, and Generalitat Valenciana grant FPI 99 14 268. 2 Architecture and learning algorithms. Lack of space prohibits a complete description of LSTM [3,6]. In what follows, we will limit ourselves to a brief overview. The basic unit of an LSTM network is the memory block containing one or more memory cells and three adaptive, multiplicative gating units shared by all cells in the block. Each memory cell has at its core a recurrently self connected ....

....than a to extract the general rule of the CSL a and to generalize correctly for all sequences up to n = 1000 and beyond. We also verified that DEKF LSTM is not outperformed by LSTM on other traditional benchmarks involving continuous data, where LSTM outperformed traditional RNNs [6,3]. This indicates that DEKF LSTM is not over specialized on CSLs but represents a general advance. The update complexity per training example, however, is worse than LSTM's. ....

Gers, F. A., Schmidhuber, J., Cummins, F.: Learning to forget: continual prediction with LSTM. Neural Computation 12, 10 (2000) 2451-2471.


Improving Long-Term Online Prediction with.. - Pérez-Ortiz.. (2002)   Self-citation (Gers Schmidhuber)   (Correct)

....However, RNNs in general [1,7,9] are hampered by vanishing gradients [5] that make networks unable to deal correctly with long term dependencies. A recent novel RNN called Long Short Term Memory (LSTM) [6] overcomes this problem and learns previously unlearnable solutions to numerous tasks [6,2,3], including tasks that require storing relevant events for more than 1000 subsequent discrete time steps without the help of any short training sequences. In this study we use LSTM with forget gates [2] to predict subsequent symbols of a continual input stream (not segmented a priori into ....

....[6] overcomes this problem and learns previously unlearnable solutions to numerous tasks [6,2,3] including tasks that require storing relevant events for more than 1000 subsequent discrete time steps without the help of any short training sequences. In this study we use LSTM with forget gates [2] to predict subsequent symbols of a continual input stream (not segmented a priori into subsequences with Work supported by the Generalitat Valenciana through grant FPI 99 14 268, by the Spanish Comisión Interministerial de Ciencia y Tecnología through grant TIC20001599 C02 02, and by the Swiss ....

[Article contains additional citation context not shown here]

Gers, F. A., Schmidhuber, J., Cummins, F.: Learning to forget: continual prediction with LSTM. Neural Computation 12, 10 (2000) 2451--2471.


Learning Context Sensitive Languages with LSTM.. - Gers.. (2002)   Self-citation (Gers Schmidhuber)   (Correct)

....namely, a . Work supported by SNF grant 2100 49 144.96, Spanish Comisión Interministerial de Ciencia y Tecnología grant TIC2000 1599 C02 02, and Generalitat Valenciana grant FPI 99 14 268. 2 Architecture and learning algorithms. Lack of space prohibits a complete description of LSTM [3,6]. In what follows, we will limit ourselves to a brief overview. The basic unit of an LSTM network is the memory block containing one or more memory cells and three adaptive, multiplicative gating units shared by all cells in the block. Each memory cell has at its core a recurrently self connected ....

....than a to extract the general rule of the CSL a and to generalize correctly for all sequences up to n = 1000 and beyond. We also verified that DEKF LSTM is not outperformed by LSTM on other traditional benchmarks involving continuous data, where LSTM outperformed traditional RNNs [6,3]. This indicates that DEKF LSTM is not over specialized on CSLs but represents a general advance. The update complexity per training example, however, is worse than LSTM's. ....

Gers, F. A., Schmidhuber, J., Cummins, F.: Learning to forget: continual prediction with LSTM. Neural Computation 12, 10 (2000) 2451--2471.


Unsupervised Learning in LSTM Recurrent Neural Networks - Klapper-Rybicka..   Self-citation (Schmidhuber)   (Correct)

....the algorithms used to train it. Compared to FFNs, traditional RNNs [3-5] are notoriously difficult to train, especially when the interval between relevant events in the input sequence exceeds about 10 time steps [6-8]. Recent progress in RNN research, however, has overcome some of these problems [8, 9], and may pave the way for a fresh look at unsupervised sequence learning. Here, for the first time, we will plug certain information theoretic objectives into a recent RNN architecture called Long Short Term Memory (LSTM) which dramatically outperforms other RNNs on a wide variety of supervised ....

.... 2 Unsupervised training of LSTM. Long Short Term Memory (LSTM) is a novel efficient type of recurrent neural network architecture [8] whose advantages over other RNN models have been demonstrated in various areas: learning regular and context free languages [13], predicting continual time series [9], motor control and rhythm detection [14]. While previous work trained LSTM networks on explicitly defined targets, here we will show that LSTM can also handle unsupervised problems. We used two unsupervised learning algorithms, Binary Information Gain Optimization (BINGO) and Nonparametric ....

[Article contains additional citation context not shown here]

F. A. Gers, J. Schmidhuber, and F. Cummins, "Learning to forget: Continual prediction with LSTM," Neural Computation, vol. 12, no. 10, pp. 2451-2471, 2000.


Unsupervised Learning in LSTM Recurrent Neural Networks - Klapper-Rybicka.. (2001)   Self-citation (Schmidhuber)   (Correct)

....and the algorithms used to train it. Compared to FFNs, traditional RNNs [3-5] are notoriously difficult to train, especially when the interval between relevant events in the input sequence exceeds about 10 time steps [6-8]. Recent progress in RNN research, however, has overcome some of these problems [8, 9], and may pave the way for a fresh look at unsupervised sequence learning. Here, for the first time, we will plug certain information theoretic objectives into a recent RNN architecture called Long Short Term Memory (LSTM) which dramatically outperforms other RNNs on a wide variety of supervised ....

.... 2 Unsupervised training of LSTM. Long Short Term Memory (LSTM) is a novel efficient type of recurrent neural network architecture [8] whose advantages over other RNN models have been demonstrated in various areas: learning regular and context free languages [13], predicting continual time series [9], motor control and rhythm detection [14]. While previous work trained LSTM networks on explicitly defined targets, here we will show that LSTM can also handle unsupervised problems. We used two unsupervised learning algorithms, Binary Information Gain Optimization (BINGO) and Nonparametric ....

[Article contains additional citation context not shown here]

F. A. Gers, J. Schmidhuber, and F. Cummins, "Learning to forget: Continual prediction with LSTM," Neural Computation, vol. 12, no. 10, pp. 2451-2471, 2000.


Gradient Flow in Recurrent Nets: the Difficulty of Learning .. - Hochreiter, Bengio   Self-citation (Schmidhuber)   (Correct)

....Models (HMMs, the most successful technique in several sequence processing applications; see [3] for a review), they are not limited to discrete internal states but allow for continuous, distributed sequence representations. Hence they can solve tasks no other current method can solve (e.g. [10]). The problem of vanishing gradients, however, makes conventional RNNs hard to train. We suspect this is why feedforward neural networks outnumber RNNs in terms of successful real world applications. Some of the remedies outlined in this chapter may lead to more effective learning systems. ....

F. A. Gers, J. Schmidhuber, and F. Cummins. Learning to forget: Continual prediction with lstm. In Proc. ICANN'99, Int. Conf. on Artificial Neural Networks, pages 850--855, Edinburgh, Scotland, 1999. IEE, London.


Unsupervised Learning in Recurrent Neural Networks - Klapper-Rybicka..   Self-citation (Schmidhuber)   (Correct)

....the algorithms used to train it. Compared to FFNs, traditional RNNs [41-43] are notoriously difficult to train, especially when the interval between relevant events in the input sequence exceeds about 10 time steps [44-46]. Recent progress in RNN research, however, has overcome some of these problems [46, 47], and may pave the way for a fresh look at unsupervised sequence learning. Here, for the first time, we will plug certain information theoretic objectives into a recent RNN architecture called Long Short Term Memory (LSTM) which dramatically outperforms other RNNs on a wide variety of supervised ....

.... Unsupervised training of LSTM. Long Short Term Memory (LSTM) is a novel efficient type of recurrent neural network (RNN) architecture [46] whose advantages over other RNN models have been demonstrated in various areas: learning regular and context free languages [51], predicting continual time series [47], motor control and rhythm detection [52]. While previous work explored LSTM training based on clearly defined objective targets, in this paper we will show that LSTM can also handle unsupervised problems. We used two unsupervised learning algorithms, Binary Information Gain Optimization (BINGO) ....

[Article contains additional citation context not shown here]

F. A. Gers, J. Schmidhuber, and F. Cummins, "Learning to forget: Continual prediction with LSTM," Neural Computation, vol. 12, no. 10, pp. 2451-2471, 2000.


Learning Precise Timing with LSTM Recurrent Networks - Gers, Schmidhuber, Schraudolph (2002)   (2 citations)  Self-citation (Gers Schmidhuber)   (Correct)

.... some of our previous tasks required the LSTM network to act upon events that occurred 50 discrete time steps ago, independently of what happened over the intervening 49 steps (such tasks are not learnable by traditional RNNs, which suffer from long time lag problems (Hochreiter & Schmidhuber, 1997; Gers, Schmidhuber, & Cummins, 2000)). Right before the critical moment, however, there was a helpful marker input informing the network that its next action would be crucial. Thus the network did not really have to learn to measure a time interval of 50 steps; it just had to learn to store relevant information for 50 steps, and ....

....in its connection scheme, and introduces peephole connections as a remedy. Sections 3 and 4 describe the modified forward and backward pass for peephole LSTM. Section 5 describes our new timing experiments. 2 Extending LSTM with Peephole Connections. We are building on LSTM with forget gates (Gers et al., 2000), simply called LSTM in what follows. The basic unit of an LSTM network is the memory block containing one or more memory cells and three adaptive, multiplicative gating units shared by all cells in the block. Memory blocks allow cells to share the same gates (provided the task permits this) ....

[Article contains additional citation context not shown here]

Gers, F. A., Schmidhuber, J., & Cummins, F. (2000). Learning to forget: Continual prediction with LSTM.


Applying LSTM to Time Series Predictable Through.. - Gers, Eck, Schmidhuber (2001)   (2 citations)  Self-citation (Gers Schmidhuber)   (Correct)

No context found.

F. A. Gers, J. Schmidhuber, and F. Cummins, "Learning to forget: Continual prediction with LSTM," Neural Computation, vol. 12, no. 10, pp. 2451-2471, 2000.


Long Short-Term Memory Learns Context Free and Context.. - Gers, Schmidhuber (2001)   Self-citation (Gers Schmidhuber)   (Correct)

....learning (RTRL) and back propagation through time (BPTT), it is local in space and time, with computational complexity O(1) per time step and weight. Previous work showed that LSTM outperforms traditional RNN algorithms on numerous tasks involving real valued or discrete inputs and targets [2, 3], including tasks that require to learn the rules of regular languages (RLs) describable by deterministic finite state automata (DFA). Until now, however, it has remained unclear whether LSTM's superiority carries over to tasks involving context free languages (CFLs) such as those discussed in the ....

F. A. Gers, J. Schmidhuber, and F. Cummins. Learning to forget: Continual prediction with LSTM. Neural Computation, 12(10):2451-2471, 2000.


Decoupled Extended Kalman Filters Improve Long-Term .. - Pérez-Ortiz..   Self-citation (Felix Urgen)   (Correct)

....are hampered by [2, 11, 17, 14, 9] vanishing gradients [7, 1] that make a network unable to deal correctly with longterm dependencies. A recent novel RNN called Long Short Term Memory [8] overcomes this vanishing gradient problem and learns previously unlearnable solutions to numerous tasks [8, 4, 5], including tasks that require to store relevant events for more than 1000 subsequent discrete time steps without the help of any short training sequences. In this study we use LSTM with forget gates [4] to predict subsequent symbols of a continual symbolic input stream (not segmented a priori ....

....vanishing gradient problem and learns previously unlearnable solutions to numerous tasks [8, 4, 5] including tasks that require to store relevant events for more than 1000 subsequent discrete time steps without the help of any short training sequences. In this study we use LSTM with forget gates [4] to predict subsequent symbols of a continual symbolic input stream (not segmented a priori into subsequences with clearly defined ends) with long term dependencies. Thus, the focus is on true online processing. Gers et al. [4] studied a similar problem; the difference to their related set up is ....

[Article contains additional citation context not shown here]

Gers, Felix A., Jürgen Schmidhuber and Fred Cummins (2000), "Learning to forget: continual prediction with LSTM", Neural Computation, 12(10), 2451-2471.


Decoupled Extended Kalman Filters Improve Long-Term .. - Pérez-Ortiz..   Self-citation (Felix Urgen)   (Correct)

....the 1000000 th sequence symbol; this is indicated in the table by 1000000 . It should be noted that the average number of symbols required for learning to predict accurately in real time (thousands of symbols) is much smaller than the number of symbols required in Gers et al.'s offline set up [3] (millions). This deserves a more profound study. LSTM with DEKF results. With the DEKF training algorithm, the number of symbols needed for correct prediction is even lower (compare Table 2). Although the time required to achieve 1000 error free predictions in a row is generally lower than with ....

Gers, Felix A., Jürgen Schmidhuber and Fred Cummins (1999), "Learning to forget: continual prediction with LSTM", in Proc. Int. Conf. on Artificial Neural Networks (ICANN'99), pp. 850-855.


Recurrent Support Vector Machines - Schmidhuber et al. (2005)   (Correct)

No context found.

F. A. Gers, J. Schmidhuber, and F. Cummins. Learning to forget: Continual prediction with LSTM. Neural Computation, 12(10):2451--2471, 2000.


Local Maximum Ozone Concentration Prediction Using LSTM - Recurrent Neural Networks (2002)   (Correct)

No context found.

F.A. Gers, J. Schmidhuber and F. Cummins. Learning to Forget: Continual Prediction with LSTM. Neural Computation 12 (10), pp.2451-2471, 2000.


CiteSeer.IST - Copyright Penn State and NEC