Abstract:
Standard recurrent nets cannot deal with long minimal time lags between relevant signals. Several recent NIPS papers propose alternative methods. We first show that problems used to promote various previous algorithms can be solved more quickly by random weight guessing than by the proposed algorithms. We then use LSTM, our own recent algorithm, to solve a hard problem that can be solved quickly neither by random search nor by any other recurrent net algorithm we are aware of.

1 TRIVIAL PREVIOUS LONG TIME LAG PROBLEMS

Traditional recurrent nets fail in the case of long minimal time lags between input signals and corresponding error signals [7, 3]. Many recent papers propose alternative methods, e.g., [16, 12, 1, 5, 9]. For instance, Bengio et al. investigate methods such as simulated annealing, multi-grid random search, time-weighted pseudo-Newton optimization, and discrete error propagation [3]. They also propose an EM approach
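The random weight guessing baseline mentioned in the abstract needs no gradients at all: draw every weight at random, test the net on the training set, and repeat until it classifies all sequences correctly. The following is a minimal sketch under stated assumptions, not the authors' experimental setup. The toy task (class given by the sign of the first input, followed by small noise), the single tanh recurrent unit with a thresholded linear readout, the uniform weight range, and all function names (`make_task`, `classify`, `random_weight_guessing`) are hypothetical choices made here for illustration.

```python
import math
import random


def make_task(seq_len, n_per_class, rng):
    """Hypothetical toy long-time-lag task (an assumption, not the paper's benchmark):
    the label is the sign of the first input; the remaining seq_len - 1 inputs
    are small uniform noise the net must ignore."""
    data = []
    for label in (0, 1):
        for _ in range(n_per_class):
            first = 1.0 if label == 1 else -1.0
            noise = [rng.uniform(-0.1, 0.1) for _ in range(seq_len - 1)]
            data.append(([first] + noise, label))
    return data


def classify(weights, seq):
    """Hypothetical one-unit recurrent net: h_t = tanh(w_in*x_t + w_rec*h_{t-1} + b),
    predicting class 1 when the readout w_out*h_T + b_out is positive."""
    w_in, w_rec, b, w_out, b_out = weights
    h = 0.0
    for x in seq:
        h = math.tanh(w_in * x + w_rec * h + b)
    return 1 if w_out * h + b_out > 0 else 0


def random_weight_guessing(data, max_trials, weight_range, rng):
    """Draw all five weights uniformly from [-weight_range, weight_range] until the
    net classifies every training sequence correctly.
    Returns (weights, trials_used), or (None, max_trials) if no guess succeeded."""
    for trial in range(1, max_trials + 1):
        weights = [rng.uniform(-weight_range, weight_range) for _ in range(5)]
        # all() stops at the first misclassified sequence, so bad guesses are cheap.
        if all(classify(weights, seq) == label for seq, label in data):
            return weights, trial
    return None, max_trials


if __name__ == "__main__":
    rng = random.Random(0)
    train = make_task(seq_len=50, n_per_class=10, rng=rng)
    weights, trials = random_weight_guessing(
        train, max_trials=10_000, weight_range=5.0, rng=rng
    )
    status = "solved" if weights is not None else "not solved"
    print(status, "after", trials, "trials")
```

The time lag enters this search only through the cost of evaluating each guess, never through error propagation, so a benchmark on which a sizable fraction of random weight settings already works can be solved this way regardless of lag length. In this sketch, any setting with a strong positive self-connection (`w_rec`), a small bias, and a readout of matching sign latches the first input, which is why the toy task is easy for guessing.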
Citations
Bengio (1994). Learning long-term dependencies with gradient descent is difficult.
Pollack (1991). The induction of dynamical recognizers.
Hochreiter, Schmidhuber (1997). Long short-term memory.
Pearlmutter (1995). Gradient calculations for dynamic recurrent neural networks: A survey.
Williams, Peng (1990). An efficient gradient-based algorithm for on-line training of recurrent network trajectories.
Bengio, Frasconi (1996). An input/output HMM architecture.
Robinson, Fallside (1987). The utility driven dynamic error propagation network.
Mozer (1992). The induction of multiscale temporal structure.
Schmidhuber (1992). Learning complex, extended sequences using the principle of history compression.
Hochreiter (1991). Untersuchungen zu dynamischen neuronalen Netzen [Investigations of dynamic neural networks].
Bengio, Frasconi (1994). Credit assignment through time: Alternatives to backpropagation.
Fahlman, Lebiere (1990). The cascade-correlation learning algorithm.
Miller, Giles (1993). Experimental comparison of the effect of order in recurrent neural networks.
Cleeremans, Servan-Schreiber, et al. (1989). Finite-state automata and simple recurrent networks.
Smith, Zipser (1989). Learning sequential structures with the real-time recurrent learning algorithm.
Watrous, Kuhn (1992). Induction of finite-state automata using second-order recurrent networks.
El Hihi, Bengio (1996). Hierarchical recurrent neural networks for long-term dependencies.
Lin, Horne, et al. (1995). Learning long-term dependencies is not as difficult with NARX recurrent neural networks.
Manolios, Fanelli (1994). First-order recurrent neural networks and deterministic finite state automata.
Tomita (1982). Dynamic construction of finite automata from examples using hill-climbing.