Results 1 - 10 of 18
LSTM Recurrent Networks Learn Simple Context-Free and Context-Sensitive Languages
 IEEE Transactions on Neural Networks
, 2001
"... Previous work on learning regular languages from exemplary training sequences showed that Long Short Term Memory (LSTM) outperforms traditional recurrent neural networks (RNNs). Here we demonstrate LSTM's superior performance on context free language (CFL) benchmarks for recurrent neural net ..."
Abstract

Cited by 67 (21 self)
Previous work on learning regular languages from exemplary training sequences showed that Long Short-Term Memory (LSTM) outperforms traditional recurrent neural networks (RNNs). Here we demonstrate LSTM's superior performance on context-free language (CFL) benchmarks for recurrent neural networks, and show that it works even better than previous hardwired or highly specialized architectures.
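The CFL benchmarks in this line of work include simple languages such as a^n b^n. A minimal sketch (under that assumption; the paper itself defines the exact benchmark set) of generating exemplary training strings and checking membership by counting:

```python
def anbn(n):
    """Return the string a^n b^n, e.g. anbn(3) == 'aaabbb'."""
    return "a" * n + "b" * n

def is_anbn(s):
    """Membership in {a^n b^n | n >= 1}: equal counts, a's before b's."""
    n = len(s) // 2
    return n >= 1 and s == anbn(n)

# Exemplary training sequences up to some maximum n, as in such experiments.
train = [anbn(n) for n in range(1, 11)]
```

Generalization is then measured by testing on values of n larger than any seen in training.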
A Recurrent Network that Performs a Context-Sensitive Prediction Task
, 1996
"... We address the problem of processing a contextsensitive language with a recurrent neural network (RN). So far, the language processing capabilities of RNs have only been investigated for regular and contextfree languages. We present an extremely simple RN with only one parameter for its two hi ..."
Abstract

Cited by 17 (1 self)
We address the problem of processing a context-sensitive language with a recurrent neural network (RN). So far, the language processing capabilities of RNs have only been investigated for regular and context-free languages. We present an extremely simple RN with only one parameter for its two hidden nodes that can perform a prediction task on sequences of symbols from the language {(ba^k)^n | k >= 0, n > 0}, a language that is context-sensitive but not context-free. The input to the RN consists of any string of the language, one symbol at a time. The network should then, at all times, predict the symbol that should follow. This means that the network must be able to count the number of a's in the first subsequence and to retain this number for future use. Our network can solve the task for k = 1 up to k = 120. The network represents the count of a's in the subsequence by having different limit cycles for every different number of a's counted. The limit cycles are relate...
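The prediction task above is deterministic once the count k has been extracted from the first subsequence, which is exactly what the network's limit cycles encode. A small sketch that builds strings of the language and computes the next-symbol targets by counting (a symbolic reference, not the paper's network):

```python
def ba_kn(k, n):
    """Return the string (b a^k)^n, e.g. ba_kn(2, 3) == 'baabaabaa'."""
    return ("b" + "a" * k) * n

def predictions(s):
    """Ground-truth next-symbol targets for a string of {(ba^k)^n}.
    k is read off the first subsequence, then each position is predicted
    by comparing the current run of a's against k."""
    k = len(s.split("b")[1]) if s.count("b") > 1 else s.count("a")
    preds = []
    run = 0  # a's seen since the last b
    for ch in s[:-1]:
        run = run + 1 if ch == "a" else 0
        preds.append("a" if run < k else "b")
    return preds
```

After the first block, every prediction matches the actual next symbol, which is the success criterion for the task.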
Kalman filters improve LSTM network performance in problems unsolvable by traditional recurrent nets
, 2002
"... The Long ShortTerm Memory (LSTM) network trained by gradient descent solves difficult problems which traditional recurrent neural networks in general cannot. We have recently observed that the decoupled extended Kalman filter training algorithm allows for even better performance, reducing significa ..."
Abstract

Cited by 16 (8 self)
The Long Short-Term Memory (LSTM) network trained by gradient descent solves difficult problems which traditional recurrent neural networks in general cannot. We have recently observed that the decoupled extended Kalman filter training algorithm allows for even better performance, significantly reducing the number of training steps when compared to the original gradient descent training algorithm. In this paper we present a set of experiments which are unsolvable by classical recurrent networks but which are solved elegantly, robustly, and quickly by LSTM combined with Kalman filters.
Long Short-Term Memory in Recurrent Neural Networks
, 2001
"... For a long time, recurrent neural networks (RNNs) were thought to be theoretically fascinating. Unlike standard feedforward networks RNNs can deal with arbitrary input sequences instead of static input data only. This combined with the ability to memorize relevant events over time makes recurrent n ..."
Abstract

Cited by 14 (0 self)
For a long time, recurrent neural networks (RNNs) were thought to be theoretically fascinating. Unlike standard feedforward networks, RNNs can deal with arbitrary input sequences instead of static input data only. This, combined with the ability to memorize relevant events over time, makes recurrent networks in principle more powerful than standard feedforward networks. The set of potential applications is enormous: any task that requires learning how to use memory is a potential task for recurrent networks. Potential application areas include time series prediction, motor control in non-Markovian environments, and rhythm detection (in music and speech). Previous successes in real-world applications with recurrent networks were limited, however, due to practical problems when long time lags between relevant events make learning difficult. For these applications, conventional gradient-based recurrent network algorithms for learning to store information over extended time intervals take too long. The main reason for this failure is the rapid decay of backpropagated error. The "Long Short-Term Memory" (LSTM) algorithm overcomes this and related problems by enforcing constant error flow. Using gradient descent, LSTM explicitly learns when to store information and when to access it. In this thesis we extend, analyze, and apply the LSTM algorithm. In particular, we identify two weaknesses of LSTM, offer solutions, and modify the algorithm accordingly: (1) We recognize a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset. Without resets, the state may grow indefinitely and eventually cause the network to break down. Our remedy is a novel, adaptiv...
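The "rapid decay of backpropagated error" mentioned above can be illustrated with one line of arithmetic: in a one-unit sigmoid RNN, the error signal is scaled each step by w * sigma'(net), so whenever |w * sigma'| < 1 it shrinks exponentially with the time lag. A toy sketch (not the thesis's analysis, just the standard back-of-the-envelope argument):

```python
import math

def logistic_prime(x):
    """Derivative of the logistic sigmoid; its maximum is 0.25 at x = 0."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def backprop_factor(w, x, steps):
    """Scale of the backpropagated error after `steps` time steps in a
    one-unit recurrent net with weight w and activation input x."""
    return (w * logistic_prime(x)) ** steps

# Even in the most favorable case (w = 1, x = 0), the error shrinks by a
# factor of 4 per step and is effectively gone after ~50 steps.
tiny = backprop_factor(1.0, 0.0, 50)
# LSTM's constant error carousel keeps this factor at exactly 1 instead.
```

This is what "enforcing constant error flow" buys: the per-step scale factor inside the memory cell is 1, so the error neither vanishes nor explodes over long lags.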
Natural Language Grammatical Inference: A Comparison of Recurrent Neural Networks and Machine Learning Methods
 Symbolic, Connectionist, and Statistical Approaches to Learning for Natural Language Processing, Lecture notes in AI
, 1996
"... We consider the task of training a neural network to classify natural language sentences as grammatical or ungrammatical, thereby exhibiting the same kind of discriminatory power provided by the Principles and Parameters linguistic framework, or Government and Binding theory. We investigate the foll ..."
Abstract

Cited by 14 (2 self)
We consider the task of training a neural network to classify natural language sentences as grammatical or ungrammatical, thereby exhibiting the same kind of discriminatory power provided by the Principles and Parameters linguistic framework, or Government and Binding theory. We investigate the following models: feedforward neural networks, Frasconi-Gori-Soda and Back-Tsoi locally recurrent neural networks, Williams and Zipser and Elman recurrent neural networks, Euclidean and edit-distance nearest-neighbors, and decision trees. Non-neural-network machine learning methods are included primarily for comparison. We find that the Elman and Williams & Zipser recurrent neural networks are able to find a representation for the grammar which we believe is more parsimonious. These models exhibit the best performance. 1 Motivation 1.1 Representational Power of Recurrent Neural Networks Natural language has traditionally been handled using symbolic computation and recursive processes. The most ...
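Among the comparison baselines, the edit-distance nearest-neighbor classifier is simple enough to sketch directly. A minimal version, assuming sentences are token sequences and 1-NN voting (the paper's exact configuration may differ):

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def nn_classify(sentence, labeled):
    """Label a sentence with the class of its nearest labeled example."""
    return min(labeled, key=lambda ex: edit_distance(sentence, ex[0]))[1]
```

Such baselines need no training but, unlike the recurrent networks above, learn no compact representation of the grammar.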
Representation Beyond Finite States: Alternatives to Push-Down Automata
 In: Kolen and Kremer
, 2001
"... It has been well established that Dynamical Recurrent Networks (DRNs) can act as deterministic finitestate automata (DFAs  see Chapters 6 and 7). A DRN can reliably represent the states of a DFA as regions in its state space, and the DFA transitions as transitions between these regions. Howeve ..."
Abstract

Cited by 10 (3 self)
It has been well established that Dynamical Recurrent Networks (DRNs) can act as deterministic finite-state automata (DFAs; see Chapters 6 and 7). A DRN can reliably represent the states of a DFA as regions in its state space, and the DFA transitions as transitions between these regions. However, as we shall see in this chapter, DRNs can learn to process languages which are non-regular (and therefore cannot be processed by any DFA). Moreover, DRNs are capable of generalizing in ways which go beyond the DFA framework. We will show how DRNs can learn to predict context-free and context-sensitive languages, making use of the transient dynamics as the network activations move towards an attractor or away from a repeller. In contrast to pushdown automata, which rely on unbounded external memory, DRNs must instead rely on arbi... The resulting trajectory can be thought of as analogous to winding up a spring in one dimension and unwinding it in another.
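The spring analogy can be made concrete in one dimension: contract the state toward an attractor on each a (winding up), expand it away on each b (unwinding), and the state returns to its start exactly when the b's balance the a's. A toy arithmetic sketch of this counting-by-dynamics idea, not the chapter's actual trained network:

```python
def wound_state(s, f=0.5):
    """Process a string over {a, b} with a one-dimensional 'spring':
    multiply by f on each a (move toward the attractor at 0),
    divide by f on each b (move away from it)."""
    x = 1.0
    for ch in s:
        x = x * f if ch == "a" else x / f
    return x

# x returns to exactly its starting value only for balanced a^n b^n strings,
# so a threshold on x implements the counting a stack would otherwise do.
```

With f = 0.5 the arithmetic is exact in floating point, which keeps the demonstration crisp; a real network realizes the same contraction/expansion only approximately, which bounds how far it generalizes.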
New millennium AI and the convergence of history
 Challenges to Computational Intelligence
, 2007
"... Artificial Intelligence (AI) has recently become a real formal science: the new millennium brought the first mathematically sound, asymptotically optimal, universal problem solvers, providing a new, rigorous foundation for the previously largely heuristic field of General AI and embedded agents. At ..."
Abstract

Cited by 8 (4 self)
Artificial Intelligence (AI) has recently become a real formal science: the new millennium brought the first mathematically sound, asymptotically optimal, universal problem solvers, providing a new, rigorous foundation for the previously largely heuristic field of General AI and embedded agents. At the same time there has been rapid progress in practical methods for learning true sequence-processing programs, as opposed to traditional methods limited to stationary pattern association. Here we will briefly review some of the new results, and speculate about future developments, pointing out that the time intervals between the most notable events in over 40,000 years or 2^9 lifetimes of human history have sped up exponentially, apparently converging to zero within the next few decades. Or is this impression just a byproduct of the way humans allocate memory space to past events?
Long Short-Term Memory Learns Context-Free and Context-Sensitive Languages
 Proceedings of the ICANNGA 2001 Conference
, 2001
"... Previous work on learning regular languages from exemplary training sequences showed that Long ShortTerm Memory (LSTM) outperforms traditional recurrent neural networks (RNNs). Here we demonstrate LSTM's superior performance on context free language (CFL) benchmarks, and show that it works even ..."
Abstract

Cited by 4 (3 self)
Previous work on learning regular languages from exemplary training sequences showed that Long Short-Term Memory (LSTM) outperforms traditional recurrent neural networks (RNNs). Here we demonstrate LSTM's superior performance on context-free language (CFL) benchmarks, and show that it works even better than previous hardwired or highly specialized architectures. To the best of our knowledge, LSTM variants are also the first RNNs to learn a context-sensitive language (CSL), namely, a^n b^n c^n.
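The CSL in this line of work is a^n b^n c^n (the snippet above is truncated, so take the exact language as an assumption here). It is context-sensitive because recognizing it requires two coupled equality checks, which no single-stack pushdown automaton can do. A counting sketch of generation and membership:

```python
def anbncn(n):
    """Return the string a^n b^n c^n, e.g. anbncn(2) == 'aabbcc'."""
    return "a" * n + "b" * n + "c" * n

def is_anbncn(s):
    """Membership in {a^n b^n c^n | n >= 1}: three equal, ordered blocks.
    Checking this needs the count of a's twice -- once against the b's
    and once against the c's -- which is what makes the language CSL."""
    n = len(s) // 3
    return n >= 1 and s == anbncn(n)
```

A network that has truly learned the language must generalize this double count to values of n beyond its training range.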
Grammatical Inference of Colonies
"... A concept of accepting colonies is introduced. A hybrid connectionistsymbolic architecture ("neural pushdown automaton") for inference of colonies based on presentation of positive and negative examples of strings is then described, together with an algorithm for extracting a colony from t ..."
Abstract

Cited by 4 (0 self)
A concept of accepting colonies is introduced. A hybrid connectionist-symbolic architecture ("neural pushdown automaton") for inference of colonies based on presentation of positive and negative examples of strings is then described, together with an algorithm for extracting a colony from the trained neural network. Some examples of the inference of colonies generating/accepting simple context-free languages illustrate the function of the architecture. Keywords: grammar system, artificial neural network, pushdown automaton, colony, grammatical inference. 1 Introduction The problem of grammatical inference is generally hard; even for regular languages it is NP-hard in the worst case. There have been various heuristic methods developed, trying to find a suitable solution with reasonable computational expenses. We will focus our attention on hybrid architectures coupling principles of neural and symbolic computation. There have been many such architectures presented, concerning mostly (but no...
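The symbolic half of a "neural pushdown automaton" is the ordinary pushdown automaton it emulates with continuous stack operations. As a point of reference, a minimal deterministic PDA for the simple CFL {a^n b^n | n >= 1} (an illustrative example, not the paper's colony formalism):

```python
def pda_accepts(s):
    """Deterministic pushdown automaton for {a^n b^n | n >= 1}:
    push a marker on each a, pop on each b, accept iff the input is
    a's then b's and the stack ends exactly empty."""
    stack = []
    phase = "a"  # currently reading the block of a's
    for ch in s:
        if ch == "a":
            if phase != "a":
                return False  # an a after b's: wrong shape
            stack.append("A")
        elif ch == "b":
            phase = "b"
            if not stack:
                return False  # more b's than a's
            stack.pop()
        else:
            return False      # symbol outside the alphabet
    return phase == "b" and not stack
```

The hybrid architecture replaces this hand-coded control and discrete stack with trainable, differentiable counterparts, then extracts the symbolic system back out of the trained weights.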
DEKF-LSTM
 In Proc. 10th European Symposium on Artificial Neural Networks, ESANN
, 2002
"... Unlike traditional recurrent neural networks, the long shortterm memory (LSTM) model generalizes well when presented with training sequences derived from regular and also simple nonregular languages. Our novel combination of LSTM and the decoupled extended Kalman filter, however, learns even faster ..."
Abstract

Cited by 2 (2 self)
Unlike traditional recurrent neural networks, the long short-term memory (LSTM) model generalizes well when presented with training sequences derived from regular and also simple non-regular languages. Our novel combination of LSTM and the decoupled extended Kalman filter, however, learns even faster and generalizes even better, requiring only the 10 shortest exemplars (n <= 10) of the context-sensitive language a^n b^n c^n to deal correctly with values of n up to 1000 and more. Even when we consider the relatively high update complexity per time step, in many cases the hybrid offers faster learning than LSTM by itself.
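The extended Kalman filter treats the network weights as the hidden state of a filter and the training targets as noisy observations; the "decoupled" variant partitions the weights into groups with independent covariances to cut the cost. A toy sketch of the core update for a single linear neuron (the real DEKF-LSTM applies this per weight group with LSTM's Jacobians; names and constants here are illustrative):

```python
import numpy as np

def ekf_step(w, P, x, target, R=1.0, Q=1e-4):
    """One Kalman update of weight vector w with covariance P.
    For a linear neuron y = w.x, the measurement Jacobian H is just x."""
    H = x                                  # d y / d w
    y = w @ x                              # current prediction
    S = H @ P @ H + R                      # innovation variance (scalar)
    K = P @ H / S                          # Kalman gain
    w = w + K * (target - y)               # correct weights by the error
    P = P - np.outer(K, H @ P) + Q * np.eye(len(w))  # shrink + process noise
    return w, P

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
w, P = np.zeros(2), np.eye(2)
for _ in range(200):
    x = rng.normal(size=2)
    w, P = ekf_step(w, P, x, w_true @ x)
```

Each update costs more than a gradient step (it maintains the covariance P), which is the "relatively high update complexity per time step" the abstract weighs against the reduced number of training steps.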