LSTM Recurrent Networks Learn Simple Context Free and Context Sensitive Languages
- IEEE Transactions on Neural Networks
, 2001
Cited by 54 (20 self)
Previous work on learning regular languages from exemplary training sequences showed that Long Short-Term Memory (LSTM) outperforms traditional recurrent neural networks (RNNs). Here we demonstrate LSTM's superior performance on context free language (CFL) benchmarks for recurrent neural networks (RNNs), and show that it works even better than previous hardwired or highly specialized architectures.
Architectural Bias in Recurrent Neural Networks - Fractal Analysis
- IEEE Transactions on Neural Networks
, 2003
Cited by 23 (5 self)
We have recently shown that when initialized with "small" weights, recurrent neural networks (RNNs) with standard sigmoid-type activation functions are inherently biased towards Markov models, i.e. even prior to any training, RNN dynamics can be readily used to extract finite memory machines (Hammer & Tino, 2002; Tino, Cernansky & Benuskova, 2002; Tino, Cernansky & Benuskova, 2002a). Following Christiansen and Chater (1999), we refer to this phenomenon as the architectural bias of RNNs. In this paper we further extend our work on the architectural bias in RNNs by performing a rigorous fractal analysis of recurrent activation patterns. We assume the network is driven by sequences obtained by traversing an underlying finite-state transition diagram -- a scenario that has been frequently considered in the past, e.g. when studying RNN-based learning and implementation of regular grammars and finite-state transducers. We obtain lower and upper bounds on various types of fractal dimensions, such as box-counting and Hausdorff dimensions. It turns out that not only can the recurrent activations inside RNNs with small initial weights be explored to build Markovian predictive models, but also the activations form fractal clusters the dimension of which can be bounded by the scaled entropy of the underlying driving source. The scaling factors are fixed and are given by the RNN parameters.
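The Markovian bias described in this abstract can be illustrated with a deliberately simplified stand-in. The sketch below does not use a sigmoid RNN; it assumes a one-dimensional affine contraction x <- k*x + (1-k)*s as a rough proxy for a small-weight recurrent update, and the names `drive`, `k` and `suffix` are hypothetical. It shows that states reached after the same recent symbols end up close together, whatever came before.

```python
import itertools

def drive(symbols, k=0.3, x0=0.5):
    """Iterate the contractive map x <- k*x + (1-k)*s over a binary sequence."""
    x = x0
    for s in symbols:
        x = k * x + (1.0 - k) * s
    return x

k = 0.3  # assumed contraction rate, playing the role of "small" weights
suffix = (1, 0, 1)

# Final states for every 4-symbol history followed by the same 3-symbol suffix.
states = [drive(prefix + suffix, k)
          for prefix in itertools.product((0, 1), repeat=4)]

# The states stay inside [0, 1], so a shared suffix of length 3 shrinks
# any initial difference by a factor of k**3.
spread = max(states) - min(states)

# States that end in a different suffix land in a separate cluster.
other = [drive(prefix + (0, 1, 0), k)
         for prefix in itertools.product((0, 1), repeat=4)]
gap = min(states) - max(other)
```

Under these assumptions `spread` is at most `k**3`, so the state effectively encodes only a finite window of recent input, which is the sense in which such dynamics resemble a finite memory machine.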
Context-Free and Context-Sensitive Dynamics in Recurrent Neural Networks
, 2000
Cited by 23 (5 self)
Continuous-valued recurrent neural networks can learn mechanisms for processing context-free languages. The dynamics of such networks is usually based on damped oscillation around fixed points in state space and requires that the dynamical components are arranged in certain ways. It is shown that qualitatively similar dynamics with similar constraints hold for a^n b^n c^n, a context-sensitive language. The additional difficulty with a^n b^n c^n, compared with the context-free language a^n b^n, consists of "counting up" and "counting down" letters simultaneously. The network solution is to oscillate in two principal dimensions, one for counting up and one for counting down. This study focuses on the dynamics employed by the Sequential Cascaded Network, in contrast with the Simple Recurrent Network, and the use of Backpropagation Through Time. Found solutions generalize well beyond training data; however, learning is not reliable. The contribution of this ...
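The "counting up" and "counting down" requirement can be made concrete with a purely symbolic sketch. The code below is not the network's dynamics; it is a hypothetical two-counter recognizer (the name `accepts_anbncn` is an assumption) that shows why a^n b^n c^n needs two quantities tracked at once, whereas a^n b^n needs only one.

```python
def accepts_anbncn(s):
    """Return True iff s is a^n b^n c^n for some n >= 1, using two counters."""
    up = 0       # incremented on each 'a', decremented on each 'b'
    down = 0     # incremented on each 'b', decremented on each 'c'
    phase = "a"  # which block of letters is currently being read
    for ch in s:
        if ch == "a" and phase == "a":
            up += 1
        elif ch == "b" and phase in ("a", "b") and up > 0:
            phase = "b"
            up -= 1    # counting down the a's ...
            down += 1  # ... while simultaneously counting up the b's
        elif ch == "c" and phase in ("b", "c") and up == 0 and down > 0:
            phase = "c"
            down -= 1
        else:
            return False
    return phase == "c" and up == 0 and down == 0
```

During the b-block both counters change on every symbol, which mirrors the abstract's description of oscillating in two principal dimensions.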
Inductive Bias in Context-Free Language Learning
- In Proceedings of the Ninth Australian Conference on Neural Networks
, 1998
Cited by 17 (8 self)
Recurrent neural networks are capable of learning context-free tasks, however learning performance is unsatisfactory. We investigate the effect of biasing learning towards finding a solution to a context-free prediction task. The first series of simulations fixes various sets of weights of the network to values found in a successful network, limiting the search space of the backpropagation through time learning algorithm. We find that fixing similar sets of weights can have very different effects on learning performance. The second series of simulations employs an evolutionary hill-climbing algorithm with an error measure that more closely resembles the performance measure. We find that under these conditions, the network finds different solutions to those found by backpropagation, and is even biased towards finding these solutions. An unexpected result is that the hill-climbing algorithm is capable of generalisation. The two simulations serve to highlight that seemingly similar biases can...
Rule Extraction from Recurrent Neural Networks: a Taxonomy and Review
- Neural Computation
, 2005
Cited by 15 (3 self)
In this paper, the progress of this development is reviewed and analysed in detail. In order to structure the survey and to evaluate the techniques, a taxonomy, specifically designed for this purpose, has been developed. Moreover, important open research issues are identified that, if addressed properly, could give the field a significant push forward.
Stable Encoding of Finite-State Machines in Discrete-Time Recurrent Neural Nets with Sigmoid Units
, 1998
Cited by 10 (3 self)
In recent years, there has been a lot of interest in the use of discrete-time recurrent neural nets (DTRNN) to learn finite-state tasks, with interesting results regarding the induction of simple finite-state machines from input-output strings. Parallel work has studied the computational power of DTRNN in connection with finite-state computation. This paper describes a simple strategy to devise stable encodings of finite-state machines in computationally capable discrete-time recurrent neural architectures with sigmoid units, and gives a detailed presentation on how this strategy may be applied to encode a general class of finite-state machines in a variety of commonly-used first- and second-order recurrent neural networks. Unlike previous work that either imposed some restrictions to state values, or used a detailed analysis based on fixed-point attractors, the present approach applies to any positive, bounded, strictly growing, continuous activation function, and uses simple bounding criteri...
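As a rough illustration of what encoding a finite-state machine in sigmoid units can look like, the sketch below hard-wires a two-state parity automaton into a second-order recurrent update with a logistic activation. This is an assumed textbook-style construction, not necessarily the strategy of the paper; the automaton `DELTA`, the gain `H` and the function names are all hypothetical choices made for the example.

```python
import math

def sigmoid(z):
    """Logistic function: positive, bounded, strictly growing, continuous."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical example automaton: parity of 1s (state 0 = even, state 1 = odd).
DELTA = {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0}
N_STATES = 2
H = 10.0  # assumed gain; large enough to keep activations near 0 or 1

def step(x, symbol):
    """Second-order update: unit i sums every unit j with DELTA[(j, symbol)] == i."""
    new_x = []
    for i in range(N_STATES):
        total = sum(x[j] for j in range(N_STATES) if DELTA[(j, symbol)] == i)
        new_x.append(sigmoid(H * (total - 0.5)))
    return new_x

def run(string):
    """Start in state 0 (one-hot) and feed the string one symbol at a time."""
    x = [1.0, 0.0]
    for symbol in string:
        x = step(x, symbol)
    return x
```

With this gain the "high" activation settles near 0.99 and the "low" one near 0.01, so the unit with the largest activation keeps tracking the automaton state even on long inputs. A smaller gain would let the two levels drift together, which is the stability issue the abstract refers to.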
Dynamical Automata
, 1998
Cited by 9 (4 self)
The recent work on automata whose variables and parameters are real numbers (e.g., Blum, Shub, and Smale, 1989; Koiran, 1993; Bournez and Cosnard, 1996; Siegelmann, 1996; Moore, 1996) has focused largely on questions about computational complexity and tractability. It is also revealing to examine the metric relations that such systems induce on automata via the natural metrics on their parameter spaces. This brings the theory of computational classification closer to theories of learning and statistical modeling which depend on measuring distances between models. With this in mind, I develop a generalized method of identifying pushdown automata in one class of real-valued automata. I show how the real-valued automata can be implemented in neural networks. I then explore the metric organization of these automata in a basic example, showing how it fleshes out the skeletal structure of the Chomsky Hierarchy and indicates new approaches to problems in language learning and language typolog...
Joint attention and dynamics repertoire in coupled dynamical recognizers
- In AISB 03: the Second International Symposium on Imitation in Animals and Artifacts
, 2003
Cited by 9 (2 self)
A coupled dynamical recognizer is proposed as a model for simulating turn-taking behavior. An agent is modeled as a mobile robot with two wheels. A recurrent neural network is used to produce the motor outputs. By controlling this, agents compete to take turns on a two-dimensional arena. By using the genetic algorithm technique, we show that turn-taking behavior is developed between two agents. It is worth noting that turn-taking is established with a variety of dynamics. A coupling between agents is sensitive to the dynamics and we discuss the sensitivity by referring to Trevarthen’s double monitor experiments. 1 Intersubjectivity and Joint Attention: In this paper, we propose a simulation study of joint attention via coupled dynamical recognizers. There are many ways to understand psychological phenomena not directly by studying human behavior but by computer simulations and robot experiments (e.g. B. Scassellati (1999), K. Dautenhahn (1999)). To bridge between simulation studies and psychology, we think it worth discussing
Representation Beyond Finite States: Alternatives to Push-Down Automata
- In: Kolen and Kremer
, 2001
Cited by 7 (3 self)
It has been well established that Dynamical Recurrent Networks (DRNs) can act as deterministic finite-state automata (DFAs --- see Chapters 6 and 7). A DRN can reliably represent the states of a DFA as regions in its state space, and the DFA transitions as transitions between these regions. However, as we shall see in this chapter, DRNs can learn to process languages which are non-regular (and therefore cannot be processed by any DFA). Moreover, DRNs are capable of generalizing in ways which go beyond the DFA framework. We will show how DRNs can learn to predict context-free and context-sensitive languages, making use of the transient dynamics as the network activations move towards an attractor or away from a repeller. The resulting trajectory can be thought of as analogous to winding up a spring in one dimension and unwinding it in another. In contrast to push-down automata, which rely on unbounded external memory, DRNs must instead rely on arbi

