Constructing Deterministic Finite-State Automata in Recurrent Neural Networks
- Journal of the ACM, 1996
Cited by 66 (15 self)
Recurrent neural networks that are trained to behave like deterministic finite-state automata (DFAs) can show deteriorating performance when tested on long strings. This deteriorating performance can be attributed to the instability of the internal representation of the learned DFA states. The use of a sigmoidal discriminant function together with the recurrent structure contributes to this instability. We prove that a simple algorithm can construct second-order recurrent neural networks with a sparse interconnection topology and sigmoidal discriminant function such that the internal DFA state representations are stable, i.e. the constructed network correctly classifies strings of arbitrary length. The algorithm is based on encoding strengths of weights directly into the neural network. We derive a relationship between the weight strength and the number of DFA states for robust string classification. For a DFA with n states and m input alphabet symbols, the constructive algorithm genera...
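The idea of writing a DFA into the weights of a second-order network can be sketched as follows. This is only an illustration under stated assumptions, not the paper's exact algorithm: the weight rule (+H on the transition target, -H on the state being left, bias -H/2), the value H = 8, the argmax read-out, and all function names are hypothetical choices made for this sketch.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def build_network(delta, n_states, n_symbols, H=8.0):
    """Encode a DFA transition table delta[state][symbol] -> state
    into second-order weights W[i][j][k] (assumed rule, see lead-in)."""
    W = [[[0.0] * n_symbols for _ in range(n_states)] for _ in range(n_states)]
    for j in range(n_states):
        for k in range(n_symbols):
            target = delta[j][k]
            W[target][j][k] = H        # drive the target state neuron high
            if target != j:
                W[j][j][k] = -H        # push the state being left low
    bias = [-H / 2.0] * n_states
    return W, bias

def run_network(W, bias, start, symbols):
    """Iterate S_i(t+1) = sigmoid(b_i + sum_j W[i][j][k] * S_j(t))."""
    n = len(bias)
    S = [1.0 if i == start else 0.0 for i in range(n)]
    for k in symbols:
        S = [sigmoid(bias[i] + sum(W[i][j][k] * S[j] for j in range(n)))
             for i in range(n)]
    return S

def network_accepts(W, bias, start, accepting, symbols):
    # Simplified read-out: the most active state neuron names the DFA state.
    S = run_network(W, bias, start, symbols)
    return max(range(len(S)), key=S.__getitem__) in accepting

def dfa_accepts(delta, start, accepting, symbols):
    state = start
    for k in symbols:
        state = delta[state][k]
    return state in accepting

# Toy DFA: accepts binary strings with an even number of 1s.
PARITY = [[0, 1], [1, 0]]
W, bias = build_network(PARITY, n_states=2, n_symbols=2)
```

For this two-state example the state activations settle near fixed high and low values instead of drifting, so `network_accepts(W, bias, 0, {0}, [1] * 1000)` agrees with `dfa_accepts(PARITY, 0, {0}, [1] * 1000)`.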
LSTM Recurrent Networks Learn Simple Context Free and Context Sensitive Languages
- IEEE Transactions on Neural Networks, 2001
Cited by 54 (20 self)
Previous work on learning regular languages from exemplary training sequences showed that Long Short-Term Memory (LSTM) outperforms traditional recurrent neural networks (RNNs). Here we demonstrate LSTM's superior performance on context-free language (CFL) benchmarks for RNNs, and show that it works even better than previous hardwired or highly specialized architectures.
Dynamical Recognizers: Real-time Language Recognition by Analog Computers
- Theoretical Computer Science, 1996
Cited by 51 (4 self)
We consider a model of analog computation which can recognize various languages in real time. We encode an input word as a point in R^d by composing iterated maps, and then apply inequalities to the resulting point to test for membership in the language. Each class of maps and inequalities, such as quadratic functions with rational coefficients, is capable of recognizing a particular class of languages; for instance, linear and quadratic maps can have both stack-like and queue-like memories. We use methods equivalent to the Vapnik-Chervonenkis dimension to separate some of our classes from each other, e.g. linear maps are less powerful than quadratic or piecewise-linear ones, polynomials are less powerful than elementary (trigonometric and exponential) maps, and deterministic polynomials of each degree are less powerful than their non-deterministic counterparts. Comparing these dynamical classes with various discrete language classes helps illuminate how iterated maps can...
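A toy instance of this scheme, assuming d = 1 and affine maps with rational coefficients, may make the abstract more concrete. Both examples are illustrative choices made here, not constructions taken from the paper: a counter that recognizes words with equally many a's and b's, and a Cantor-set style encoding that behaves like a stack. All names are hypothetical.

```python
from fractions import Fraction

# Example 1: one affine map per symbol; membership is tested by
# inequalities on the final point.
COUNTER_MAPS = {"a": lambda x: x + 1, "b": lambda x: x - 1}

def encode(word, x0=Fraction(0)):
    """Compose the per-symbol maps along the word, starting from x0."""
    x = x0
    for symbol in word:
        x = COUNTER_MAPS[symbol](x)
    return x

def equal_counts(word):
    x = encode(word)
    return x >= 0 and x <= 0   # two inequalities pin x to exactly 0

# Example 2: a stack of bits stored in a single rational number.
def push(x, bit):
    return (x + bit + 1) / 3           # affine map, contracts by 1/3

def top(x):
    """Read the top bit with inequalities; None means empty stack."""
    if x >= Fraction(2, 3):
        return 1
    if x >= Fraction(1, 3):
        return 0
    return None

def pop(x):
    return 3 * x - (top(x) + 1)        # inverse of the matching push

def push_all(bits):
    x = Fraction(0)
    for bit in bits:
        x = push(x, bit)
    return x
```

Exact rationals (`Fraction`) are used so that the inequality tests are not affected by floating-point rounding.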
Approximating the Semantics of Logic Programs by Recurrent Neural Networks
Cited by 48 (8 self)
In [18] we have shown how to construct a 3-layered recurrent neural network that computes the fixed point of the meaning function T_P of a given propositional logic program P, which corresponds to the computation of the semantics of P. In this article we consider the first order case. We define a notion of approximation for interpretations and prove that there exists a 3-layered feedforward neural network that approximates the calculation of T_P for a given first order acyclic logic program P with an injective level mapping arbitrarily well. Extending the feedforward network by recurrent connections, we obtain a recurrent neural network whose iteration approximates the fixed point of T_P. This result is proven by taking advantage of the fact that for acyclic logic programs the function T_P is a contraction mapping on a complete metric space defined by the interpretations of the program. Mapping this space to the metric space ℝ with Euclidean distance, a real valued function f_P can be defined which corresponds to T_P and is continuous as well as a contraction. Consequently it can be approximated by an appropriately chosen class of feedforward neural networks.
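The fixed-point computation that the network is meant to reproduce can be illustrated symbolically for the propositional case. The sketch below is not the neural construction from the paper; it only iterates the meaning function T_P directly. The clause representation (head, positive body, negative body), the example program, and the function names are assumptions made for this illustration.

```python
def tp(program, interp):
    """One application of the meaning function T_P: collect every head
    whose body is true in the interpretation `interp` (a set of atoms)."""
    return frozenset(
        head
        for head, pos, neg in program
        if pos <= interp and not (neg & interp)
    )

def fixed_point(program, start=frozenset(), max_iters=100):
    """Iterate T_P from `start` until the interpretation stops changing."""
    interp = frozenset(start)
    for _ in range(max_iters):
        nxt = tp(program, interp)
        if nxt == interp:
            return interp
        interp = nxt
    raise RuntimeError("no fixed point reached within max_iters")

# Hypothetical acyclic program:
#   p.
#   q :- p.
#   r :- q, not s.
PROGRAM = [
    ("p", frozenset(), frozenset()),
    ("q", frozenset({"p"}), frozenset()),
    ("r", frozenset({"q"}), frozenset({"s"})),
]
```

Because this example program is acyclic, the iteration reaches the same fixed point from any starting interpretation, which is the behavior the contraction argument in the abstract relies on.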
Natural language grammatical inference with recurrent neural networks
- IEEE Transactions on Knowledge and Data Engineering, 1998
Cited by 40 (1 self)
This paper examines the inductive inference of a complex grammar with neural networks -- specifically, the task considered is that of training a network to classify natural language sentences as grammatical or ungrammatical, thereby exhibiting the same kind of discriminatory power provided by the Principles and Parameters linguistic framework, or Government-and-Binding theory. Neural networks are trained, without the division into learned vs. innate components assumed by Chomsky, in an attempt to produce the same judgments as native speakers on sharply grammatical/ungrammatical data. How a recurrent neural network could possess linguistic capability and the properties of various common recurrent neural network architectures are discussed. The problem exhibits training behavior which is often not present with smaller grammars, and training was initially difficult. However, after implementing several techniques aimed at improving the convergence of the gradient descent backpropagation-through-time training algorithm, significant learning was possible. It was found that certain architectures are better able to learn an appropriate grammar. The operation of the networks and their training are analyzed. Finally, the extraction of rules in the form of deterministic finite state automata is investigated.
Analysis of Dynamical Recognizers
- Neural Computation, 1996
Cited by 29 (5 self)
Pollack (1991) demonstrated that second-order recurrent neural networks can act as dynamical recognizers for formal languages when trained on positive and negative examples, and observed both phase transitions in learning and IFS-like fractal state sets. Follow-on work focused mainly on the extraction and minimization of a finite state automaton (FSA) from the trained network. However, such networks are capable of inducing languages which are not regular, and therefore not equivalent to any FSA. Indeed, it may be simpler for a small network to fit its training data by inducing such a non-regular language. But when is the network's language not regular? In this paper, using a low dimensional network capable of learning all the Tomita data sets, we present an empirical method for testing whether the language induced by the network is regular or not. We also provide a detailed ε-machine analysis of trained networks for both regular and non-regular languages.
A Survey of Continuous-Time Computation Theory
- Advances in Algorithms, Languages, and Complexity, 1997
Cited by 26 (6 self)
Motivated partly by the resurgence of neural computation research, and partly by advances in device technology, there has been a recent increase of interest in analog, continuous-time computation. However, while special-case algorithms and devices are being developed, relatively little work exists on the general theory of continuous-time models of computation. In this paper, we survey the existing models and results in this area, and point to some of the open research questions.
1 Introduction
After a long period of oblivion, interest in analog computation is again on the rise. The immediate cause for this new wave of activity is surely the success of the neural networks "revolution", which has provided hardware designers with several new numerically based, computationally interesting models that are structurally sufficiently simple to be implemented directly in silicon. (For designs and actual implementations of neural models in VLSI, see e.g. [30, 45]). However, the more fundamental...
Evolution and Analysis of Model CPGs for Walking I. Dynamical Modules
Cited by 24 (12 self)
Can one develop an abstract description of the dynamics of pattern generators that provides quantitative insight into their operation? We explored this question by examining the dynamics of a model central pattern generator that was created using an evolutionary algorithm. We propose an abstract description based on the concept of a dynamical module, a set of neurons that simultaneously make their transitions from one quasistable state to another while the synaptic inputs that they receive remain essentially constant, thus temporarily reducing the dimensionality of the circuit dynamics. Using the mathematical tools of dynamical systems theory, we describe a method for identifying dynamical modules, and demonstrate that this concept can be used to quantitatively characterize constraints on neural architecture, account for phase durations, and predict the effects of parameter changes. Moreover, this abstract description reveals coordinated parameter changes that leave the overall circuit...
Architectural Bias in Recurrent Neural Networks - Fractal Analysis
- IEEE Transactions on Neural Networks, 1931
Cited by 23 (5 self)
We have recently shown that when initialized with "small" weights, recurrent neural networks (RNNs) with standard sigmoid-type activation functions are inherently biased towards Markov models, i.e. even prior to any training, RNN dynamics can be readily used to extract finite memory machines (Hammer & Tino, 2002; Tino, Cernansky & Benuskova, 2002; Tino, Cernansky & Benuskova, 2002a). Following Christiansen and Chater (1999), we refer to this phenomenon as the architectural bias of RNNs. In this paper we further extend our work on the architectural bias in RNNs by performing a rigorous fractal analysis of recurrent activation patterns. We assume the network is driven by sequences obtained by traversing an underlying finite-state transition diagram -- a scenario that has been frequently considered in the past, e.g. when studying RNN-based learning and implementation of regular grammars and finite-state transducers. We obtain lower and upper bounds on various types of fractal dimensions, such as box-counting and Hausdorff dimensions. It turns out that not only can the recurrent activations inside RNNs with small initial weights be explored to build Markovian predictive models, but also the activations form fractal clusters the dimension of which can be bounded by the scaled entropy of the underlying driving source. The scaling factors are fixed and are given by the RNN parameters.
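The Markovian bias described above can be seen in a very small experiment. The sketch below assumes a single tanh unit with a small recurrent weight (0.3) and hand-picked input weights; these values and all names are hypothetical and not taken from the paper. With a small recurrent weight each update is a contraction, so the state is determined mostly by the most recent symbols, and sequences that share a suffix land in the same cluster.

```python
import math

W_REC = 0.3                          # assumed "small" recurrent weight
W_IN = {"a": -1.0, "b": 1.0}         # assumed input weights per symbol

def step(x, symbol):
    # tanh is 1-Lipschitz, so this map contracts distances in x by W_REC.
    return math.tanh(W_REC * x + W_IN[symbol])

def run(sequence, x0=0.0):
    x = x0
    for symbol in sequence:
        x = step(x, symbol)
    return x

def suffix_gap(prefix_1, prefix_2, suffix):
    """Distance between states reached via different histories
    that end in the same suffix."""
    return abs(run(prefix_1 + suffix) - run(prefix_2 + suffix))

# States lie in (-1, 1), so after k shared symbols the gap is
# at most 2 * W_REC ** k, whatever came before.
gap = suffix_gap("aaaaab", "bbabba", "abbab")
bound = 2 * W_REC ** 5
```

Here `gap` stays below `bound`, and states reached on a final `a` are negative while those reached on a final `b` are positive, which is the kind of clustering by recent history that the abstract attributes to untrained networks.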

