Results 1 - 10
of
192
A Learning Algorithm for Continually Running Fully Recurrent Neural Networks
, 1989
"... The exact form of a gradient-following learning algorithm for completely recurrent networks running in continually sampled time is derived and used as the basis for practical algorithms for temporal supervised learning tasks. These algorithms have: (1) the advantage that they do not require a precis ..."
Abstract
-
Cited by 368 (4 self)
- Add to MetaCart
The exact form of a gradient-following learning algorithm for completely recurrent networks running in continually sampled time is derived and used as the basis for practical algorithms for temporal supervised learning tasks. These algorithms have: (1) the advantage that they do not require a precisely defined training interval, operating while the network runs; and (2) the disadvantage that they require nonlocal communication in the network being trained and are computationally expensive. These algorithms are shown to allow networks having recurrent connections to learn complex tasks requiring the retention of information over time periods having either fixed or indefinite length. 1 Introduction A major problem in connectionist theory is to develop learning algorithms that can tap the full computational power of neural networks. Much progress has been made with feedforward networks, and attention has recently turned to developing algorithms for networks with recurrent connections, wh...
A distributed, developmental model of word recognition and naming
- Psychological Review
, 1989
"... A parallel distributed processing model of visual word recognition and pronunciation is described. The model consists of sets of orthographic and phonologlc ~ units and an interlevel of hidden units. Weights on connections between units were modified during a training phase using the back-propa-gati ..."
Abstract
-
Cited by 303 (35 self)
- Add to MetaCart
A parallel distributed processing model of visual word recognition and pronunciation is described. The model consists of sets of orthographic and phonologlc ~ units and an interlevel of hidden units. Weights on connections between units were modified during a training phase using the back-propa-gation learning algorithm. The model simulates many aspects of human performance, including (a) differences bet~n~.'n words in terms of processing difficulty, (b) pronunciation of novel items, (c) differences between readers in terms of word recognition skill, (d) transitions from beginning to skilled reading, and (e) differences in performance on lexieal decision and naming tasks. The model's behavior early in the learning phase corresponds to that of children acquiring word recognition skills. Training with a smaller number of hidden units produces output characteristic of many dys-lexic readers. Naming is simulated without pronunciation rules, and lexical decisions are simulated without accessing word-level representations. The performance of the model is largely determined by three factors: the nature of the input, a significant fragment of written English; the learning rule, which encodes the implicit structure of the orthography in the weights on connections; and the architecture of the system, which influences the scope of what can be learned. The recognition and pronunciation of words is one of the cen-
Task Decomposition Through Competition in a Modular Connectionist Architecture
- COGNITIVE SCIENCE
, 1990
"... A novel modular connectionist architecture is presented in which the networks composing the architecture compete to learn the training patterns. As a result of the competition, different networks learn different training patterns and, thus, learn to compute different functions. The architecture pe ..."
Abstract
-
Cited by 167 (4 self)
- Add to MetaCart
A novel modular connectionist architecture is presented in which the networks composing the architecture compete to learn the training patterns. As a result of the competition, different networks learn different training patterns and, thus, learn to compute different functions. The architecture performs task decomposition in the sense that it learns to partition a task into two or more functionally independent vii tasks and allocates distinct networks to learn each task. In addition, the architecture tends to allocate to each task the network whose topology is most appropriate to that task, and tends to allocate the same network to similar tasks and distinct networks to dissimilar tasks. Furthermore, it can be easily modified so as to...
Gradient calculation for dynamic recurrent neural networks: a survey
- IEEE Transactions on Neural Networks
, 1995
"... Abstract | We survey learning algorithms for recurrent neural networks with hidden units, and put the various techniques into a common framework. We discuss xedpoint learning algorithms, namely recurrent backpropagation and deterministic Boltzmann Machines, and non- xedpoint algorithms, namely backp ..."
Abstract
-
Cited by 119 (1 self)
- Add to MetaCart
Abstract | We survey learning algorithms for recurrent neural networks with hidden units, and put the various techniques into a common framework. We discuss xedpoint learning algorithms, namely recurrent backpropagation and deterministic Boltzmann Machines, and non- xedpoint algorithms, namely backpropagation through time, Elman's history cuto, and Jordan's output feedback architecture. Forward propagation, an online technique that uses adjoint equations, and variations thereof, are also discussed. In many cases, the uni ed presentation leads to generalizations of various sorts. We discuss advantages and disadvantages of temporally continuous neural networks in contrast to clocked ones, continue with some \tricks of the trade" for training, using, and simulating continuous time and recurrent neural networks. We present somesimulations, and at the end, address issues of computational complexity and learning speed.
Deep Dyslexia: A Case Study of Connectionist Neuropsychology
, 1993
"... Deep dyslexia is an acquired reading disorder marked by the occurrence of semantic errors (e.g., reading RIVER as "ocean"). In addition, patients exhibit a number of other symptoms, including visual and morphological effects in their errors, a part-of-speech effect, and an advantage for concrete ove ..."
Abstract
-
Cited by 110 (25 self)
- Add to MetaCart
Deep dyslexia is an acquired reading disorder marked by the occurrence of semantic errors (e.g., reading RIVER as "ocean"). In addition, patients exhibit a number of other symptoms, including visual and morphological effects in their errors, a part-of-speech effect, and an advantage for concrete over abstract words. Deep dyslexia poses a distinct challenge for cognitive neuropsychology because there is little understanding of why such a variety of symptoms should co-occur in virtually all known patients. Hinton and Shallice (1991) replicated the co-occurrence of visual and semantic errors by lesioning a recurrent connectionist network trained to map from orthography to semantics. While the success of their simulations is encouraging, there is little understanding of what underlying principles are responsible for them. In this paper we evaluate and, where possible, improve on the most important design decisions made by Hinton and Shallice, relating to the task, the network architecture, the training procedure, and the testing procedure. We identify four properties of networks that underly their ability to reproduce the deep dyslexic symptom-complex: distributed orthographic and semantic representations, gradient descent learning, attractors for word meanings, and greater richness of concrete vs. abstract semantics. The first three of these are general connectionist principles and the last is based on earlier theorizing. Taken together, the results demonstrate the usefulness of a connectionist approach to understanding deep dyslexia in particular, and the viability of connectionist neuropsychology in general.
Neural Net Architectures for Temporal Sequence Processing
, 1994
"... I present a general taxonomy of neural net architectures for processing time-varying patterns. This taxonomy subsumes many existing architectures in the literature, and points to several promising architectures that have yet to be examined. Any architecture that processes timevarying patterns requir ..."
Abstract
-
Cited by 103 (0 self)
- Add to MetaCart
I present a general taxonomy of neural net architectures for processing time-varying patterns. This taxonomy subsumes many existing architectures in the literature, and points to several promising architectures that have yet to be examined. Any architecture that processes timevarying patterns requires two conceptually distinct components: a short-term memory that holds on to relevant past events and an associator that uses the short-term memory to classify or predict. My taxonomy is based on a characterization of short-term memory models along the dimensions of form, content, and adaptability. Experiments on predicting future values of a financial time series (US dollar--Swiss franc exchange rates) are presented using several alternative memory models. The results of these experiments serve as a baseline against which more sophisticated architectures can be compared. Neural networks have proven to be a promising alternative to traditional techniques for nonlinear temporal prediction t...
Gradient-Based Learning Algorithms for Recurrent Networks and Their Computational Complexity
, 1995
"... Introduction 1.1 Learning in Recurrent Networks Connectionist networks having feedback connections are interesting for a number of reasons. Biological neural networks are highly recurrently connected, and many authors have studied recurrent network models of various types of perceptual and memory pr ..."
Abstract
-
Cited by 100 (4 self)
- Add to MetaCart
Introduction 1.1 Learning in Recurrent Networks Connectionist networks having feedback connections are interesting for a number of reasons. Biological neural networks are highly recurrently connected, and many authors have studied recurrent network models of various types of perceptual and memory processes. The general property making such networks interesting and potentially useful is that they manifest highly nonlinear dynamical behavior. One such type of dynamical behavior that has received much attention is that of settling to a fixed stable state, but probably of greater importance both biologically and from an engineering viewpoint are time-varying behaviors. Here we consider algorithms for training recurrent networks to perform temporal supervised learning tasks, in which the specification of desired behavior is in the form of specific examples of input and desired output trajectories. One example of such a task is sequence classification, where
Learning the structure of event sequences
- JOURNAL OF EXPERIMENTAL PSYCHOLOGY: GENERAL
, 1991
"... How is complex sequential material acquired, processed, and represented when there is no intention to learn? Two experiments exploring a choice reaction time task are reported. Unknown to Ss, successive stimuli followed a sequence derived from a "noisy " finite-state grammar. After conside ..."
Abstract
-
Cited by 96 (23 self)
- Add to MetaCart
How is complex sequential material acquired, processed, and represented when there is no intention to learn? Two experiments exploring a choice reaction time task are reported. Unknown to Ss, successive stimuli followed a sequence derived from a "noisy " finite-state grammar. After considerable practice (60,000 exposures) with Experiment 1, Ss acquired a complex body of procedural knowledge about the sequential structure of the material. Experiment 2 was an attempt to identify limits on Ss ability to encode the temporal context by using more distant contingencies that spanned irrelevant material. Taken together, the results indicate that Ss become increasingly sensitive to the temporal context set by previous elements of the sequence, up to 3 elements. Responses are also affected by priming effects from recent trials. A connectionist model that incorporates sensitivity to the sequential structure and to priming effects is shown to capture key aspects of both acquisition and processing and to account for the interaction between attention and sequence structure reported by Cohen, Ivry, and Keele (1990).
Graded state machines: The representation of temporal contingencies in simple recurrent networks
- Machine Learning
, 1991
"... Abstract. We explore a network architecture introduced by Elman (1990) for predicting successive elements of a sequence. The network uses the pattern of activation over a set of hidden units from time-step t-1, together with element t, to predict element t + 1. When the network is trained with strin ..."
Abstract
-
Cited by 78 (10 self)
- Add to MetaCart
Abstract. We explore a network architecture introduced by Elman (1990) for predicting successive elements of a sequence. The network uses the pattern of activation over a set of hidden units from time-step t-1, together with element t, to predict element t + 1. When the network is trained with strings from a particular finite-state grammar, it can learn to be a perfect finite-state recognizer for the grammar. When the net has a minimal number of hidden units, patterns on the hidden units come to correspond to the nodes of the grammar; however, this correspondence is not necessary for the network to act as a perfect finite-state recognizer. Next, we provide a detailed analysis of how the network acquires its internal representations. We show that the network progressively encodes more and more temporal context by means of a probability analysis. Finally, we explore the conditions under which the network can carry information about distant sequential contingencies across intervening elements to distant elements. Such information is maintained with relative ease if it is relevant at each intermediate step; it tends to be lost when intervening elements do not depend on it. At first glance this may suggest that such networks are not relevant to natural language, in which dependencies may span indefinite distances. However, embeddings in natural language are not completely independent of earlier information. The final simulation shows that long distance sequential contingencies can be encoded by the network even if only subtle statistical properties of embedded strings depend on the early information. The network encodes long-distance dependencies by shading
Natural Language Processing with Modular PDP Networks and Distributed Lexicon
- Cognitive Science
, 1991
"... An approach to connectionist natural language processing is proposed, which is based on hierarchically organized modular Parallel Distributed Processing (PDP) networks and a central lexicon of distributed input/output representations. The modules communicate using these representations, which are gl ..."
Abstract
-
Cited by 77 (13 self)
- Add to MetaCart
An approach to connectionist natural language processing is proposed, which is based on hierarchically organized modular Parallel Distributed Processing (PDP) networks and a central lexicon of distributed input/output representations. The modules communicate using these representations, which are global and publicly available in the system. The representations are developed automatically by all networks while they are learning their processing tasks. The resulting representations reflect the regularities in the subtasks, which facilitates robust processing in the face of noise and damage, supports improved generalization, and provides expectations about possible contexts. The lexicon can be extended by cloning new instances of the items, that is, by generating a number of items with known processing properties and distinct identities. This technique combinatorially increases the processing power of the system. The recurrent FGREP module, together with a central lexicon, is used as a ba...

