Results 1 - 10 of 78
An Application of Recurrent Nets to Phone Probability Estimation
IEEE Transactions on Neural Networks, 1994
Cited by 165 (8 self)
This paper presents an application of recurrent networks for phone probability estimation in large vocabulary speech recognition. The need for efficient exploitation of context information is discussed
Gradient calculation for dynamic recurrent neural networks: a survey
IEEE Transactions on Neural Networks, 1995
Cited by 119 (1 self)
We survey learning algorithms for recurrent neural networks with hidden units, and put the various techniques into a common framework. We discuss fixed-point learning algorithms, namely recurrent backpropagation and deterministic Boltzmann Machines, and non-fixed-point algorithms, namely backpropagation through time, Elman's history cutoff, and Jordan's output feedback architecture. Forward propagation, an online technique that uses adjoint equations, and variations thereof, are also discussed. In many cases, the unified presentation leads to generalizations of various sorts. We discuss advantages and disadvantages of temporally continuous neural networks in contrast to clocked ones, continue with some "tricks of the trade" for training, using, and simulating continuous time and recurrent neural networks. We present some simulations, and at the end, address issues of computational complexity and learning speed.
An Efficient Gradient-Based Algorithm for On-Line Training of Recurrent Network Trajectories
Neural Computation, 1990
Cited by 105 (3 self)
A novel variant of a familiar recurrent network learning algorithm is described. This algorithm is capable of shaping the behavior of an arbitrary recurrent network as it runs, and it is specifically designed to execute efficiently on serial machines. Artificial neural networks having feedback connections can implement a wide variety of dynamical systems. The problem of training such a network is the problem of finding a particular dynamical system from among a parameterized family of such systems which best fits the desired specification. This paper proposes a specific learning algorithm for temporal supervised learning tasks, in which the specification of desired behavior is in the form of specific examples of input and desired output trajectories. One example of such a task is sequence classification, where the input is the sequence to be classified and the desired output is the correct classification, which is to be produced at the end of the sequence. Another examp...
Making Working Memory Work: A Computational Model of Learning in the Prefrontal Cortex and Basal Ganglia
2005
Cited by 63 (4 self)
The prefrontal cortex has long been thought to subserve both working memory (the holding of information online for processing) and executive functions (deciding how to manipulate working memory and perform processing). Although many computational models of working memory have been developed, the mechanistic basis of executive function remains elusive, often amounting to a homunculus. This article presents an attempt to deconstruct this homunculus through powerful learning mechanisms that allow a computational model of the prefrontal cortex to control both itself and other brain areas in a strategic, task-appropriate manner. These learning mechanisms are based on subcortical structures in the midbrain, basal ganglia, and amygdala, which together form an actor-critic architecture. The critic system learns which prefrontal representations are task relevant and trains the actor, which in turn provides a dynamic gating mechanism for controlling working memory updating. Computationally, the learning mechanism is designed to simultaneously solve the temporal and structural credit assignment problems. The model’s performance compares favorably with standard backpropagation-based temporal learning mechanisms on the challenging 1-2-AX working memory task and other benchmark working memory tasks.
LSTM Recurrent Networks Learn Simple Context Free and Context Sensitive Languages
IEEE Transactions on Neural Networks, 2001
Cited by 54 (20 self)
Previous work on learning regular languages from exemplary training sequences showed that Long Short-Term Memory (LSTM) outperforms traditional recurrent neural networks (RNNs). Here we demonstrate LSTM's superior performance on context free language (CFL) benchmarks for recurrent neural networks (RNNs), and show that it works even better than previous hardwired or highly specialized architectures.
Learning long-term dependencies in NARX recurrent neural networks
1996
Cited by 40 (5 self)
It has recently been shown that gradient-descent learning algorithms for recurrent neural networks can perform poorly on tasks that involve long-term dependencies, i.e. those problems for which the desired output depends on inputs presented at times far in the past. We show that the long-term dependencies problem is lessened for a class of architectures called NARX recurrent neural networks, which have powerful representational capabilities. We have previously reported that gradient descent learning can be more effective in NARX networks than in recurrent neural network architectures that have "hidden states" on problems including grammatical inference and nonlinear system identification. Typically, the network converges much faster and generalizes better than other networks. The results in this paper are consistent with this phenomenon. We present some experimental results which show that NARX networks can often retain information for two to three times as long as conventi...
Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies
2001
Cited by 33 (20 self)
Recurrent networks (crossreference Chapter 12) can, in principle, use their feedback connections to store representations of recent input events in the form of activations. The most widely used algorithms for learning what to put in short-term memory, however, take too much time to be feasible or do not work well at all, especially when minimal time lags between inputs and corresponding teacher signals are long. Although theoretically fascinating, they do not provide clear practical advantages over, say, backprop in feedforward networks with limited time windows (see crossreference Chapters 11 and 12). With conventional "algorithms based on the computation of the complete gradient", such as "Back-Propagation Through Time" (BPTT, e.g., [22, 27, 26]) or "Real-Time Recurrent Learning" (RTRL, e.g., [21]) error signals "flowing backwards in time" tend to either (1) blow up or (2) vanish: the temporal evolution of the backpropagated error ex
Doing without schema hierarchies: A recurrent connectionist approach to normal and impaired routine sequential action
Psychological Review, 2004
Cited by 33 (8 self)
In everyday tasks, selecting actions in the proper sequence requires a continuously updated representation of temporal context. Many existing models address this problem by positing a hierarchy of processing units, mirroring the roughly hierarchical structure of naturalistic tasks themselves. Such an approach has led to a number of difficulties, including a reliance on overly rigid sequencing mechanisms, an inability to account for context sensitivity in behavior, and a failure to address learning. We consider here an alternative framework, according to which the representation of temporal context is facilitated by recurrent connections within a network mapping from environmental inputs to actions. Applying this approach to a specific, and in many ways prototypical, everyday task (coffee-making), we examine its ability to account for several central characteristics of normal and impaired human performance. The model we consider learns to deal flexibly with a complex set of sequencing constraints, encoding contextual information at multiple time-scales within a single, distributed internal representation. Mildly degrading this context representation leads
A Fixed Size Storage O(n³) Time Complexity Learning Algorithm for Fully Recurrent Continually Running Networks
Neural Computation, 1992
Cited by 31 (12 self)
The RTRL algorithm for fully recurrent continually running networks (Robinson and Fallside, 1987; Williams and Zipser, 1989) requires O(n⁴) computations per time step, where n is the number of non-input units. I describe a method suited for on-line learning which computes exactly the same gradient and requires fixed-size storage of the same order but has an average time complexity per time step of O(n³).
Probabilistic Logic Learning
ACM-SIGKDD Explorations: Special issue on Multi-Relational Data Mining, 2004
Cited by 31 (8 self)
The past few years have witnessed a significant interest in probabilistic logic learning, i.e. in research lying at the intersection of probabilistic reasoning, logical representations, and machine learning. A rich variety of different formalisms and learning techniques have been developed. This paper provides an introductory survey and overview of the state-of-the-art in probabilistic logic learning through the identification of a number of important probabilistic, logical and learning concepts.

