Results 1 - 10 of 55
Gradient calculation for dynamic recurrent neural networks: a survey
- IEEE Transactions on Neural Networks, 1995
Cited by 119 (1 self)
We survey learning algorithms for recurrent neural networks with hidden units, and put the various techniques into a common framework. We discuss fixed-point learning algorithms, namely recurrent backpropagation and deterministic Boltzmann Machines, and non-fixed-point algorithms, namely backpropagation through time, Elman's history cutoff, and Jordan's output feedback architecture. Forward propagation, an online technique that uses adjoint equations, and variations thereof, are also discussed. In many cases, the unified presentation leads to generalizations of various sorts. We discuss advantages and disadvantages of temporally continuous neural networks in contrast to clocked ones, and continue with some "tricks of the trade" for training, using, and simulating continuous time and recurrent neural networks. We present some simulations, and at the end, address issues of computational complexity and learning speed.
Noisy Time Series Prediction using a Recurrent Neural Network and Grammatical Inference
- Machine Learning, 2001
Cited by 40 (0 self)
Financial forecasting is an example of a signal processing problem which is challenging due to small sample sizes, high noise, non-stationarity, and non-linearity. Neural networks have been very successful in a number of signal processing applications. We discuss fundamental limitations and inherent difficulties when using neural networks for the processing of high noise, small sample size signals. We introduce a new intelligent signal processing method which addresses the difficulties. The proposed method uses conversion into a symbolic representation with a self-organizing map, and grammatical inference with recurrent neural networks. We apply the method to the prediction of daily foreign exchange rates, addressing difficulties with non-stationarity, overfitting, and unequal a priori class probabilities, and we find significant predictability in comprehensive experiments covering 5 different foreign exchange rates. The method correctly predicts the direction of change for th...
Learning long-term dependencies in NARX recurrent neural networks
- 1996
Cited by 40 (5 self)
It has recently been shown that gradient-descent learning algorithms for recurrent neural networks can perform poorly on tasks that involve long-term dependencies, i.e. those problems for which the desired output depends on inputs presented at times far in the past. We show that the long-term dependencies problem is lessened for a class of architectures called NARX recurrent neural networks, which have powerful representational capabilities. We have previously reported that gradient descent learning can be more effective in NARX networks than in recurrent neural network architectures that have "hidden states" on problems including grammatical inference and nonlinear system identification. Typically, the network converges much faster and generalizes better than other networks. The results in this paper are consistent with this phenomenon. We present some experimental results which show that NARX networks can often retain information for two to three times as long as conventi...
An Experimental Comparison of Recurrent Neural Networks
- In Advances in Neural Information Processing Systems 7, 1995
Cited by 34 (12 self)
Many different discrete-time recurrent neural network architectures have been proposed. However, there has been virtually no effort to compare these architectures experimentally. In this paper we review and categorize many of these architectures and compare how they perform on various classes of simple problems including grammatical inference and nonlinear system identification.
1 Introduction. In the past few years several recurrent neural network architectures have emerged. In this paper we categorize various discrete-time recurrent neural network architectures, and perform a quantitative comparison of these architectures on two problems: grammatical inference and nonlinear system identification.
2 RNN Architectures. We broadly divide these networks into two groups depending on whether or not the states of the network are guaranteed to be observable. A network with observable
Footnotes: Also with UMIACS, University of Maryland, College Park, MD 20742. Published in Neural Information Pr...
Computational capabilities of recurrent NARX neural networks
- 1997
Cited by 27 (8 self)
Recently, fully connected recurrent neural networks have been proven to be computationally rich, at least as powerful as Turing machines. This work focuses on another network which is popular in control applications and has been found to be very effective at learning a variety of problems. These networks are based upon Nonlinear AutoRegressive models with eXogenous Inputs (NARX models), and are therefore called NARX networks. As opposed to other recurrent networks, NARX networks have a limited feedback which comes only from the output neuron rather than from hidden states. They are formalized by $y(t) = \Psi\big(u(t - n_u), \ldots, u(t - 1), u(t), y(t - n_y), \ldots, y(t - 1)\big)$, where $u(t)$ and $y(t)$ represent input and output of the network at time $t$, $n_u$ and $n_y$ are the input and output order, and the function $\Psi$ is the mapping performed by a Multilayer Perceptron. We constructively prove that the NARX networks with a finite number of parameters are computation...
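The NARX recurrence in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `mlp` and `narx_run`, the single tanh hidden layer, the weight values, and the zero initial history are all assumptions made for illustration. The only part taken from the abstract is that the output at time t is a perceptron mapping of the last n_u + 1 inputs and the last n_y outputs, with feedback from the output only.

```python
import math

def mlp(x, W1, b1, w2, b2):
    """One-hidden-layer perceptron with tanh units (assumed form of Psi)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2

def narx_run(u, n_u, n_y, psi):
    """Iterate y(t) = psi(u(t-n_u), ..., u(t), y(t-n_y), ..., y(t-1)).

    Samples before t = 0 are taken to be zero (an assumption).
    """
    y = []
    for t in range(len(u)):
        # Tapped delay line on the input: u(t - n_u) ... u(t).
        u_taps = [u[k] if k >= 0 else 0.0 for k in range(t - n_u, t + 1)]
        # Tapped delay line on the fed-back output: y(t - n_y) ... y(t - 1).
        y_taps = [y[k] if k >= 0 else 0.0 for k in range(t - n_y, t)]
        y.append(psi(u_taps + y_taps))
    return y

# Hypothetical weights for n_u = 1, n_y = 1 (three taps in, two hidden units).
W1 = [[0.5, -0.3, 0.8], [0.1, 0.4, -0.6]]
b1 = [0.0, 0.1]
w2 = [1.0, -1.0]
b2 = 0.0

outputs = narx_run([1.0, 0.0, 0.0, 0.0], 1, 1,
                   lambda x: mlp(x, W1, b1, w2, b2))
print(outputs)
```

Passing the mapping as a callable keeps the recurrence separate from the choice of network, so the delay-line logic can be checked with a simple linear map before a trained perceptron is substituted.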
Sample Complexity for Learning Recurrent Perceptron Mappings
- IEEE Trans. Inform. Theory, 1996
Cited by 22 (10 self)
Recurrent perceptron classifiers generalize the classical perceptron model. They take into account those correlations and dependences among input coordinates which arise from linear digital filtering. This paper provides tight bounds on sample complexity associated to the fitting of such models to experimental data.
Keywords: perceptrons, recurrent models, neural networks, learning, Vapnik-Chervonenkis dimension
1 Introduction. One of the most popular approaches to binary pattern classification, underlying many statistical techniques, is based on perceptrons or linear discriminants; see for instance the classical reference [9]. In this context, one is interested in classifying $k$-dimensional input patterns $v = (v_1, \ldots, v_k)$ into two disjoint classes $A^+$ and $A^-$. A perceptron $P$ which classifies vectors into $A^+$ and $A^-$ is characterized by a vector (of "weights") $\vec{c} \in \mathbb{R}^k$, and operates as follows. One forms the inner product $\vec{c} \cdot v = c_1 v_1 + \ldots + c_k v_k$. I...
Memory-Universal Prediction of Stationary Random Processes
- IEEE Trans. Inform. Theory, 1998
Cited by 22 (1 self)
We consider the problem of one-step-ahead prediction of a real-valued, stationary, strongly mixing random process $\{X_i\}_{i=-\infty}^{\infty}$. The best mean-square predictor of $X_0$ is its conditional mean given the entire infinite past $\{X_i\}_{i=-\infty}^{-1}$. Given a sequence of observations $X_1, X_2, \ldots, X_N$, we propose estimators for the conditional mean based on sequences of parametric models of increasing memory and of increasing dimension, for example, neural networks and Legendre polynomials. The proposed estimators select both the model memory and the model dimension, in a data-driven fashion, by minimizing certain complexity regularized least squares criteria. When the underlying predictor function has a finite memory, we establish that the proposed estimators are memory-universal: the proposed estimators, which do not know the true memory, deliver the same statistical performance (rates of integrated mean-squared error) as that delivered by estimators that know the true memory. Furthermore, when the underlying predictor function does not have a finite memory, we establish that the estimator based on Legendre polynomials is consistent.
Learning a Class of Large Finite State Machines with a Recurrent Neural Network
- 1995
Cited by 20 (11 self)
One of the issues in any learning model is how it scales with problem size. The problem of learning finite state machines (FSMs) from examples with recurrent neural networks has been extensively explored. However, these results are somewhat disappointing in the sense that the machines that can be learned are too small to be competitive with existing grammatical inference algorithms. We show that a type of recurrent neural network (Narendra & Parthasarathy, 1990, IEEE Trans. Neural Networks, 1, 4-27) which has feedback but no hidden state neurons can learn a special type of FSM called a finite memory machine (FMM) under certain constraints. These machines have a large number of states (simulations are for 256 and 512 state FMMs) but have minimal order, relatively small depth and little logic when the FMM is implemented as a sequential machine,
Design of Neural Network Filters
- Electronics Institute, Technical University of Denmark, 1993
Cited by 19 (12 self)
The subject of the present licentiate thesis is the design of neural network filters. Filters based on neural networks can be seen as extensions of the classical linear adaptive filter, aimed at modeling nonlinear relationships. The main emphasis is placed on a neural network implementation of the non-recursive, nonlinear adaptive model with additive noise. The purpose is to clarify a number of phases involved in the design of neural network architectures for carrying out various "black-box" modeling tasks such as system identification, inverse modeling, and time series prediction. The main contributions include the formulation of a neural-network-based canonical filter representation, which forms the basis for developing an architecture classification system. In essence, this concerns a distinction between global and local models. As a result, a number of known neural network architectures can be classified, and the possibility of developing entirely new structures is opened up. In this context, a review of a number of well-known architectures is given. Particular emphasis is placed on the treatment of the multi-layer perceptron neural network.
Dynamic Stochastic Synapses as Computational Units
- 1999
Cited by 17 (7 self)
In most neural network models, synapses are treated as static weights that change only on the slow time scales of learning. It is well known, however, that synapses are highly dynamic, and show use-dependent plasticity over a wide range of time scales. Moreover, synaptic transmission is an inherently stochastic process: a spike arriving at a presynaptic terminal triggers release of a vesicle of neurotransmitter from a release site with a probability that can be much less than one. We consider

