36 citations found.
M. C. Mozer. A focused backpropagation algorithm for temporal pattern recognition. Technical Report CRG-TR-88-3, Depts. of Psychology and Computer Science, University of Toronto, Toronto, Jun 1988.


This paper is cited in the following contexts:


A Unifying View of Gradient Calculations and Learning.. - Campolucci, Uncini.. (1997)   (1 citation)  (Correct)

....one. BPS basically is the classical backpropagation on the multilayer network with a recursive computation only inside each dynamic neuron. Due to the architectural constraints, this algorithm does not implement BPTT, but is somewhere between BP and RTRL. The same approach was proposed by Mozer in [24], who independently derived a quite similar algorithm named Focused Back Propagation for a locally recurrent network analogous to the Auto Regressive MLP (AR MLP) [25]. BPS was rediscovered in [19], where it was derived for a structure that is a particular case of the output feedback LF MLN and was ....

....convergence. Also the BPS algorithm [13] can be obtained as a particular case of CRBP when the architecture restriction that the dynamic units can only be placed in the first layer is given. The CRBP applied to the AR MLP can also be viewed as a generalization of Mozer's and Leighton and Conrath's work [24,25]. Moreover, if all the synapses contain only the MA part (I nm =0 for each n,m and l), the architecture reduces to FIR MLP and this algorithm reduces to the Temporal Back Propagation as in [7,9,21]. Obviously, if all the synaptic filters have no memory (I nm =0 and =1 for each n,m and l) this ....

M.C. Mozer. A Focused Back-propagation Algorithm for Temporal Pattern Recognition. Tech Rep. CRG-TR-88-3, University of Toronto, 1988 and Complex Systems 3: 349-381, 1989.


On Planning And Exploration In Non-Discrete Environments - Thrun, Möller (1991)   (4 citations)  (Correct)

....behave randomly. In this case we will assume that these quantities have been visible some time before, otherwise the mapping of the world is not a function any more. Thus, the world model has to be able to store context information through time. By using a recurrent network [Jor86, Elm88, Ghe89, Moz88, Pin87, Pea88, Pea90, RF87, WZ88, TS90] instead of a feed forward network as a model this problem can be overcome. We will focus on recurrent networks of the Elman type [Elm88] (Fig. 2b), which seem to be best suited for our purpose. If we speak of states in turn, we mean externally observable ....

M. C. Mozer. A focused backpropagation algorithm for temporal pattern recognition. Technical Report CRG-TR-88-3, Depts. of Psychology and Computer Science, University of Toronto, Toronto, Jun 1988.


Genetic Synthesis of Recurrent Neural Networks - Syed   (Correct)

....the output and error. These updates are computed using a sequence of calculations at each iteration. The weights are updated either after each iteration or after the final iteration of the epoch. This algorithm was proposed for discrete time RNNs by a number of different researchers [Kuhn, 1987; Mozer, 1988; Robinson and Fallside, 1987; Williams and Zipser, 1989]. The continuous time version of recursive backpropagation was first proposed by Pineda [Pineda, 1988]. The major disadvantage of this algorithm is that it requires an extensive amount of computation at each iteration. Another algorithm ....

Mozer, M. (1988). A focused back-propagation algorithm for temporal pattern recognition. Technical Report, Departments of Psychology and Computer Science, University of Toronto, Toronto.


Recurrent Neural Networks for Adaptive Temporal Processing - Bengio, Frasconi, Gori (1993)   (2 citations)  (Correct)

....gradient computation techniques trade space vs. time locality. Interestingly, if the recurrent connections are restricted to self loops for some neurons, both space and time locality can be achieved. Local gradient computation algorithms for such networks were independently derived by Mozer [38] and Gori, Bengio and De Mori [39]. In a local feedback architecture (see Fig. 1), neurons are connected in layers like in feedforward nets, eventually using shortcut links (i.e. skipping one or more layers). Units directly connected to the external inputs may have a self loop link, thus ....

....(17) Figure 1: Example of Local Feedback multi layered network. Many variants of the above equation are possible. For example, $y_i(t-1)$ may be replaced by $a_i(t-1)$, obtaining an activation feedback connection that yields a linear dynamic behavior (such an approach was pursued in [38]). Another variant is to introduce multiple delays, feeding back the past output [activation] of the unit at time $t-2$, $t-3$ and so on. This corresponds to a higher than first order nonlinear [linear] autoregressive model for the unit. A more general case is to use an autoregressive ....

M. Mozer, "A focused back-propagation algorithm for temporal pattern recognition," Complex Systems, vol. 3, pp. 349--381, 1989.


The Induction of Dynamical Recognizers - Pollack (1991)   (126 citations)  (Correct)

....that same unit receives feedback from other examples. When the don't cares line up, the weights to those units will never change. One possible fix, so called Back Propagation through time (Rumelhart et al. 1986), involves a complete unrolling of a recurrent loop and has had only modest success (Mozer, 1988), probably because of conflicts arising from equivalence constraints between interdependent layers. My fix involves a single backspace, unrolling the loop only once. For a particular string, this leads to the calculation of only one error term for each weight (and thus no conflict) as follows. ....

Mozer, M. (1988). A focused Back-propagation Algorithm for Temporal Pattern Recognition. CRG-Technical Report-88-3: University of Toronto.


Recognizing Handwritten Digit Strings Using Modular.. - Shastri, Fontaine (1995)   (1 citation)  (Correct)

....(see Section 3). 2.2 Spatio temporal networks. Processing a spatio temporal signal requires a model capable of processing time varying signals. A number of researchers have proposed network models to represent and process such signals (e.g. Elman, 1990; Jordon, 1987; Lapedes and Farber, 1987; Mozer, 1989; Waibel et al. 1989; Watrous and Shastri, 1986). The connectionist model we employed was inspired by the Temporal Flow Model (TFM), which has achieved good results in speech recognition (Watrous, 1990; Watrous, 1991). TFM supports arbitrary link connectivity across layers, admits feedforward as ....

Mozer, M. (1989). A focused backpropagation algorithm for temporal pattern recognition. Complex Systems, 3:349--381.


Representation of Discrete States - Giles, Omlin (1999)   (Correct)

....between input symbols and input neurons. In this class of recurrent network architectures, the recurrent connections are restricted to self loops. One advantage of these locally recurrent networks is that they can be trained with gradient descent algorithms which are local in both space and time [1, 17, 23]. Figure 6: Locally Recurrent Networks: They consist of an input layer, a layer of dynamic neurons with self recurrent connections whose outputs are the inputs to a standard layered, feedforward neural network. The time delays on the feedback connections are shown as squares. We use the ....

M. Mozer, "A focused backpropagation algorithm for temporal pattern recognition," Complex Systems, vol. 3, 1989.


Modularization by Cascading Neural Networks - Littmann, Ritter   (Correct)

....approach (Minsky and Papert, 1969; Rumelhart et al. 1986). This will be a subject of further research. Another, different way to deal with temporal patterns will follow the recurrent cascade correlation approach (Fahlman, 1991), providing the cascade units with adaptive recurrent self connections (Mozer, 1989). 4.2 Preprocessing. Any mapping of a feature vector can be interpreted as a preprocessing if it is followed by further operations. The cascade architectures can be regarded as methods to incrementally build a powerful preprocessing device. The preprocessing capability of the single component ....

Mozer, M. (1989). A focused back-propagation algorithm for temporal pattern recognition. Complex Systems, 3:349--381.


The Acquisition of Lexical Semantics for Spatial Terms: A.. - Regier (1992)   (18 citations)  (Correct)

....network through time, this approach is quite expensive in terms of memory consumption. Another drawback, as we mentioned earlier in Chapter 3, is the fact that tall networks with tied weights between layers, such as would result when trying to learn long sequences, tend to be difficult to train [Mozer, 1988; Pollack, 1990a]. This is a point against the method of back propagation through time in general. Since we assume here that movies may be of arbitrary length, this is potentially a serious problem with this method. Time Delay Neural Networks. Time delay neural networks, as presented in Chapter 3, ....

Michael Mozer, "A Focused Back-Propagation Algorithm for Temporal Pattern Recognition," Technical Report CRG-TR-88-3, Connectionist Research Group, University of Toronto, 1988.


Gradient-Based Learning Algorithms for Recurrent Networks.. - Williams, Zipser (1995)   (56 citations)  (Correct)

....examples of input and desired output trajectories. One example of such a task is sequence classification, where the input is the sequence to be classified and the desired output is the correct classification, which is to be produced at the end of the sequence, as in some of the work reported by Mozer (1989; chapter , this volume). Another example is sequence production, as studied by Jordan (1986), in which the input is a constant pattern and the corresponding desired output is a time-varying sequence. More generally, both the input and desired output may be time varying, as in the prediction ....

....than fixed, they can form delay line structures when necessary while also being able to create flip flops or other memory structures capable of preserving state over potentially unbounded periods of time. This point has been emphasized in (Williams, 1990) and similar arguments have been made by Mozer (1989; chapter , this volume). There are a number of possible reasons to pursue the development of learning algorithms for recurrent networks, and these may involve a variety of possible constraints on the algorithms one might be willing to consider. For example, one might be interested in ....

[Article contains additional citation context not shown here]

Mozer, M. C. (1989). A focused back-propagation algorithm for temporal pattern recognition.


A Learning Algorithm for Continually Running Fully Recurrent .. - Williams, Zipser (1989)   (263 citations)  (Correct)

....strengths is its generality, but a corresponding weakness is its growing memory requirement when given an arbitrarily long training sequence. Other approaches to training recurrent nets to handle time varying input or output have been suggested or investigated by Jordan (1986), Bachrach (1988), Mozer (1988), Elman (1988), Servan-Schreiber, Cleeremans, and McClelland (1988), Robinson and Fallside (1987), Stornetta, Hogg, and Huberman (1987), Gallant and King (1988), and Pearlmutter (1988). Many of these approaches use restricted architectures or are based on more computationally limited approximations ....

....while not suffering from its growing memory requirement in arbitrarily long training sequences. It coincides with an approach suggested in the system identification literature (McBride and Narendra, 1965) for tuning the parameters of general dynamical systems. The work of Bachrach (1988) and Mozer (1988) represents special cases of the algorithm presented here, and Robinson and Fallside (1987) have given an alternative description of the full algorithm as well. However, to the best of our knowledge, none of these investigators has published an account of the behavior of this algorithm in ....

Mozer, M. C. (1988). A focused back-propagation algorithm for temporal pattern recognition (Tech.


ASCOC: A Locally Recurrent Neural Network Model for.. - Dit-Yan Yeung (1994)   (Correct)

....to as locally recurrent neural networks (LRNNs) here, are primarily feedforward networks, with the exception that feedback connections exist between some limited sets or layers of units. Up to now, almost all RNN models proposed for formal language learning are LRNNs, including both first order [3, 1, 4, 7] and second order [5, 2, 8, 12] networks. LRNNs are easier to analyze because of their limited feedback connections and layered structures. These networks have been demonstrated to be capable of solving some nontrivial formal language learning problems. In this paper, our focus is on using ....

M.C. Mozer, "A focused back-propagation algorithm for temporal pattern recognition," Complex Systems, vol. 3, pp. 349--381, 1989.


Recurrent Neural Networks for Temporal Sequences Recognition - Huet (1993)   (Correct)

....time delay neural networks. The NETtalk program [18] is a good example of a connectionist system using a buffer (or temporal window) in order to transform a temporal problem into its spatial representation. [Figure 2.2: A Delay Line network architecture, with taps x(t), x(t-1), x(t-2), x(t-3), x(t-4).] Mozer [13] and others [19] have exposed the different drawbacks of these conventional methods. The first and obvious drawback is that the buffer must have a sufficient capacity to hold the longest possible input sequence. Similarly, a buffer of t elements may be used to recognize an input pattern of greater ....

....sequence as being the same. From these facts, one can easily notice that spatial transformation of time sequences is not viable. A more flexible representation is needed. Despite the drawbacks stated above, the following properties required by any model of temporal pattern recognition emerge [13]: • Some memory of the input history is required. This memory will retain information about the past inputs that are relevant for the future recognition or prediction. • A function must be specified to combine the current memory and the current input to form a new temporal context (memory) ....

[Article contains additional citation context not shown here]

M.C. Mozer. "A Focused Back-propagation Algorithm for Temporal Pattern Recognition". Complex Systems, vol 3:pp 349--381, 1989.


Gradient-Based Learning Algorithms for Recurrent.. - Williams, Zipser (1990)   (17 citations)  (Correct)

....an alternative approach is to propagate activity gradient information forward. This leads to a learning algorithm which we have called real time recurrent learning (RTRL). This algorithm has been independently derived in various forms by Robinson and Fallside (1987), Kuhn (1987), Bachrach (1988), Mozer (1988), and Williams and Zipser (1989a), and continuous time versions have been proposed by Gherrity (1989) and by Doya and Yoshizawa (1989). 5.1 The Algorithm. For each $k \in U$, $i \in U$, $j \in U \cup I$, and $t_0 \le t \le t_1$, we define $p^k_{ij}(t) = \partial y_k(t) / \partial w_{ij}$. (28) This quantity measures the sensitivity ....

....$p^k_{ij}(t) = f'_k(s_k(t)) \left[ \sum_{l \in U_H} w_{kl}\, p^l_{ij}(t-1) + \delta_{ik}\, x_j(t-1) \right]$, (39) which are just the RTRL equations (30) specialized to take into account the fact that $w_{kl}$ is 0 if $l \in U_O$. One noteworthy special case of this type of architecture has been investigated by Mozer (1988). For this architecture, the only connections allowed between units in the hidden stage are self-recurrent connections. In this case, $p^k_{ij}$ is 0 except when $k = i$. This algorithm can then be implemented in an entirely local fashion by regarding each $p^i_{ij}$ value as being stored with $w_{ij}$, ....

[Article contains additional citation context not shown here]

Mozer, M. C. (1988). A focused back-propagation algorithm for temporal pattern recognition (Technical Report). Toronto: University of Toronto, Departments of Psychology and Computer Science.
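
The special case described in the context above (hidden units with only self-recurrent connections, so that each sensitivity can be stored alongside its weight) can be illustrated with a minimal sketch. This is not code from any of the cited papers. It assumes a single unit with a logistic squashing function, zero initial output, and a net input formed from the previous step's inputs; the names `run_unit`, `w_in`, and `w_self` are hypothetical.

```python
import math


def logistic(s):
    """Logistic squashing function (an assumption; the excerpt only writes f)."""
    return 1.0 / (1.0 + math.exp(-s))


def run_unit(w_in, w_self, xs):
    """Run one self-recurrent unit and track local sensitivities.

    xs is a list of external input vectors, one per time step.
    Returns (y, p_in, p_self), where p_in[j] approximates dy/dw_in[j]
    and p_self approximates dy/dw_self for the final output y.
    """
    y = 0.0                      # assumed initial output
    p_in = [0.0] * len(w_in)     # one sensitivity stored per input weight
    p_self = 0.0                 # sensitivity stored with the self weight
    for x in xs:
        s = sum(w * xj for w, xj in zip(w_in, x)) + w_self * y
        y_new = logistic(s)
        fprime = y_new * (1.0 - y_new)
        # Local update: p(t) = f'(s(t)) * (w_self * p(t-1) + x_j(t-1)).
        p_in = [fprime * (w_self * p + xj) for p, xj in zip(p_in, x)]
        # For the self weight, the "input" is the unit's own previous output.
        p_self = fprime * (w_self * p_self + y)
        y = y_new
    return y, p_in, p_self
```

Each sensitivity depends only on its own previous value, the unit's self weight, and the signal arriving on that weight, which is what makes the update local in both space and time. A finite-difference comparison on a short sequence is a simple way to check the recursion.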


Learning Long-Term Dependencies with Gradient Descent is.. - Bengio, Simard, Frasconi (1994)   (105 citations)  (Correct)

....such as the forward propagation algorithms [14, 23] are much more computationally expensive (for a fully connected recurrent network) but are local in time, i.e. they can be applied in an on line fashion, producing a partial gradient after each time step. Another algorithm was proposed [10, 18] for training constrained recurrent networks in which dynamic neurons with a single feedback to themselves have only incoming connections from the input layer. It is local in time like the forward propagation algorithms and it requires computation only proportional to the number of ....

Mozer M.C. "A focused back-propagation algorithm for temporal pattern recognition", Complex Systems, 3, 1989, pp. 349-381.


On the Problem of Local Minima in Recurrent Neural Networks - Bianchini, Gori, Maggini (1994)   (6 citations)  (Correct)

....devise efficient methods for computing the gradient. For the case of fully recurrent networks, some very interesting algorithms have been proposed in [19, 23, 30, 32, 33] while more efficient but restrictive algorithms have been devised for the case of networks having self loop connections only [14, 21, 29]. A potential problem, which is likely to affect practical applications, is that the learning process may be seriously plagued by the presence of local minima in the cost function. In general, there is no reason to exclude the presence of stationary points that may also be local minima. Obviously, ....

M. C. Mozer, "A focused backpropagation algorithm for temporal pattern recognition," Complex Systems, vol. 3, pp 349-381, 1989.


Adaptive Look-Ahead Planning - Sebastian Thrun, Knut Möller.. (1990)   (1 citation)  (Correct)

....initial plans. The Feed Forward Algorithm for Gradient Search in Action Space. As mentioned above, the environment is modeled by a non recurrent multilayer backpropagation network. This restriction is sufficient for our simulation results; the extension of the algorithm to recurrent networks [3, 4, 5, 9, 13, 14, 15, 21] is straightforward and shown in [18]. The external input of the world model network is a state vector s(t) and an action vector a(t). Both state and action vector are the external input I(t) of the model network; for all non input units this external input is 0. The output of the network ....

M. C. Mozer. A focused backpropagation algorithm for temporal pattern recognition. Technical Report CRG-TR-88-3, Depts. of Psychology and Computer Science, University of Toronto, Toronto, Jun 1988.


The Recurrent Cascade-Correlation Architecture - Fahlman (1991)   (37 citations)  (Correct)

....is frozen, along with all the other weights. Each new hidden unit is in effect a single state variable in a finite state machine that is built specifically for the task at hand. In this use of self recurrent connections only, the RCC model resembles the Focused Back Propagation algorithm of Mozer [Mozer, 1988]. The output, $V(t)$, of each self recurrent unit is computed as follows: $V(t) = \sigma\left( \sum_i I_i(t)\, w_i + V(t-1)\, w_s \right)$, where $\sigma$ is some non linear squashing function applied to the weighted sum of inputs $I$ plus the self weight, $w_s$, times the previous output. In the studies described ....

Mozer, M. C. (1988) "A Focused Back-Propagation Algorithm for Temporal Pattern Recognition," Tech Report CRG-TR-88-3, Univ. of Toronto, Dept. of Psychology and Computer Science.
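
The output rule quoted in the context above (a squashing function applied to the weighted sum of inputs plus the self weight times the previous output) can be sketched as a short forward pass. This is an illustrative sketch only, not the cited paper's implementation: the logistic squashing function, the zero initial output `v0`, and the function name `self_recurrent_outputs` are all assumptions.

```python
import math


def sigmoid(x):
    """Logistic function, used here as one possible squashing function."""
    return 1.0 / (1.0 + math.exp(-x))


def self_recurrent_outputs(inputs, weights, w_self, v0=0.0, squash=sigmoid):
    """Return the outputs V(1), ..., V(T) of one self-recurrent unit.

    inputs  : list of input vectors, one per time step
    weights : input weights w_i, same length as each input vector
    w_self  : self weight w_s applied to the previous output
    v0      : assumed initial output V(0)
    """
    outputs = []
    v_prev = v0
    for step in inputs:
        # Weighted sum of the current inputs plus self weight times previous output.
        net = sum(x * w for x, w in zip(step, weights)) + v_prev * w_self
        v_prev = squash(net)
        outputs.append(v_prev)
    return outputs
```

With a positive self weight the unit carries its previous output forward, so it can act as a simple state variable even when the external input is zero.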


Neural network music composition by prediction: Exploring the.. - Mozer (1994)   (7 citations)  Self-citation (Mozer)   (Correct)

....a minor third. Connectionist learning algorithms offer the potential of overcoming the various limitations of transition table approaches and Kohonen musical grammars. Connectionist algorithms are able to discover relevant structure and statistical regularities in sequences (e.g. Elman, 1990; Mozer, 1989). Indeed, connectionist algorithms can be viewed as an extension of the transition table approach, a point also noted by Dolson (1989). Just as the transition table approach uses a training set to calculate the probability of the next note in a sequence as a function of the previous notes, so does ....

.... network is trained on sequences in which one event predicts another, the relationship is not hard to learn if the two events are separated by only a few unrelated intervening events, but as the number of intervening events grows, a point is quickly reached where the relationship cannot be learned (Mozer, 1989, 1992, 1993; Schmidhuber, 1992). Bengio, Frasconi, and Simard (1993) present theoretical arguments for inherent limitations of learning in recurrent networks. This poses a serious limitation on the use of back propagation to induce musical structure in a note-by-note prediction paradigm because ....

[Article contains additional citation context not shown here]

Mozer, M. C. (1989). A focused back-propagation algorithm for temporal pattern recognition. Complex Systems, 3, 349-381.


A General Feed-Forward Algorithm for Gradient Descent in.. - Thrun, Smieja (1990)   (2 citations)  (Correct)

No context found.

M. C. Mozer. A focused backpropagation algorithm for temporal pattern recognition. Technical Report CRG-TR-88-3, Depts. of Psychology and Computer Science, University of Toronto, Toronto, Jun 1988.


Inversion In Time - Thrun, Linden (1990)   (1 citation)  (Correct)

No context found.

M. C. Mozer, A Focused Back-Propagation Algorithm for Temporal Pattern Recognition. Technical Report CRG-TR-88-3, 1988.


Modeling Applications with the Focused Gamma Net - Principe, de Vries, Kuo, de.. (1992)   (1 citation)  (Correct)

No context found.

Mozer M.C. (1989). A Focused Backpropagation Algorithm for Temporal Pattern Recognition. Complex Systems 3, 349-381.


Apprentissage Dans Les Réseaux Récurrents Pour La Modélisation.. - Szilas (1995)   (Correct)

No context found.

Michael C. Mozer. A focused back-propagation algorithm for temporal pattern recognition, Technical Report CRG-TR-88-3, Depts. of Psychology and Computer Science, Univ. of Toronto, 1988.


Learning to Forget: Continual Prediction with LSTM - Felix A. Gers, Jürgen.. (1999)   (4 citations)  (Correct)

No context found.

M. C. Mozer. A focused backpropagation algorithm for temporal pattern recognition. Complex Systems, 3:349--381, 1989.


ASCOC: A Recurrent Neural Network Model for Grammatical Inference - Yeung, Yeung (1994)   (Correct)

No context found.

Mozer, M.C. (1989). A focused back-propagation algorithm for temporal pattern recognition. Complex Systems, 3, 349--381.

