Results 1 – 3 of 3
Architectural Bias in Recurrent Neural Networks – Fractal Analysis
 IEEE TRANSACTIONS ON NEURAL NETWORKS
Abstract

Cited by 46 (9 self)
We have recently shown that when initialized with "small" weights, recurrent neural networks (RNNs) with standard sigmoid-type activation functions are inherently biased towards Markov models, i.e., even prior to any training, RNN dynamics can be readily used to extract finite-memory machines (Hammer & Tino, 2002; Tino, Cernansky & Benuskova, 2002; Tino, Cernansky & Benuskova, 2002a). Following Christiansen and Chater (1999), we refer to this phenomenon as the architectural bias of RNNs. In this paper we further extend our work on the architectural bias in RNNs by performing a rigorous fractal analysis of recurrent activation patterns. We assume the network is driven by sequences obtained by traversing an underlying finite-state transition diagram, a scenario that has frequently been considered in the past, e.g., when studying RNN-based learning and implementation of regular grammars and finite-state transducers. We obtain lower and upper bounds on various types of fractal dimensions, such as the box-counting and Hausdorff dimensions. It turns out that not only can the recurrent activations inside RNNs with small initial weights be exploited to build Markovian predictive models, but the activations also form fractal clusters whose dimension can be bounded by the scaled entropy of the underlying driving source. The scaling factors are fixed and are given by the RNN parameters.
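The Markovian bias the abstract describes can be illustrated numerically. The following is a minimal sketch (the toy driving source, network size, and all weight scales are assumptions, not the paper's setup): a sigmoid RNN with small random weights is driven by a random symbol sequence, and recurrent states that share a recent symbol suffix end up clustered together, even though the network is untrained.

```python
import numpy as np

# Hedged illustration of the architectural bias: small weights make the
# state-transition maps contractive, so the state is dominated by the most
# recent symbols (a finite-memory, Markovian organization).
rng = np.random.default_rng(0)
n_hidden, n_symbols = 4, 2
W_rec = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # "small" recurrent weights
W_in = rng.normal(scale=0.5, size=(n_hidden, n_symbols))

def step(h, one_hot):
    """One RNN step with a standard logistic sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-(W_rec @ h + W_in @ one_hot)))

# A random walk over the alphabet stands in for a finite-state driving source.
seq = rng.integers(0, n_symbols, size=2000)
h = np.full(n_hidden, 0.5)
acts, hist = [], []
for t, sym in enumerate(seq):
    h = step(h, np.eye(n_symbols)[sym])
    if t >= 2:
        acts.append(h.copy())
        hist.append(tuple(seq[t - 2:t + 1]))  # last three symbols

# With contractive (small-weight) dynamics, states sharing a recent suffix
# lie close together: within-suffix spread falls far below overall spread.
acts = np.array(acts)
overall = acts.std(axis=0).mean()
groups = [acts[[i for i, s in enumerate(hist) if s == key]] for key in set(hist)]
within = np.mean([g.std(axis=0).mean() for g in groups if len(g) > 1])
print(within < overall)
```

Grouping the activation vectors by their length-3 suffix and comparing spreads is exactly the property that lets finite-memory predictive models be read off an untrained network.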
Dynamics and topographic organization in recursive selforganizing map
 NEURAL COMPUTATION
, 2006
Abstract

Cited by 9 (2 self)
Recently, there has been a surge of interest in extending topographic maps of vectorial data to more general data structures, such as sequences or trees. However, at present, there is no general consensus as to how best to process sequences using topographic maps, and this topic remains a very active focus of current neurocomputational research. The representational capabilities and internal representations of the models are not well understood. We rigorously analyze a generalization of the Self-Organizing Map (SOM) for processing sequential data, the Recursive SOM (RecSOM) (Voegtlin, 2002), as a non-autonomous dynamical system consisting of a set of fixed input maps. We argue that contractive fixed input maps are likely to produce Markovian organizations of receptive fields on the RecSOM map. We derive bounds on the parameter β (weighting the importance of past information when processing sequences) under which contractiveness of the fixed input maps is guaranteed. Some generalizations of SOM contain a dynamic module responsible for processing temporal contexts as an integral part of the model. We show that Markovian topographic maps of sequential data can be produced using a simple fixed (non-adaptable) dynamic module externally feeding a standard topographic model designed to process static vectorial data of fixed dimensionality (e.g., SOM). However, by allowing trainable feedback connections one can obtain Markovian maps with superior memory depth and topography preservation. We elaborate upon the importance of non-Markovian organizations in topographic maps of sequential data.
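The "fixed dynamic module externally feeding a standard SOM" construction can be sketched in a few lines. This is a hedged illustration, not the paper's RecSOM: a non-adaptable leaky integrator (a β-contraction) summarizes the sequence, and an ordinary SOM then quantizes the resulting context vectors as static data. The alphabet, integrator form, and all constants are assumptions.

```python
import numpy as np

# Fixed (non-trainable) dynamic module + plain SOM -> Markovian map sketch.
rng = np.random.default_rng(1)
beta = 0.5                       # weight on past information; contraction factor
symbols = np.eye(2)              # one-hot codes for a binary alphabet
seq = rng.integers(0, 2, size=500)

# Fixed dynamic module: c(t) = (1 - beta) * x(t) + beta * c(t - 1).
c = np.zeros(2)
contexts = []
for s in seq:
    c = (1.0 - beta) * symbols[s] + beta * c
    contexts.append(c.copy())
contexts = np.array(contexts)

# A small 1-D SOM trained on the context vectors as ordinary static data.
n_units, lr = 8, 0.1
W = rng.random((n_units, 2))
for _ in range(5):                                # a few passes suffice here
    for x in contexts:
        bmu = int(np.argmin(((W - x) ** 2).sum(axis=1)))
        for i in range(n_units):
            nb = np.exp(-((i - bmu) ** 2) / 2.0)  # Gaussian neighborhood
            W[i] += lr * nb * (x - W[i])

# Because the module is a beta-contraction, two contexts sharing the most
# recent symbol can differ by at most beta times the context diameter, so
# receptive fields organize by recent suffixes (a Markovian organization).
bmus = np.array([np.argmin(((W - x) ** 2).sum(axis=1)) for x in contexts])
```

For 0 ≤ β < 1 the integrator is contractive, which is the analogue of the contractive-fixed-input-map condition the abstract derives bounds on; the trainable-feedback variants discussed above replace this fixed module with adapted recurrent connections.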
Building predictive models on complex symbolic sequences via a first-order recurrent BCM network with lateral inhibition
Abstract

Cited by 2 (0 self)
We use a recurrent version of the Bienenstock, Cooper and Munro network (RBCMN) with lateral inhibition [2] to map histories of symbols into activations in the recurrent layer. After training the networks, we construct finite-context predictive models on top of the recurrent activations by grouping close activation patterns via vector quantization. Predictive models extracted from RBCMNs are compared with those extracted from "classical" recurrent neural networks (RNNs) trained in a supervised regime to perform next-symbol prediction. As a test bed we use two complex symbolic sequences with rather deep memory structures. Surprisingly, the BCM-based model (RBCMNs are trained in an unsupervised mode) performs comparably to its RNN-based counterpart. This can be explained by the familiar information-latching problem in recurrent networks when longer time spans are to be latched [3, 4]. We argue that BCM-based models correspond to variable memory length Markov models.
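The extraction step common to both model families above (quantize recurrent activations, then build a finite-context predictor on the clusters) can be sketched as follows. This is a hedged illustration under stated assumptions: a fixed contractive map stands in for the trained recurrent layer, the periodic toy sequence replaces the paper's complex sequences, and plain k-means serves as the vector quantizer.

```python
import numpy as np

# Quantize recurrent activations, then attach next-symbol counts to each
# codebook vector to obtain a finite-context predictive model.
rng = np.random.default_rng(2)
seq = np.array([0, 0, 1] * 300)          # toy sequence; depth-2 memory
symbols = np.eye(2)

# Stand-in recurrent layer: a fixed contractive map over one-hot inputs.
W = rng.normal(scale=0.1, size=(4, 4))
V = rng.normal(scale=0.5, size=(4, 2))
h = np.zeros(4)
acts = []
for s in seq[:-1]:
    h = np.tanh(W @ h + V @ symbols[s])
    acts.append(h.copy())
acts = np.array(acts)

# Plain k-means, initialized with consecutive post-transient activations.
k = 3
codes = acts[100:100 + k].copy()
for _ in range(10):
    assign = np.argmin(((acts[:, None] - codes) ** 2).sum(-1), axis=1)
    for j in range(k):
        if (assign == j).any():
            codes[j] = acts[assign == j].mean(axis=0)

# Next-symbol counts per cluster -> finite-context predictive model.
counts = np.zeros((k, 2))
for t, j in enumerate(assign):
    counts[j, seq[t + 1]] += 1
preds = counts.argmax(axis=1)
acc = np.mean(preds[assign] == seq[1:len(assign) + 1])
```

Because the contractive map groups states by recent symbol history, the clusters act as variable-length contexts, which is why such extracted predictors correspond to variable memory length Markov models.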