23 citations found.
G. E. Hinton, "Learning Translation Invariant Recognition in Massively Parallel Networks," in Proceedings PARLE Conference on Parallel Architectures and Languages Europe, A. J. Nijman J.W. de Bakker and P. C. Treleaven, Eds., Berlin, 1987, pp. 1--13, Springer-Verlag.


This paper is cited in the following contexts:
Hints - Abu-Mostafa (1995)   (Correct)

....or examples that do not depict some aspects of the target function). Like regularization, hints would not be needed if an unlimited supply of proper examples (and unlimited computation) could be secured. There are different types of hints that are common to many applications. Invariance hints [12,16,19,23] are the most common type in pattern recognition applications. Such a hint asserts that the target function is invariant under certain transformations of the input. Monotonicity hints [4] are common in expert system applications such as credit rating and medical diagnosis. The two major steps ....

G. Hinton, "Learning translation invariant recognition in a massively parallel network," Proc. Conf. Parallel Architectures and Languages Europe, pp. 1-13, 1987.


Towards Automatic Registration of Magnetic Resonance Images of.. - Sabisch (1998)   (Correct)

....to reduce the complexity of the recognition task. There are at least three different approaches to improve the tolerance of a system towards transformations [Barnard and Casasent, 1989; Bishop, 1995]: • Learning the invariance. The simplest way is to learn the transformations from examples [Hinton, 1987]. For instance, if translational invariance is desired each object is presented at many different positions. Typically, this results in a significant enlargement of the data set and online training time. Hence, it is a very inefficient method. The final system is also only approximately invariant. ....

Hinton, G. E. (1987). Learning translation invariant recognition in a massively parallel network. In Proceedings of Parallel Architectures and Languages Europe (PARLE), pages 1--13.


Staged Training of Neocognitron by Evolutionary Algorithms - Pan, Sabisch, Adams, Bolouri (1999)   (1 citation)  (Correct)

....to propose a number of methods. These approaches to shift and deformation tolerance fall into three categories: learning the transformations, extracting invariant features and building a specific architecture. Roughly speaking, the first approach is to learn the object models from examples [Hinton, 1987] and try to match the observed and stored models by determination of the transformation parameters [Chin and Dyer, 1986]. For example, each object will be presented at many different positions if translational invariance is desired. Typically, such methods require a carefully prepared, and very ....

Hinton, G. E. (1987). Learning translation invariant recognition in a massively parallel network. In Proceedings of Parallel Architectures and Languages Europe (PARLE), pages 1--13.


A Spike Based Learning Rule for Generation of Invariant.. - Körding, al. (2000)   (Correct)

....following principles of supervised or unsupervised learning. The former need labelled training data, and actually for large networks, a lot of these. With such data at hand, it is possible to learn invariant recognition by training the network with a variant of the backpropagation algorithm (e.g. [20]). To alleviate the problem of getting enough training data, applying the desired invariance operators to individual examples can enlarge the training set. However, then we are back with an a priori specification of the invariance operation. Furthermore, these approaches do not supply a convincing ....

Hinton G.E., Learning translation invariant recognition in a massively parallel network, in: Goos G., Hartmanis J. (Eds.), PARLE: Parallel Architectures and Languages Europe, Lecture Notes in Computer Science, Springer-Verlag, Berlin, 1987, pp. 1 -- 13.


Robust Full Bayesian Learning for Radial Basis Networks - Andrieu, de Freitas, Doucet (2000)   (4 citations)  (Correct)

.... terms include the well known Akaike information criterion (AIC), Bayesian information criterion (BIC) and minimum description length (MDL) (Akaike 1974, Schwarz 1985, Rissanen 1987). Penalised likelihood has also been used extensively to impose smoothing constraints either via weight decay priors (Hinton 1987, Mackay 1992) or functional regularisers that penalise for high frequency signal components (Girosi, Jones and Poggio 1995). In the predictive assessment approach, the data is split into a training set, a validation set and possibly a test set. The key idea is to balance the bias and variance of ....

Hinton, G. (1987). Learning translation invariant recognition in massively parallel networks, in J. W. de Bakker, A. J. Nijman and P. C. Treleaven (eds), Proceedings of the Conference on Parallel Architectures and Languages, Berlin, pp. 1-13.


Generativity and Systematicity in Neural Network Combinatorial.. - Brousse (1993)   (8 citations)  (Correct)

....perceptron individually. And indeed, even current models using more powerful learning techniques than the perceptron suggest that in order to obtain correct performance on a target set of inputs, a network needs to be trained on a sizable fraction (between 25% and 75%) of the learning set, e.g. (Hinton, 1987), (Saito and Nakano, 1988), (Le Cun et al., 1990). In light of this example, then, and abstracting over many other tasks tackled both by the symbolic and the sub-symbolic approaches, we believe that question (1) has not been directly and experimentally addressed. Our approach to addressing ....

G. Hinton. Learning translation invariant recognition in a massively parallel network. In Proceedings of the European conference on parallel architectures and languages. Springer, 1987.


Benefits of Gain: Speeded learning and minimal hidden.. - Kruschke, Movellan (1991)   (14 citations)  (Correct)

....the gain parameter that makes it an attractive device for introducing constraints into back propagation. Improved generalization from small hidden layers. There is empirical evidence that generalization to novel input patterns is improved by using hidden layers with a small number of nodes (e.g. [18 20]) In these cases, generalization from the training set to novel inputs was better when the number of hidden nodes was relatively small. 2 The reason for improved generalization is intuitively clear: A small hidden layer forces the input patterns to be mapped through a low dimensional space, ....

....and defines $s_{ij}$ to be 1 if $i = j$ and 0 otherwise. Then the cost function reduces to $C = \sum_i g_i^2$, and gradient descent on $C$ with respect to $g_i$ yields $\Delta g_i \propto -g_i$. That is simply gain decay, analogous to weight decay, which has often been used in applications of back propagation (e.g. [18,27]). Another special case defines $h(g_i) = g_i / (1 + g_i)$ with $s_{ij}$ again equal to 1 if $i = j$ and 0 otherwise, so that $C = \sum_i g_i^2 / (1 + g_i)^2$. This is also a form of gain decay, in which the decay rate increases to a maximum for some small value of gain, and then decreases as gain ....

[Article contains additional citation context not shown here]

G. Hinton. "Learning translation invariant recognition in a massively parallel network. " In: G. Goos and J. Hartmanis (eds.), Lecture Notes in Computer Science, Volume I: Parallel Architectures and Languages Europe. New York: Springer-Verlag, 1987.


Data Regularization - Jankowski (2000)   (Correct)

....or by overlapping clusters. Some models try to solve this problem using several kinds of regularization methods during the learning process. Most regularization methods add a penalty term to the error function, for example regularization proposed by Poggio and Girosi [3], Hinton's weight decay [2] and weight elimination proposed by Weigend [4]. However, even using regularization methods, the problems mentioned above do not vanish. One of the reasons is that regularization methods have (almost) the same sensitivity in the whole input space. Moreover, even experts in most cases are not able to check ....

G. E. Hinton. Learning translation invariant recognition in massively parallel networks. In J. W. de Bakker, A. J. Nijman, and P. C. Treleaven, editors, Proceedings PARLE Conference on Parallel Architectures and Languages Europe, pages 1--13, Berlin, 1987. Springer-Verlag.


Pénalisation Multiple Adaptative - Boukari, Grandvalet   (Correct)

....can be estimated analytically or by using resampling techniques. Let $w \in \mathbb{R}^d$ be the vector of model parameters. The two most widely used penalization approaches are: 1. regularized least squares (ridge regression [7], or weight decay [6] in the neural network literature). Here the complexity of the model is controlled by the Euclidean norm of the parameters. The minimized criterion is: $C'_{emp}(f) = C_{emp}(f) + \lambda \sum_{i=1}^{d} w_i^2$ (4) 2. parameter selection, which penalizes the number of nonzero parameters. This ....

Hinton, G.E. Learning translation invariant recognition in massively parallel networks. In PARLE Conference on Parallel Architectures and Languages Europe, edited by de Bakker, J.W., Nijman, A.J. and Treleaven, pp. 1--13, Springer, 1987.


Staged Training of Neocognitron by Evolutionary Algorithms - Pan, Sabisch, Adams, Bolouri (1999)   (1 citation)  (Correct)

....to propose a number of methods. These approaches to shift and deformation tolerance fall into three categories: learning the transformations, extracting invariant features and building a specific architecture. Roughly speaking, the first approach is to learn the object models from examples [Hinton, 1987] and try to match the observed and stored models by determination of the transformation parameters [Chin and Dyer, 1986]. For example, each object will be presented at many different positions if translational invariance is desired. Typically, such methods require a carefully prepared, and very ....

Hinton, G. E. (1987). Learning translation invariant recognition in a massively parallel network. In Proceedings of Parallel Architectures and Languages Europe (PARLE), pages 1-13.


Generalization by Neural Networks - Shekhar, Amin (1992)   (7 citations)  (Correct)

....$O_1$), ($I_2$, $O_2$), ..., ($I_n$, $O_n$). In recognition problems, the trained network is tested with a previously seen input $I_j$ ($1 \le j \le n$) corrupted by noise as shown in Fig. 1. The trained network is expected to reproduce the output $O_j$ corresponding to $I_j$, in spite of the noise. Shape recognition [9, 10] and handwriting recognition [2] are examples of recognition problems. On the other hand, in generalization problems, the trained neural network is tested with input $I_{n+1}$, which is distinct from the inputs $I_1, I_2, \ldots, I_n$ used for training the network, as shown in Fig. 1. The network is ....

G.E. Hinton, Learning Translation Invariant Recognition in Massively Parallel Network, Lecture Notes in Computer Science # 258, pp. 1-13 Springer Verlag, (1987).


Back-Propagation Without Weight Transport - Kolen, Pollack (1994)   (Correct)

.... one What developmental mechanism guarantees equality between two corresponding weights? In this paper, we first show that it is possible to relax the equality restriction and consider only the asymptotic case, i.e. for all , We show that besides its other uses (Hinton and Sejnowski, 1986; Hinton, 1987; Krogh and Hertz, 1992; MacKay, 1992; Moody, 1992) weight decay can synchronize each weight pair, as outlined in Figure 2. Weight decay is defined simply as , where is the decay constant. In physical systems, such as the brain or VLSI chips, maintaining stable analog values requires elaborate ....

Hinton, G. E. (1987). Learning translation invariant recognition in a massively parallel network. In G. Goos and J. Hartmanis (Eds.) PARLE: Parallel Architectures and Languages Europe. Lecture Notes in Computer Science. Berlin:Springer-Verlag.


Neural Networks in Economics: Background.. - Herbrich.. (1999)   (Correct)

....small VC dimension. Another way to incorporate this into the learning process is to minimize $R_{emp}(\alpha) + \lambda \|\alpha\|^2$ where $\lambda$ has to be chosen beforehand. Such a technique is also called regularization (Poggio and Girosi, 1990) and was successfully used in the weight decay learning algorithm (Hinton, 1987). The Support Vector algorithm to be presented in Section 5 makes use of a similar technique. 4. Economic Applications of Neural Networks: An Overview of the Literature. With the application of backpropagation to Neural Network learning (Rumelhart et al. 1986) and the revived interest into ....

Hinton, G.: 1987, `Learning translation invariant recognition in massively parallel networks'. In: Proceedings Conference on Parallel Architectures and Languages Europe. pp. 1--13.


Neural Networks As A Model For Visual Perception: What Is Lacking? - Würtz (1999)   (Correct)

....new patterns at a single position and then recognize them in an invariant way in any position. Again, I would allow a small number of positions. It is granted that the network is now a structured one, but the structure has been learned from examples. A good step into this direction is a network by Hinton (1987), which actually achieves the desired generalization. His parameters are N=16, P=12. Every pattern is trained at 10 (random) positions. So the number of training examples is 0.83 P N, the number of test examples to which the network generalizes is 0.17 P N. This gets a little awkward for larger ....

....larger values of N and P. The task as outlined above would allow only S (P N 1) training examples. Something like S=3 should be appropriate, S=1 desirable. Then the network should generalize and recognize all P N examples correctly. Note that there is no objection to the choice of parameters in (Hinton, 1987) but to the scaling behavior for larger parameter values. The network must have seen all patterns in almost all possible positions to do the generalization. To the author's knowledge the goal of learning shift invariance from a training set of size S (P N 1) with a small S has not been reached ....

Hinton, G. (1987), Learning translation invariant recognition in massively parallel networks, In Goos, G. and Hartmanis, J. (Eds.), PARLE Parallel Architectures and Languages Europe, volume 258 of Lecture Notes in Computer Science, pages 1--13, Springer.


Robust Full Bayesian Learning for Neural Networks - Andrieu, de Freitas, Doucet (1999)   (4 citations)  (Correct)

.... terms include the well known Akaike information criterion (AIC), Bayesian information criterion (BIC) and minimum description length (MDL) (Akaike 1974, Schwarz 1985, Rissanen 1987). Penalised likelihood has also been used extensively to impose smoothing constraints either via weight decay priors (Hinton 1987, Mackay 1992) or functional regularisers that penalise for high frequency signal components (Girosi, Jones and Poggio 1995). In the predictive assessment approach, the data is split into a training set, a validation set and possibly a test set. The key idea is to balance the bias and variance of ....

Hinton, G. (1987). Learning translation invariant recognition in massively parallel networks, in J. W. de Bakker, A. J. Nijman and P. C. Treleaven (eds), Proceedings of the Conference on Parallel Architectures and Languages Europe, Berlin, pp. 1--13.


Pruning Recurrent Neural Networks for Improved Generalization.. - Giles, Omlin (1994)   (12 citations)  (Correct)

....a simple technique for pruning trained recurrent neural networks to significantly improve their generalization performance. To our knowledge, no such technique for recurrent neural networks has been previously published. Good generalization results have also been reported using weight decay ([8, 10]). We will compare our pruning method with weight decay for different decay rates. Published in IEEE Trans. on Neural Networks vol. 5, no. 5, p. 848, 1994. Copyright IEEE. 2 PRUNING A RECURRENT NETWORK To test our pruning heuristic, we incrementally trained discrete time, fully recurrent ....

....size of the maximal training set; NN classification errors on test set; quantization level; size of extracted DFA; DFA classification errors. 3.3 Comparison with Weight Decay. It has been observed in simulations that weight decay can improve the generalization performance of feedforward networks ([8, 10]). Weight decay suppresses irrelevant components of weight vectors by choosing a small vector that solves the learning problem. For networks trained using weight decay, the error function is expanded to include an error term which penalizes large weights: The weight update then becomes $\Delta w$ ....

G. E. Hinton, "Learning translation invariant recognition in a massively parallel network," in PARLE: Parallel Architectures and Languages Europe, pp. 1--13, Berlin: Springer Verlag, 1987. Lecture Notes in Computer Science.


Neural Networks As A Model For Visual Perception: What Is Lacking? - Würtz (1998)   (Correct)

....new patterns at a single position and then recognize them in an invariant way in any position. Again, I would allow a small number of positions. It is granted that the network is now a structured one, but the structure has been learned from examples. A good step into this direction is a network by Hinton (1987), which actually achieves the desired generalization. His parameters are N=16, P=12. Every pattern is trained at 10 (random) positions. So the number of training examples is 0.83 · P · N, the number of test examples to which the network generalizes is 0.17 · P · N. This gets a little ....

....values of N and P. The task as outlined above would allow only S · (P N 1) training examples. Something like S=3 should be appropriate, S=1 desirable. Then the network should generalize and recognize all P · N examples correctly. Note that there is no objection to the choice of parameters in (Hinton, 1987) but to the scaling behavior for larger parameter values. The network must have seen all patterns in almost all possible positions to do the generalization. To the author's knowledge the goal of learning shift invariance from a training set of size S × (P N) with a small S has not been reached ....

Hinton, G. (1987), Learning translation invariant recognition in massively parallel networks, In Goos, G. and Hartmanis, J., editors, PARLE Parallel Architectures and Languages Europe, number 258 in Lecture Notes in Computer Science, pages 1--13, Springer.


A Simple Weight Decay Can Improve Generalization - Krogh (1992)   (54 citations)  (Correct)

....lead to an exponential decay of the weights. Obviously there are infinitely many possibilities for choosing other forms of the additional term in (1) but here we will concentrate on this simple form. It has been known for a long time that a weight decay of this form can improve generalization [7], but until now not very widely recognized. The aim of this paper is to analyze this effect both theoretically and experimentally. Weight decay as a special kind of regularization is also discussed in [8, 9] 2 FEED FORWARD NETWORKS A feed forward neural network implements a function of the ....

G.E. Hinton. Learning translation invariant recognition in a massively parallel network. In G. Goos and J. Hartmanis, editors, PARLE: Parallel Architectures and Languages Europe. Lecture Notes in Computer Science, pages 1--13, Springer-Verlag, Berlin, 1987.


Neural Networks in Economics: Background.. - Herbrich, Keilbach, ..   (Correct)

....with small VC dimension. Another way to incorporate this into the learning process is to minimize $R_{emp}(\alpha) + \lambda \|\alpha\|^2$ where $\lambda$ has to be chosen beforehand. Such a technique is also called regularization (Poggio and Girosi 1990) and was successfully used in the weight decay learning algorithm (Hinton 1987). The Support Vector algorithm to be presented in Section 5 makes use of a similar technique. 4 Economic Applications of Neural Networks: An Overview of the Literature. With the application of backpropagation to Neural Network learning (Rumelhart et al. 1986) and the revived interest into Neural ....

Hinton, G. (1987). Learning translation invariant recognition in massively parallel networks. In Proceedings Conference on Parallel Architectures and Languages Europe, pp. 1--13. Springer Verlag.


Keeping Neural Networks Simple by Minimizing the Description.. - Hinton (1993)   (63 citations)  Self-citation (Hinton)   (Correct)

....data misfits and the weights by minimizing the sum of two terms: $C = \sum_j \frac{1}{2\sigma_j^2} \sum_c (d_j^c - y_j^c)^2 + \frac{1}{2\sigma_w^2} \sum_{ij} w_{ij}^2$ (4) where $c$ is an index over training cases. This is just the standard weight decay method. The fact that weight decay improves generalization (Hinton, 1987) can therefore be viewed as a vindication of this crude MDL approach in which the standard deviations of the gaussians used for coding the data misfits and the weights are both fixed in advance. 2 An elaboration of standard weight decay is to assume that the distribution of weights in the trained ....

Hinton, G. E. (1987) Learning translation invariant recognition in a massively parallel network. In Goos, G.


Studies of Model Selection and Regularization for Generalization in .. - Guo   (Correct)

No context found.

G. E. Hinton, "Learning Translation Invariant Recognition in Massively Parallel Networks," in Proceedings PARLE Conference on Parallel Architectures and Languages Europe, A. J. Nijman J.W. de Bakker and P. C. Treleaven, Eds., Berlin, 1987, pp. 1--13, Springer-Verlag.


Brain-Structured Connectionist Networks That Perceive And Learn - Honavar, Uhr (1989)   (2 citations)  (Correct)

No context found.

Hinton, G. E., "Learning translation-invariant recognition in a massively parallel network," in PARLE: Parallel Architectures and Languages, Europe. Lecture Notes in Computer Science, ed. G. Goos and J. Hartmanis, Springer-Verlag , Berlin , 1987b.


A Bibliography of the Intersection of Genetic Search and.. - Rudnick (1990)   (13 citations)  (Correct)

No context found.

G. Hinton (1987). "Learning translation invariant recognition in a massively parallel network," PARLE: Parallel Architectures and Languages Europe, vol. 258, pp. 1--13, Springer-Verlag.


CiteSeer.IST - Copyright Penn State and NEC