37 citations found.
M. Gori and A. Tesi, "On the problem of local minima in backpropagation," IEEE Transactions on PAMI, vol. 14, no. 1, pp. 76--86, 1992.


This paper is cited in the following contexts:


Evolutionary Discovery of Learning Rules for Feedforward Neural .. - Radi, Poli (1999)   (Correct)

....SBP algorithm [18, 21, 22]. Rprop stands for Resilient backpropagation. It is a local adaptive learning scheme, performing supervised batch learning in multilayer perceptrons. For a detailed discussion see [18]. Although Rprop is considerably faster than SBP, it still suffers from many problems [3] and, like SBP, it cannot train networks with step activation function. 3 PREVIOUS WORK A considerable amount of work has been done on the evolution of the weights and/or the topology of neural networks. See for example [8, 15]. However only a relatively small amount of work has been reported on ....

M. Gori and A. Tesi. On the problem of local minima in backpropagation. IEEE Transactions on PAMI, 14(1):76--86, 1992.


Neural Recognition in a Pyramidal Structure - Virginio Cantoni And   (Correct)

.... is based on a hierarchical system that executes a multiresolution analysis of the images provided as input and classifies through a structured neural network [12]. A peculiar feature of this kind of networks is their capabilities to avoid local minima in reaching the optimal weight configuration [16, 17]. In particular, it can be demonstrated in [16] that in case of linearly separable patterns, for a network with just one hidden layer, a number of outputs equal to the number of classes coded with exclusive coding, and full connections from the input to the hidden layer, subdivided into ....

.... analysis of the images provided as input and classifies through a structured neural network [12]. A peculiar feature of this kind of networks is their capabilities to avoid local minima in reaching the optimal weight configuration [16, 17]. In particular, it can be demonstrated in [16] that in case of linearly separable patterns, for a network with just one hidden layer, a number of outputs equal to the number of classes coded with exclusive coding, and full connections from the input to the hidden layer, subdivided into sublayers corresponding to the number of classes, the ....

M. Gori and A. Tesi, "On the problem of local minima in backpropagation, " IEEE Trans. Pattern Anal. Machine Intell., vol. 14, pp. 76--86, 1992.


Layered Neural Network training with Model Switching and.. - Kameyama, Taga   (Correct)

....efficiency of the training. Selection of the model is selecting the mapping ability of the possible set of functions, which determines the trainability of the network to a particular problem. It also affects the possibility of the training to be caught in the local minima of the error landscape [5]. Second, it can also affect the generalization ability of the pattern recognition machine, namely a desired ability to correctly classify an unseen input. In order to solve the issues pointed above, the authors have jointly used the conventional stepwise training methods and occasional model ....

M. Gori and A. Tesi, "On the problem of local minima in backpropagation," IEEE Transaction on Pattern Analysis and Machine Intelligence,vol. 14, no. 1, pp. 76--86, 1992. [Online]. Available: citeseer.nj.nec.com/gori92problem.html


Supervised Training Using Global Search Methods - Plagianakos, Magoulas (2001)   (Correct)

....in Table 1.1.

Training   XOR Problem               Parity Problem
Method     Succ.   Mean     s.d.     Succ.   Mean     s.d.
BP         42      144.1    112.6    91      93       13
SA         43      424.2    420.8    22      805.4    2103
BPSA       65      1661.9   2775.7   66      263      6866.8
GA         95      422.3    32.3     73      1091.5   766.2
DE1        75      192.9    124.7    91      622.6    522.1
DE2        80      284.9    216.2    61      1994.1   657.6
DE3        97      583      256.3    99      896.3    450.6
DE4        98      706.1    36.1     98      1060.2   716.6
DE5        85      3        250.2    26      2112.0   644.9
DE6        93      482.9    264.9    44      2062.5   794.8
BPD        100     575.1    35.1     100     760.0    696.4

Table 1.1 Comparative results

The results of Table 1.1 suggest that combination of local and global search methods like BPSA and BPD provide a ....

[Article contains additional citation context not shown here]

M. Gori and A. Tesi, "On the problem of local minima in backpropagation ", IEEE Trans. Pattern Analysis and Machine Intelligence, vol.14, 1992, 76--85.


Training Neural Nets with the Reactive Tabu Search - Battiti, Tecchiolli   (Correct)

....iterations (std.dev. 0.1 75 45014.7 (14631.7) 0.5 89 10360.7 (562.1) 1.0 91 12530.8 (3802.4) 10.0 52 37426.9 (18268.2) 20.0 al 91603.2 (31993.3) tions, so that the above problems related to the initialization or to the presence of local minima should not discourage its usage (see also [23] for a discussion of cases where local minima are absent). A proper initialization is also critical to the success of batch backpropagation. Table 2 lists the performance results of batch BP with constant learning rate: 0.1. TABLE II BATCH BACKPROPAGATION FOR THE XOR PROBLEM (MAx. 10 6 ....

M. Gori and A. Tesi, "On the Problem of Local Minima in Backpropagation," IEEE Transactions on PAMI, vol. 14, no. 1, pp. 76--86, 1992.


On-Line Learning Processes in Artificial Neural Networks - Heskes, Kappen (1993)   (10 citations)  (Correct)

....2, 4, and 5) We will use the bias $E_1(w) = \frac{1}{4}\,[\ldots] \sum_{i=0}^{2} \sum_{j=0}^{2} \left[ w_{ij}^2 - \alpha \right]^2$ (44), with $\alpha = 0.1$ and $[\ldots] = 0.01$. Incorporation of this bias has a few advantages among which there are prevention of local minima with infinite weights and reduction of training times [39]. After [18], we choose the set of p = 5 training patterns sketched in figure 9(b). Circles indicate negative output, crosses positive output. This is just the usual XOR truth table with one additional pattern at the origin. Because of this additional pattern, the error potential (43) has not only global ....

M. Gori and A. Tesi. On the problem of local minima in backpropagation. IEEE Transactions on PAMI, 14:76--86, 1992.


Genetic Programming Discovers Efficient Learning Rules for the.. - Radi, Poli (1998)   (Correct)

....(namely small) steps, in order to prevent the algorithm from getting stuck too quickly in local minima. For a detailed discussion see also [40, 41]. Although the Momentum method and Rprop are considerably faster than SBP, they still suffer from some of the problems mentioned in Section 1 [11, 40]. 3 Previous Work on the Evolution of Neural Network Learning Rules A considerable amount of work has been done on the evolution of the weights and/or the topology of neural networks. See for example [20, 35, 36, 37, 54]. However only a relatively small amount of previous work has been reported ....

M. Gori and A. Tesi. On the problem of local minima in backpropagation. IEEE Transactions on PAMI, 14(1):76--86, 1992.


Discovery of General Learning Rules for Feedforward Neural.. - Radi, Poli (1999)   (Correct)

....multilayer perceptrons. The basic principle of Rprop is to eliminate the harmful influence of the size of the partial derivative $\partial E / \partial w^l_{ij}$ on the weight changes. For a detailed discussion see also [34, 35]. Although Rprop is considerably faster than SBP, it still suffers from many problems [9, 34] and cannot train networks with step activation function. 3 Previous Work on the Evolution of Neural Network Learning Rules A considerable amount of work has been done on the evolution of the weights and/or the topology of neural networks. See for example [16, 28, 29, 30, 48]. However only a ....

M. Gori and A. Tesi. On the problem of local minima in backpropagation. IEEE Transactions on PAMI, 14(1):76--86, 1992.


Genetic Programming Discovers Efficient Learning Rules for the.. - Radi, Poli (1998)   (Correct)

....(namely small) steps, in order to prevent the algorithm from getting stuck too quickly in local minima. For a detailed discussion see also [41, 42]. Although the Momentum method and Rprop are considerably faster than SBP, they still suffer from some of the problems mentioned in Section 1 [11, 41]. 3 Previous Work on the Evolution of Neural Network Learning Rules A considerable amount of work has been done on the evolution of the weights and/or the topology of neural networks. See for example [20, 36, 37, 38, 54]. However only a relatively small amount of previous work has been reported ....

M. Gori and A. Tesi. On the problem of local minima in backpropagation. IEEE Transactions on PAMI, 14(1):76--86, 1992.


Evolutionary Discovery of Learning Rules for Feedforward.. - Amr Radi School (1999)   (Correct)

....SBP algorithm [18, 21, 22]. Rprop stands for Resilient backpropagation. It is a local adaptive learning scheme, performing supervised batch learning in multilayer perceptrons. For a detailed discussion see [18]. Although Rprop is considerably faster than SBP, it still suffers from many problems [3] and, like SBP, it cannot train networks with step activation function. 3 PREVIOUS WORK A considerable amount of work has been done on the evolution of the weights and/or the topology of neural networks. See for example [8, 15]. However only a relatively small amount of work has been reported on ....

M. Gori and A. Tesi. On the problem of local minima in backpropagation. IEEE Transactions on PAMI, 14(1):76--86, 1992.


Improving the Convergence of the Backpropagation Algorithm.. - Magoulas, al. (1999)   (3 citations)  (Correct)

....it applies the steepest descent (SD) method to update the weights, it suffers from a slow convergence rate and often yields suboptimal solutions (Gori & Tesi, 1992). A variety of approaches adapted from numerical analysis have been applied in an attempt to use not only the gradient of the error function but also the second derivative in constructing efficient supervised training algorithms to accelerate the learning process. However, training algorithms ....

Gori, M., & Tesi, A. (1992). On the problem of local minima in backpropagation. IEEE Trans. Pattern Analysis and Machine Intelligence, 14, 76--85.


On the Uniqueness of Weights in Single Layer Perceptrons - Coetzee, Stonick (1995)   (Correct)

....of these results from a numerical perspective is evaluated on some carefully chosen examples. I INTRODUCTION Various examples of multiple local minima in single layer perceptrons have been published [8-10]. The absence of local minima under certain separability or linear independence conditions [10, 11], or modifications of the least squares error norm [12] have been proven. However, the problem of characterizing these extrema for the standard least squares error, using only a finite number of arbitrary inputs, previously has not been solved, nor has a clear understanding of the conditions which ....

M. Gori and A. Tesi, "On the problem of local minima in backpropagation," IEEE Trans. PAMI, vol. 14(1), pp. 76--85, Jan 1992.


Feature Set Evaluation and Robust Neural Networks.. - Sancho, Pierson..   (Correct)

....of remaining stuck in a local minimum. Moreover, for the particular case of the MLP trained by the BP algorithm, the BM procedure solves the problem of weight initialization of the MLP because the convergence of the BP is insured at the initial situation where the training set is linearly separable [23]. 3. EXPERIMENTAL RESULTS BM as FSE We believe BMs to be relevant to FSE because of a relationship between OS and . Since the theoretic relationship between the two values is not yet completed, empirical evidence is presented in this section to demonstrate proof of concept. In order to determine ....

Gori, M. and A. Tesi, "On the Problem of Local Minima in Backpropagation, " IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 1, pp. 76--86, 1992.


Critical Points for Least-Squares Problems Involving Certain.. - Sontag (1996)   (2 citations)  (Correct)

.... remarked that, even for extremely simple cases (such as K=1 and supposing that the inputs are binary vectors) there arise critical points associated to non global local minima, and thus the study of the set of critical points of E (u;y) has been frequently put forward as a research topic; see [3, 4, 8, 14, 17]. In this context it has also been observed many times that as with other least squares problems pathological behavior will depend heavily on the training sets not being in general position in appropriate senses of probability or topological density (cf. [4, 8, 14]). In this paper, a ....

.... topic; see [3, 4, 8, 14, 17]. In this context it has also been observed many times that as with other least squares problems pathological behavior will depend heavily on the training sets not being in general position in appropriate senses of probability or topological density (cf. [4, 8, 14]). In this paper, a combination of techniques from [1, 11, 16, 19] dealing with reconstruction of parameters from the functional form, the need for generic data (u; y) with large enough N , and the use of certain tools from analytic geometry and from model theory in logic is used in order to ....

Gori, M., and A. Tesi, "On the problem of local minima in back-propagation," Tech. Report RT-DSI 6/90, Univ. di Firenze, April 1990.


Training Neural Nets with the Reactive Tabu Search - Battiti, Tecchiolli (1995)   (14 citations)  (Correct)

....of iterations. It should be remembered that on line BP has been used effectively by developers on a series of significant applications, see the brief review in [24], so that the above problems related to the initialization or to the presence of local minima should not discourage its usage (see also [22] for a discussion of cases where local minima are absent). Available techniques to increase the safety requirements and the speed of convergence of BP, are for example the use of variable step lengths according to heuristic criteria, and the use of second-order information, see [8, 4] and the ....

M. Gori and A. Tesi, "On the Problem of Local Minima in Backpropagation," IEEE Transactions on PAMI, vol. 14, no. 1, pp. 76--86, 1992.


A Classical Algorithm For Avoiding Local Minima - Gorse, Shepherd, Taylor (1994)   (4 citations)  (Correct)

....value of E, the local minima of the error weight surface. The most commonly used supervised training technique, error backpropagation (BP) (equivalent to gradient descent with a fixed step length) is well known to have difficulties with local minima, especially for non linearly separable problems [1]. What is less well known is that the neural implementations of more efficient classical minimisation algorithms, such as conjugate gradients (CG) or the quasi Newton method (QN) are even more likely to be trapped in suboptimal solutions. Table 1 shows the percentage success in reaching a global ....

....step. In order to try to get some further insight into the process, we looked at the trajectories in output space followed for the XOR problem by the z p (l 1 = h) the first step responses to the four patterns p = 00, 01, 10, 11. Since the initial weights are randomly chosen (from the interval $[-1, 1]$) the trajectories in these experiments begin at some arbitrary point inside the hypercube $[0,1]^4$. The target for h=1 is the point (0,1,1,0); the targets for h 1 lie on a line joining this point to (1/2, 1/2, 1/2, 1/2). By taking pairs of these responses we were able to plot ....

[Article contains additional citation context not shown here]

M Gori and A Tesi, "On the problem of local minima in backpropagation", IEEE Trans. on Pattern Analysis and Machine Intelligence, 14, 76-86 (1992).


On Fokker-Planck approximations of on-line learning processes - Heskes (1994)   (2 citations)  (Correct)

....Figure 5: (a) Network structure. (b) XOR problem with one additional pattern. Following reference [25], we choose the set of five training patterns sketched in figure 5(b). Circles indicate negative desired output $x_3 = -0.8$, crosses positive output $x_3 = 0.8$. It is the usual XOR truth table with an additional pattern at the origin. Now the total error potential (E(w; x ) averaged ....

M. Gori and A. Tesi. On the problem of local minima in backpropagation. IEEE Transactions on PAMI, 14:76--86, 1992.


Analysis of the Error Surface of the XOR Network with Two Hidden.. - Hamey (1995)   (5 citations)  (Correct)

....classes. Their classes (b)-(d) occur only as points with infinite weight values but class (a) occurs for finite weight values. Hamey [9] proves that the exclusive or task does not have any regional local minima. Other analysis of the exclusive or network and related learning tasks may be found in [11, 12, 13, 14]. The present paper extends the results of [9] by considering finite relative local minima. A point $w_0$ is said to be a relative minimum of a function $f(w)$ if there exists $\epsilon > 0$ such that $f(w_0 + \Delta w) \ge f(w_0)$ for all $|\Delta w| < \epsilon$. As discussed in [9] this definition is unsuitable for ....

M. Gori and A. Tesi, "On the problem of local minima in backpropagation," IEEE Trans. Patt. Anal. Machine Intell., vol. 14, pp. 76--85, 1992.


Graphical Item Recognition Using Neural Networks - Jianqing (1998)   Self-citation (Gori)   (Correct)

....fitting to training data. But the trained neural networks may suffer from the inadequate generalizations. So an appropriate hidden number should be properly chosen. The hidden layer we defined for logo recognition is based on a variety of experiments and some theoretical guidelines (see e.g. [80], [81]). In this Chapter, noisy logo recognition is simulated in accordance with Baird image defect model and spot noise generation approach. The logo recognizability or unrecognizability in the presence of spot noises should deserve specific consideration. With this respect, spot noises can ....

M. Gori and A.Tesi, "On the problem of local minima in backpropagation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 1, pp. 76--86, 1992.


Successes And Failures Of Backpropagation: A Theoretical.. - Frasconi, Gori, Tesi   (3 citations)  Self-citation (Gori Tesi)   (Correct)

....network. The analysis of these cases reveals that these minima are related to the squashing and cost functions. For example, using the functions (for squash and cost) proposed by Rumelhart [44] local minima can arise when assuming non asymptotic targets (e.g. 0.1, 0.9). As shown in [21], this kind of local minima disappear when choosing asymptotic values for the targets. Sontag and Sussman [48] have proposed an example of local minimum in a single layered network that does not involve the squash, nor the cost functions, but is related with the data assumed (no architectural ....

....by le Cun [37] who gives a heuristic criterion for setting up the initial weights. These studies, however, do not investigate the convergence during premature saturation. In [40] it is pointed out that these configurations are related to the mismatching of at least one output target. Gori and Tesi [21] demonstrate that although these configurations are associated with flat surfaces, they do not represent local minima. Therefore, using enough numerical precision, a gradient descent learning algorithm is likely to escape, even though slowly, from these configurations. This fact, which will be briefly ....

[Article contains additional citation context not shown here]

M. Gori, A. Tesi, "On the Problem of Local Minima in Backpropagation", IEEE Transactions on PAMI, Vol. 14, No. 1, January 1992, pp. 76-86.


Optimal Learning in Artificial Neural Networks: A Theoretical.. - Bianchini, Gori   (1 citation)  Self-citation (Gori)   (Correct)

....the true gradient descent . We focus mainly on batch mode by investigating conditions that guarantee local minima free error surface for both static and dynamic networks. In the case of feedforward networks, local minima free error surfaces are guaranteed when the patterns are linearly separable [7] or when using networks with as many hidden units as patterns to learn [8, 9] Analogous results hold for radial basis function networks for which the absence of local minima is gained under the condition of patterns separable by hyperspheres [10] Roughly speaking, these results suggest to us ....

....of different aspects of Backprop, there is no question that Rumelhart and the PDP group have the credit for the current high diffusion of the algorithm. 2. The weight layer matrices W l ; l = 1; L Gamma 1, are full rank matrices; 3. Ker[X 0 0 ] S Y 1 = f0g. Proof Sketch (see [7] for more details) Because of PR1.3, G 1 = X e 0 ) 0 Y 1 = 0 ) Y 1 = 0. According to Backpropagation step, Y l = Y l 1 W l . Y l = 0 ) Y l = 0 and, consequently, 0 = Y l = Y l 1 W l ) Y l 1 = 0, because of PR1.2. Since Y 1 = 0, YL = 0 follows by induction on l. Finally YL = 0 ) ....

[Article contains additional citation context not shown here]

M. Gori and A. Tesi, "On the problem of local minima in backpropagation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-14, pp. 76--86, January 1992.


Representation of Finite State Automata in Recurrent.. - Frasconi, Gori.. (1996)   (25 citations)  Self-citation (Gori)   (Correct)

....which do not report serious convergence problems. Some attempts to understand the theoretical reasons for the successes and failures of supervised learning schemes have been carried out which explain when such schemes are likely to succeed in discovering optimal solutions (Bianchini et al. 1994; Gori & Tesi, 1992; Yu, 1992) and to generalize to new examples (Baum & Haussler, 1989). These results give some theoretical foundations to learning from tabula rasa configurations, but unfortunately, the conditions they provide for optimal convergence and for generalization are quite limited in practice. ....

Gori, M. and Tesi, A. (1992). On the problem of local minima in backpropagation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(1):76--86.


Terminal attractor algorithms: A critical analysis - Bianchini, Fanelli, Gori.. (1997)   Self-citation (Gori)   (Correct)

No context found.

M. Gori and A. Tesi, "On the problem of local minima in backpropagation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-14, pp. 76--86, January 1992.


Discovering Efficient Learning Rules for Feedforward Neural.. - Radi, Poli (2002)   (Correct)

No context found.

M. Gori and A. Tesi, "On the problem of local minima in backpropagation," IEEE Transactions on PAMI, vol. 14, no. 1, pp. 76--86, 1992.


Financial Time Series Forecasting Using K-Nearest Neighbors - Classification Maggini Giles   (Correct)

No context found.

M. Gori and A. Tesi, "On the problem of local minima in Backpropagation," IEEE Trans. Patt. Anal. Machine Intell., vol. PAMI-14, no. 1, pp. 76-86, 1992.


CiteSeer.IST - Copyright Penn State and NEC