M. Bianchini, P. Frasconi, and M. Gori, "Learning without local minima in radial basis function networks," IEEE Transactions on Neural Networks, vol. 6, no. 3, pp. 749-756, 1995.
....the necessary derivatives can be easily computed, and the learning rule takes the form: $\Delta\theta_k = -\eta \, \partial SE_{emp} / \partial \theta_k$ (4), where $\eta$ is the learning rate. The main problems with gradient descent are that the convergence is slow and depends on the choice of the initial point. Although Bianchini et al. [9] demonstrated that, in the case of classification, a wide class of RBFNs has a unique minimum (i.e. no local minima exist in the cost function), it is not possible to reach this optimal point, in a short time, from every starting point of the parameter space. Hence the initialization of the ....
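As a rough illustration of the gradient descent learning rule quoted in the excerpt above, the following minimal sketch applies the update "parameter minus learning rate times derivative" to a toy quadratic error. The function names, the toy error function, and the chosen learning rate are assumptions made for illustration only; they are not taken from the citing paper.

```python
def gradient_descent_step(params, grads, learning_rate):
    """One update: each parameter moves against its error derivative."""
    return [p - learning_rate * g for p, g in zip(params, grads)]


def toy_error_gradient(params):
    """Gradient of the toy error E(theta) = sum_k theta_k ** 2 (an assumption)."""
    return [2.0 * p for p in params]


def run_descent(start, learning_rate, steps):
    """Repeat the update a fixed number of times from a given starting point."""
    theta = list(start)
    for _ in range(steps):
        theta = gradient_descent_step(theta, toy_error_gradient(theta), learning_rate)
    return theta


# Hypothetical starting point and learning rate.
final_theta = run_descent([1.0, -2.0], learning_rate=0.1, steps=50)
```

With this toy error the surface has a single minimum at the origin, so every starting point converges; the number of steps needed still depends on the starting point and on the learning rate, which is the practical difficulty the excerpt describes.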
....RBFNs can be described as three-layer neural networks where hidden units have a radial activation function. Although some of the results of the neural networks can be extended to RBFN, exploiting this interpretation (e.g. approximation capabilities [43] and the existence of a unique minimum [9]), substantial differences still remain with respect to the other feed forward networks. In fact, RBFNs exhibit properties substantially different with respect to both learning properties and semantic interpretation. In order to understand the different behaviours of the two network types, assume we ....
Monica Bianchini, Paolo Frasconi, and Marco Gori. Learning without local minima in radial basis function networks. IEEE Transactions on Neural Networks, 6(3):749--756, May 1995.
....simpler to implement and suitable to on line learning. Optimized versions of the gradient descent technique such as conjugate gradient, momentum or others are possible. The main problems with gradient descent are that the convergence is slow and depends on the choice of the initial point. Although [Bianchini et al. 1995] demonstrated that, in the case of classification, a wide class of RBFNs has a unique minimum (i.e. no local minima exist in the cost function), it is not possible to reach this optimal point, in a short time, from every starting point of the parameter space. Hence the initialization of the ....
....described as three-layer neural networks where hidden units have a radial activation function. Although some of the results of the neural networks can be extended to RBFN, exploiting this interpretation (e.g. approximation capabilities [Hornik et al. 1989] and the existence of a unique minimum [Bianchini et al. 1995]), substantial differences still remain with respect to the other feed forward networks. In fact, RBFNs exhibit properties substantially different with respect to both learning properties and semantic interpretation. In order to understand the different behaviours of the two network types, suppose ....
Bianchini, M., Frasconi, P., and Gori, M. (1995). Learning without local minima in radial basis function networks. IEEE Transactions on Neural Networks, 6(3):749--756.
.... As an illustration, consider the sum-of-squares cost function given by $E = \sum_n E^n$ (17), such that $E^n = \frac{1}{2} \sum_k \{ t_k^n - y_k(x^n) \}^2$ (18), where $t_k^n$ is the target value of output unit $k$ when the network is presented with input vector $x^n$. If Gaussian basis functions are used to minimize this cost function, one readily obtains the update equations [GDB92] .... [Footnote 1: In certain special cases, such as classification problems in which the patterns are separable by hyperspheres, a cost function that is local minima free with respect to all the weights can be obtained [BFG95].]
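The sum-of-squares cost in the excerpt above can be sketched for a small Gaussian RBF network as follows. This is a minimal sketch under stated assumptions: the Gaussian form exp(-||x - c||^2 / (2 s^2)), the absence of a bias term, and all function and parameter names are illustrative choices, not the citing paper's definitions.

```python
import math


def gaussian_basis(x, center, width):
    """Gaussian basis function value for input x (assumed form, see lead-in)."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def rbf_outputs(x, centers, widths, weights):
    """Outputs y_k(x) as weighted sums of basis activations (no bias term)."""
    phi = [gaussian_basis(x, c, s) for c, s in zip(centers, widths)]
    return [sum(w * p for w, p in zip(w_k, phi)) for w_k in weights]


def sum_of_squares_error(inputs, targets, centers, widths, weights):
    """E = sum over patterns n of 0.5 * sum over outputs k of (t_k^n - y_k(x^n))^2."""
    total = 0.0
    for x, t in zip(inputs, targets):
        y = rbf_outputs(x, centers, widths, weights)
        total += 0.5 * sum((t_k - y_k) ** 2 for t_k, y_k in zip(t, y))
    return total


# Hypothetical one-unit network: the basis function equals 1 at its own center.
error_at_center = sum_of_squares_error(
    inputs=[[0.0]], targets=[[0.0]],
    centers=[[0.0]], widths=[1.0], weights=[[1.0]],
)
```

In this hypothetical example the single output is 1 while the target is 0, so the error evaluates to 0.5 times 1 squared.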
M. Bianchini, P. Frasconi, and M. Gori. Learning without local minima in radial basis function networks. IEEE Transactions on Neural Networks, 6(3):749-756, 1995.
....of the basis function. RBFs are widely used for mapping local hard nonlinear features and are often used in a NN configuration [15] such that . The hard nonlinear properties also make them usable for classification [25]. An advantage of using RBFs is the ability to avoid local minima [3]. To fix the ideas, Fig. 3 shows a few typical transfer functions, based on a 2-input, single-output plant, such as given in Fig. 2. [Fig. 2: 2 inputs, single output controller] [Fig. 3: Typical output planes for different models] The above list is not exhaustive. Sometimes very dedicated models are ....
Bianchini M., Frasconi P., Gori M., "Learning without Local Minima in Radial Basis Function Networks", IEEE Transactions on Neural Networks, Vol. 6, No. 3, May 1995, pp. 749 - 756.
....stapes and connecting the prosthesis appropriately. Note that the term stapedotomy may also be used for the actual hole drilled through the stapes. This work was supported in part by Brite Euram Project BE7470. Published in the IEEE Transactions on Information Technology in Biomedicine, vol. 3, no. 4, pp. 268-277, 1999. This paper can be downloaded from ARL's Web page: http://skiron.control.ee.auth.gr. Drilling through the stapes could be dangerous because of the potentially excessive tool protrusion, at the end of drilling, due to deflection of the flexible stapes bone under tool ....
....capacity is expected by the authors to be important in skillful tasks including surgical procedures. In this work we have employed the FL framework for learning and decision making. The layout of this article is as follows. Section 2 reviews the state of the art technology in stapedotomy. Section 3 details a feature extraction technique, it outlines aspects of the underlying theory, it reviews the ds FLN scheme for learning, and it exposes the equations needed for the application of the latter scheme. Section 4 details both data acquisition and the experiments carried out, and finally it ....
[Article contains additional citation context not shown here]
M. Bianchini, P. Frasconi, and M. Gori, "Learning Without Local Minima in Radial Basis Function Networks", IEEE Trans. Neural Networks, vol. 6 (3), pp. 749-756, 1995.
.... is termed divide and conquer [38]. In supervised learning, an interesting modular proposal that addresses the major problem of providing effective integration of the system modules is presented in [38]. Analytically, the importance of developing modular architectures has been stressed in [40], [41], where sufficient (but not necessary) conditions capable of guaranteeing local minima free cost functions are detected, such that a simple gradient descent algorithm can always reach the absolute minimum of the error surface. Contiguous to the problem of fine tuning modular learning systems on ....
M. Bianchini, P. Frasconi, M. Gori, "Learning without local minima in radial basis function networks," IEEE Trans. Neural Networks, vol. 6, no. 3, pp. 749-756, 1995.
....when the number of the training patterns is extremely large and the dimension of the input is big. Furthermore the gradient descent method tends to settle down to a local minimum and sometimes even does not converge if the patterns of the outputs of the middle layer are not linearly separable (Monica Bianchini and Gori, 1995). A recent study (Luo, 1991) presented the conditions under which the training of a network with weights of just one layer always converges by the gradient descent method. This means that we can make the training of an RBFN always converge by the gradient descent method. However, even when it does ....
Bianchini, M., Frasconi, P., and Gori, M. (1995). Learning without local minima in radial basis function networks. IEEE Transactions on Neural Networks, 6(3), 749-756.
....choice for those classification problems which do not have any particular requirements. The architecture and the training methods of an RBF network are well known (Moody and Darken, 1989; Poggio and Girosi, 1990; Musavi et al. 1992; Wettschereck and Dietterich, 1992; Vogt, 1993; Haykin, 1994; Monica Bianchini and Gori, 1995). However, most of the training algorithms are for function approximation. Recently we proposed an effective training algorithm for an RBF network when the problem is pattern classification (Hwang and Bang, 1997). Here we will apply the proposed method to a typical pattern classification problem of ....
Bianchini, M., Frasconi, P., and Gori, M. (1995). Learning without local minima in radial basis function networks. IEEE Transactions on Neural Networks, 6(3):749--756.
....and report some experimental results on the AIDA database. Some constraints on available computational resources suggested the use of the LVQ algorithm and, particularly, a variant in which Kohonen's α parameter is chosen on the basis of some intriguing links with Backpropagation learning, sketched in [3]. The corresponding heuristics gives rise to neural networks, referred to as competitive radial basis functions (CRBF), where a sort of competition among units, typical of LVQ networks, is introduced in radial basis functions. The learning in these networks resembles the LVQ scheme and provides a ....
....as c ### arg max . 5) The learning algorithm is based on the minimization of an error function E(D;#) with respect to the vector # that collects all the parameters of the model. Common choices of E are the relative entropy function and the mean square error. Recently, Bianchini et al. [3] have shown that the training of the locally tuned neuron parameters of radial basis function networks is formally the same as that used by LVQ algorithm for training the codevectors. They proved that the centers # of RBF networks are updated according to (r#1) # (r) # (r) u### # ....
M. Bianchini, P. Frasconi, M. Gori, Learning without local minima in radial basis function networks, IEEE Trans. Neural Networks 6 (1995) 749-756.
.... guaranteed when the patterns are linearly separable [7] or when using networks with as many hidden units as patterns to learn [8, 9]. Analogous results hold for radial basis function networks, for which the absence of local minima is gained under the condition of patterns separable by hyperspheres [10]. Roughly speaking, these results suggest to us that optimal learning is certainly achieved in the limit cases of many input and many hidden units networks. In the first case, the assumption of using networks with many inputs makes the probability of linearly separable patterns very high. ....
....The assumption made in this paper, however, does not change the essence of the analysis, which can also be carried out under the hypothesis of linear outputs. We can provide a natural extension of Theorems 1 and 2, stated for sigmoidal networks, to this case. We do not restate Theorem 1 (see [10]) as it sounds almost the same as that of MLNs, with the only exception that PR1.3 condition must hold for Ker[ X 0 0;i(1) and S Y i(1) that is independently for each hidden neuron. Like in the case of MLNs, in order to discover more significant conditions, we propose a geometric ....
[Article contains additional citation context not shown here]
M. Bianchini, P. Frasconi, and M. Gori, "Learning without local minima in radial basis function networks," IEEE Transactions on Neural Networks, vol. 6, pp. 749--756, May 1995.
....can be established in very general situations, but unfortunately, the resulting architectures have a poor capability of generalisation. Finally, the results given for feedforward networks can also be extended to other multilayered architectures having different types of neurons. Recently, the problem of optimal learning for radial basis functions was analysed in [13]. Under the assumption that the patterns are separable by hyperspheres, which turns out to be a sort of dual condition of linear separability for inner product based neurons, it can be proved that the cost function is local ....
M. Bianchini, P. Frasconi, and M. Gori, "Learning without local minima in radial basis function networks," IEEE Transactions on Neural Networks, vol. 6, pp. 749--756, May 1995.
....a clear theoretical foundation. Recently, some efforts have been made to understand the behavior of batch mode Backpropagation by the analysis of the shape of the error surfaces. In particular, the emphasis has been placed on the problem of local minima and on conditions that guarantee their absence [2, 3, 4, 5, 6, 7]. To the best of our knowledge, however, no attempt has been made to investigate the optimal convergence of on line Backpropagation, which is used successfully in many practical applications. As earlier pointed out in [8], the on line updating departs to some extent from the true gradient ....
M. Bianchini, P. Frasconi, and M. Gori, "Learning without local minima in radial basis function networks." To appear in IEEE Transactions on Neural Networks.
....aimed at guaranteeing local minima free error surfaces. So far, however, only some sufficient conditions have been identified that give rise to unimodal error surfaces. Examples are the case of pyramidal networks [8] commonly used in pattern recognition, radial basis function networks [2], and non linear autoassociators [3]. The identification of similar conditions ensures global optimisation just by using simple gradient descent. Instead of looking for local algorithms like gradient descent, techniques that guarantee global optimisation may be explored. Of course, one of the main ....
M. Bianchini, P. Frasconi, and M. Gori, Learning without local minima in radial basis function networks, IEEE Transactions on Neural Networks, 6, (1995), 749--756.
No context found.
M. Bianchini, P. Frasconi, and M. Gori, "Learning without local minima in radial basis function networks," IEEE Transactions on Neural Networks, vol. 6, no. 3, pp. 749-756, 1995.
No context found.
Bianchini, M., Frasconi, P., and Gori, M. (1995), "Learning Without Local Minima in Radial Basis Function Networks", IEEE Trans. Neural Networks, Vol. 6 (3), pp. 749-756.
No context found.
M. Bianchini, P. Frasconi, and M. Gori, "Learning without local minima in radial basis function networks," IEEE Trans. on Neural Networks, vol. 6, no. 3, 1995, 749-756.