19 citations found.
Luo, Z. Q., "On the Convergence of the LMS Algorithm with Adaptive Learning Rate for Linear Feedforward Networks," Neural Computation, Vol. 3, 1991, pp. 226-245.


This paper is cited in the following contexts:
A New Class Of Incremental Gradient Methods For Least Squares.. - Bertsekas (1996)   (4 citations)

....a cycle over the entire data set starting from $x^k$ has the form $x^{k+1} = x^k - \alpha^k \nabla f(x^k)$ (5). Incremental methods are supported by stochastic convergence analyses [PoT73], [Lju77], [KuC78], [TBA86], [Pol87], [BeT89], [Whi89], [Gai94], [BeT96] as well as deterministic convergence analyses [Luo91], [Gri94], [LuT94], [MaS94], [Man93], [Ber95a], [BeT96]. It has been experimentally observed that the incremental gradient method (2)-(3) often converges much faster than the steepest descent method (5) when far from the eventual limit. However, near convergence, the incremental gradient method ....

....far from the eventual limit. However, near convergence, the incremental gradient method typically converges slowly because it requires a diminishing stepsize $\alpha^k = O(1/k)$ for convergence. If $\alpha^k$ is instead taken to be a small constant, an oscillation within each data cycle arises, as shown by [Luo91]. By contrast, for convergence of the steepest descent method, it is sufficient that the stepsize $\alpha^k$ is a small constant (this requires that $\nabla f$ be Lipschitz continuous; see, e.g., [Pol87]). The asymptotic convergence rate of steepest descent with a constant stepsize is typically linear and much ....
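To make the contrast concrete, here is a minimal Python sketch on a hypothetical two-term quadratic, $f(x) = \tfrac12(x-1)^2 + \tfrac12(x-3)^2$, whose minimizer is $x^* = 2$. With the same constant stepsize, the batch update (5) converges to $x^*$, while the incremental update settles into a two-point cycle straddling $x^*$, which is the within-cycle oscillation attributed above to [Luo91]. The data, the stepsize and all function names are illustrative assumptions, not taken from the cited papers.

```python
# Hypothetical toy problem (not from the cited papers):
#   f(x) = f_1(x) + f_2(x),  f_i(x) = 0.5 * (x - b_i)**2,  minimizer x* = 2.
B = (1.0, 3.0)


def grad_i(x: float, b: float) -> float:
    """Gradient of a single component f_i(x) = 0.5 * (x - b)**2."""
    return x - b


def steepest_descent(x: float, alpha: float, cycles: int) -> float:
    """Batch update (5): one step along the full gradient per data cycle."""
    for _ in range(cycles):
        x -= alpha * sum(grad_i(x, b) for b in B)
    return x


def incremental_gradient(x: float, alpha: float, cycles: int) -> list[float]:
    """Incremental update: one step per component f_i, cycling through the data.

    Returns the iterates visited during the final cycle.
    """
    last_cycle: list[float] = []
    for _ in range(cycles):
        last_cycle = []
        for b in B:
            x -= alpha * grad_i(x, b)
            last_cycle.append(x)
    return last_cycle


alpha = 0.1  # the same constant stepsize for both methods
print(steepest_descent(0.0, alpha, 200))      # approximately 2.0
print(incremental_gradient(0.0, alpha, 200))  # approximately [1.947, 2.053]
```

With this stepsize the incremental iterates keep alternating between $37/19$ and $39/19$ instead of reaching $x^* = 2$; only a diminishing stepsize removes the oscillation.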

[Article contains additional citation context not shown here]

Luo, Z. Q., "On the Convergence of the LMS Algorithm with Adaptive Learning Rate for Linear Feedforward Networks," Neural Computation, Vol. 3, 1991, pp. 226-245.


Training Neural Nets with the Reactive Tabu Search - Battiti, Tecchiolli

....[fragment of a results table; column headers not captured] Available techniques to improve the safety and the speed of convergence of BP are, for example, the use of adaptive learning rates for online BP [30] and the use of line searches and second-order information for batch BP (see [7], [3] and the references therein). B. Event discrimination in High Energy Physics: Experimental HEP facilities need state-of-the-art discrimination systems for selecting and classifying the relevant events. In a ....

Z. Luo, "On the convergence of the LMS algorithm with adaptive learning rate," Neural Computation, vol. 3, pp. 226-245, 1991.


First and Second-Order Methods for Learning: between Steepest.. - Battiti (1992)   (88 citations)

....descent is [expression garbled in extraction]. These results cannot be extended to multilayer networks (with nonlinear transfer functions) in a straightforward manner, but they can be used as a starting point for useful heuristics. The convergence properties of the LMS algorithm with adaptive learning rate are presented in [Luo 1991], together with a clear comparison of the LMS algorithm with stochastic gradient descent and adaptive filtering algorithms. The main result is that if the learning rate $\epsilon_n$ for the $n$th training cycle satisfies the two conditions $\sum_{n=1}^{\infty} \epsilon_n = \infty$ and $\sum_{n=1}^{\infty} \epsilon_n^2 < \infty$, then the sequence of weight matrices ....
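The setting of this result can be illustrated with a minimal sketch, under stated assumptions: a single linear unit with a bias (the simplest linear feedforward network, with no hidden layer), three hypothetical training patterns presented in a fixed cyclic order, and the schedule $\epsilon_n = 1/(n+2)$, which satisfies both conditions above. The LMS weights are compared with the least-squares solution from the normal equations. The data, the schedule and the function names are choices made for this illustration, not Luo's.

```python
# Hypothetical training set for one linear unit y = w0 * 1 + w1 * x (no hidden layer).
PATTERNS = [((1.0, -1.0), 0.0), ((1.0, 0.0), 2.0), ((1.0, 1.0), 1.0)]  # ((bias, x), target)


def lms_cyclic(cycles: int) -> list[float]:
    """LMS (Widrow-Hoff) updates with the patterns presented in a fixed cyclic order.

    The learning rate is held fixed within cycle n and set to eps_n = 1 / (n + 2),
    so sum(eps_n) diverges while sum(eps_n**2) converges.
    """
    w = [0.0, 0.0]
    for n in range(1, cycles + 1):
        eps = 1.0 / (n + 2)
        for x, target in PATTERNS:
            error = target - (w[0] * x[0] + w[1] * x[1])
            w = [w[0] + eps * error * x[0], w[1] + eps * error * x[1]]
    return w


def least_squares() -> list[float]:
    """Solve the 2x2 normal equations (X^T X) w = X^T y directly."""
    a = sum(x[0] * x[0] for x, _ in PATTERNS)
    b = sum(x[0] * x[1] for x, _ in PATTERNS)
    d = sum(x[1] * x[1] for x, _ in PATTERNS)
    r0 = sum(x[0] * t for x, t in PATTERNS)
    r1 = sum(x[1] * t for x, t in PATTERNS)
    det = a * d - b * b
    return [(d * r0 - b * r1) / det, (a * r1 - b * r0) / det]


print(least_squares())   # [1.0, 0.5]
print(lms_cyclic(5000))  # close to the least-squares weights
```

Because the targets cannot be fitted exactly, a constant learning rate would leave the weights cycling; the diminishing schedule shrinks that cycle so the end-of-cycle weights approach $[1.0, 0.5]$.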

Luo, Zhi-Quan. On the Convergence of the LMS algorithm with Adaptive Learning Rate for Linear Feedforward Networks. Neural Computation 3, 226-245.


Distributed Asynchronous Incremental Subgradient Methods - Nedic, Bertsekas, Borkar (2000)   (1 citation)

....are known as backpropagation methods. They are related to the Widrow-Hoff algorithm [31] and to stochastic gradient/stochastic approximation methods, and they are supported by several recent convergence analyses: Bertsekas [3], Bertsekas and Tsitsiklis [6], [7], Gaivoronski [10], Grippo [12], Luo [19], Luo and Tseng [20], Mangasarian and Solodov [21], and Tseng [29]. It has been experimentally observed that incremental gradient methods often converge much faster than the steepest descent method when far from the eventual limit. However, near convergence, they typically converge slowly because they ....

....method when far from the eventual limit. However, near convergence, they typically converge slowly because they require a diminishing stepsize [e.g., $\alpha_k = O(1/k)$] for convergence. If $\alpha_k$ is instead taken to be a small enough constant, convergence to a limit cycle occurs, as first shown by Luo [19]. The incremental subgradient method was first studied by Kibardin [15] and more recently by Solodov and Zavriev [28], Nedić and Bertsekas [23], [25], and Ben-Tal, Margalit and Nemirovski [1]; the incremental subgradient method of the form (3) is considered by Zhao, Luh, and Wang [32], Kiwiel and ....

Z. Q. Luo, On the Convergence of the LMS Algorithm with Adaptive Learning Rate for Linear Feedforward Networks, Neural Computation 3 (1991) 226-245.


Distributed Asynchronous Incremental Subgradient Methods - Nedic, Bertsekas, Borkar (2000)   (1 citation)

....are known as backpropagation methods. They are related to the Widrow-Hoff algorithm [25] and to stochastic gradient/stochastic approximation methods, and they are supported by several recent convergence analyses: Bertsekas [3], Bertsekas and Tsitsiklis [5], [6], Gaivoronski [8], Grippo [10], Luo [16], Luo and Tseng [17], Mangasarian and Solodov [18], and Tseng [23]. It has been experimentally observed that incremental gradient methods often converge much faster than the steepest descent method when far from the eventual limit. However, near convergence, they typically converge slowly because they ....

....method when far from the eventual limit. However, near convergence, they typically converge slowly because they require a diminishing stepsize [e.g., $\alpha_k = O(1/k)$] for convergence. If $\alpha_k$ is instead taken to be a small enough constant, convergence to a limit cycle occurs, as first shown by Luo [16]. The incremental subgradient method was first studied by Kibardin [12] and more recently by Solodov and Zavriev [22], Nedić and Bertsekas [19], [21], and Ben-Tal, Margalit and Nemirovski [1]; the incremental subgradient method of the form (3) is considered by Zhao, Luh, and Wang [26], Kiwiel and ....

Z. Q. Luo, On the Convergence of the LMS Algorithm with Adaptive Learning Rate for Linear Feedforward Networks, Neural Computation 3 (1991) 226-245.


Analysis Of An Approximate Gradient Projection Method With.. - Luo, Tseng (1994)   (14 citations)

....p (so that (1.3)-(1.4) reduces to the Least Mean Square (LMS) algorithm of Widrow and Hoff [WiH60] for training a linear feedforward neural network) has it been shown that, under the stepsize rule (1.8)-(1.9) with $\eta = 1$, the generated sequence $\{x^{t,1}\}$ converges to a stationary point of $f$ [Luo91]. Finally, we note that, in the case where $X = \mathbb{R}^n$, other gradient schemes such as quasi-Newton and conjugate gradient methods have also been applied with success (see [HKP91, pp. 124-127], [KoA89], [KDMT91], [Ml93], [Wer90]). Nonetheless, backpropagation remains the most popular approach to training ....

Z.-Q. Luo, On the convergence of the LMS algorithm with adaptive learning rate for linear feedforward networks, Neural Computation, 3 (1991), pp. 226-245.


On-Line Learning Processes in Artificial Neural Networks - Heskes, Kappen (1993)   (10 citations)

....[55], for a review see [33, 65], and unsupervised learning (learning without a teacher, e.g., Kohonen learning [37]; for a review see [6]). The approach taken in this chapter is therefore quite general. It includes and extends results from studies on specific learning rules (see, e.g., [3, 9, 48, 53]). 1.2 Outline of this chapter: In artificial neural networks, online learning is modeled by randomly drawing examples from the environment. This introduces stochasticity into the learning process. The learning process becomes a discrete-time Markov process, which can be transformed into a ....

Z. Luo. On the convergence of the LMS algorithm with adaptive learning rate for linear feedforward networks. Neural Computation, 3:226-245, 1991.


Gradient Convergence In Gradient Methods With Errors - Bertsekas, Tsitsiklis (1999)   (1 citation)

....$\lim_{t\to\infty} \nabla f(x_t) = 0$ (with probability 1 in the stochastic case). The method where the errors $w_t$ are deterministic includes as a special case the standard incremental gradient (backpropagation) method for neural network training, the convergence of which has been the object of much recent analysis [Luo91], [Gai94], [Gri94], [LuT94], [MaS94], [Man93], [Ber95a] (see the authors' [BeT96] for a discussion of incremental gradient methods and their application to neural network training). The method where the errors $w_t$ are stochastic includes as special cases the classical ....

Luo, Z. Q., 1991. "On the Convergence of the LMS Algorithm with Adaptive Learning Rate for Linear Feedforward Networks," Neural Computation, Vol. 3, pp. 226-245.


Incremental Gradient Algorithms With Stepsizes Bounded Away From.. - Solodov (1998)   (3 citations)

....and computational practice. In particular, (1.2) implies that the stepsizes tend to zero in the limit, while many heuristic rules used by practitioners keep them bounded away from zero. It is therefore important to study the behaviour of IGA when $\lim_{i\to\infty} \eta_i = \eta > 0$ (1.3). An example in [9] shows that, in general, under condition (1.3) one cannot expect convergence to an exact solution, even in the simple case when $f(\cdot)$ is the sum of two strongly convex (non-identical) quadratic functions. Fortunately, in neural network applications one is typically not interested ....
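Why a constant stepsize rules out an exact solution, yet still yields an approximate one, can be seen in a minimal sketch. It assumes the hypothetical pair $f_1(x) = \tfrac12(x-b_1)^2$, $f_2(x) = \tfrac12(x-b_2)^2$ with $b_1 = 1$, $b_2 = 3$ (so $x^* = 2$) and a cyclic incremental update with constant $\eta$. Solving for the fixed point of the cycle map $x \mapsto (1-\eta)^2 x + \eta(1-\eta) b_1 + \eta b_2$ gives an end-of-cycle limit offset from $x^*$ by $\eta(b_2-b_1)/(2(2-\eta))$: nonzero for every $\eta > 0$, but shrinking roughly in proportion to $\eta$. This is an illustrative construction, not necessarily the example of [9].

```python
def end_of_cycle_limit(eta: float, b1: float = 1.0, b2: float = 3.0,
                       cycles: int = 2000) -> float:
    """Incremental gradient with constant stepsize eta on
    f(x) = 0.5*(x - b1)**2 + 0.5*(x - b2)**2; returns the iterate after the last cycle."""
    x = 0.0
    for _ in range(cycles):
        x -= eta * (x - b1)  # step on f_1
        x -= eta * (x - b2)  # step on f_2
    return x


def predicted_offset(eta: float, b1: float = 1.0, b2: float = 3.0) -> float:
    """Fixed point of the cycle map minus the true minimizer x* = (b1 + b2) / 2."""
    return eta * (b2 - b1) / (2.0 * (2.0 - eta))


x_star = 2.0
for eta in (0.4, 0.2, 0.1, 0.05):
    # Observed and predicted distance from x*: both shrink roughly linearly in eta.
    print(eta, end_of_cycle_limit(eta) - x_star, predicted_offset(eta))
```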

Z.-Q. Luo. On the convergence of the LMS algorithm with adaptive learning rate for linear feedforward networks. Neural Computation, 3:226-245, 1991.


A Theoretical Comparison of Batch-Mode, on-Line, Cyclic.. - Heskes, Wiegerinck

....[7, 8], learning with time-correlated patterns [9] and learning with a momentum term [10, 11]. The method we use in this paper to derive the (well-known) results on online learning can also be applied to learning with cycles of training patterns. Learning with cycles has been studied in [12] for the linear LMS learning rule. Our results are valid for any (nonlinear) learning rule that can be written in the form (1). Furthermore, we will point out and quantify the important difference between cyclic and almost cyclic learning. In Section 2, we will ....

....[end of equation (26), not recoverable] with $\eta_n$ the learning parameter at learning step $n$ and $A(\eta)$ the asymptotic misadjustment for the various learning strategies. We recall from Section 3 that $A(\eta) \propto \eta$ for online learning, $A(\eta) \propto \eta^2$ for almost cyclic learning, and $A(\eta) \propto P\eta^2$ for cyclic learning. Following [12], we consider the concept of an $\epsilon$-optimal weight, i.e. a weight $w$ within order $\epsilon$ of the local minimum $w^*$. We will calculate the typical number of learning steps $n_\epsilon$ necessary to reach such an $\epsilon$-optimal weight, i.e. the typical number of learning steps until the misadjustment is of ....

[Article contains additional citation context not shown here]

Z. Luo. On the convergence of the LMS algorithm with adaptive learning rate for linear feedforward networks. Neural Computation, 3:226-245, 1991.


Classification of Rotating Machine Condition using Artificial .. - McCormick Beng (1997)

....until the error is acceptable. Unfortunately, the training data may not be truly representative of all possible input data, so other data need to be used to test the performance of the trained network on unseen data. The backpropagation algorithm can be enhanced by the use of adaptive learning [12, 13], which adjusts the rate at which the weights are changed while training is going on, and of momentum, which helps prevent the algorithm from getting stuck in local minima. 3. Experimental Set-Up: The data used for this work were obtained from the following experimental set-up [14]. The test system can be ....

Z. Luo. On the convergence of the LMS algorithm with adaptive learning rate for feedforward networks. Neural Computation, 3:226-245, 1991.


Convergence Analysis Of Perturbed Feasible Descent Methods - Solodov (1997)   (2 citations)

....which process all the partial objective functions before adjusting the variables. In particular, it is typically more effective than the standard gradient descent method given by $T(x;\eta) = x - \eta \sum_{j=1}^{K} \nabla f_j(x) = x - \eta \nabla f(x)$. Incremental methods are an area of active research (see [2, 1, 11, 15, 20, 19, 30, 33]). We note that in the above-mentioned papers, the stepsizes are chosen to satisfy the condition $\sum_{i=0}^{\infty} \eta_i = \infty$, $\sum_{i=0}^{\infty} \eta_i^2 < \infty$. This condition implies that the stepsizes tend to zero in the limit, while many heuristic rules used by practitioners keep them bounded away ....

Z.-Q. Luo. On the convergence of the LMS algorithm with adaptive learning rate for linear feedforward networks. Neural Computation, 3:226-245, 1991.


A New Class Of Incremental Gradient Methods For Least Squares.. - Bertsekas (1997)   (4 citations)

....$x^k$ has the form $x^{k+1} = x^k - \alpha^k \sum_{i=1}^{m} \nabla f_i(x^k) = x^k - \alpha^k \nabla f(x^k)$ (5). Incremental methods are supported by stochastic convergence analyses [PoT73], [Lju77], [KuC78], [TBA86], [Pol87], [BeT89], [Whi89], [Gai94], [BeT96] as well as deterministic convergence analyses [Luo91], [Gri94], [LuT94], [MaS94], [Man93], [Ber95a], [BeT96]. It has been experimentally observed that the incremental gradient method (2)-(3) often converges much faster than the steepest descent method (5) when far from the eventual limit. However, near convergence, the incremental gradient method ....

....far from the eventual limit. However, near convergence, the incremental gradient method typically converges slowly because it requires a diminishing stepsize $\alpha^k = O(1/k)$ for convergence. If $\alpha^k$ is instead taken to be a small constant, an oscillation within each data cycle arises, as shown by [Luo91]. By contrast, for convergence of the steepest descent method, it is sufficient that the stepsize $\alpha^k$ is a small constant (this requires that $\nabla f$ be Lipschitz continuous; see, e.g., [Pol87]). The asymptotic convergence rate of steepest descent with a constant stepsize is typically linear and much ....

[Article contains additional citation context not shown here]

Z. Q. Luo, On the convergence of the LMS algorithm with adaptive learning rate for linear feedforward networks, Neural Computation, 3 (1991), pp. 226-245.


Serial And Parallel Backpropagation Convergence Via.. - Mangasarian, Solodov (1993)   (11 citations)

....yields better computational results than pure BP ([16], [6]). We note that, under the same assumptions on the learning rate that we use here (the first two conditions of (3.4) below), convergence of the sequence of weights for linear feedforward networks without a hidden layer is established in [9]. The paper is organized as follows. In Section 2 we establish the serial and parallel versions of our nonmonotone convergence theorem for unconstrained optimization. We also show that this theorem can be applied to the analysis of a family of optimization methods with perturbed gradients. In ....

Z.-Q. Luo: "On the convergence of the LMS algorithm with adaptive learning rate for linear feedforward networks", Neural Computation 3, 1991, 226-245.


Real Time Classification of Rotating Shaft Loading.. - McCormick, Nandi (1996)

....This produced a training set with 96 training patterns for each condition and 168 test patterns per condition. A. Classification Using ANNs: The feedforward neural networks were trained and implemented using the Matlab Neural Network Toolbox [27], with its backpropagation program using adaptive learning [28], [29] and momentum [30]. Each network had a number of inputs determined by the number of features used for classification. Each input was scaled by a constant factor, chosen to be the largest estimate of that feature in the training data set. This effectively restricted the ....

Z. Luo, "On the convergence of the LMS algorithm with adaptive learning rate for feedforward networks", Neural Computation, vol. 3, pp. 226-245, 1991.


Classification Experiments Involving Backpropagation And Radial.. - Yee

....the m.s.e. is usually the mathematical criterion for optimality during network training, and, as stated earlier, a lower m.s.e. over a training set does not automatically imply good generalization. Not surprisingly, definition 2 has generated greater research interest than definition 1. For example, [6] presents rigorous results for the optimal adaptation of the learning rate such that the fewest epochs are needed to approximate the globally optimal weight matrix to a desired accuracy, albeit only for the case of linear units. In general, however, heuristic and ....

Z. Q. Luo, "On the Convergence of the LMS Algorithm with Adaptive Learning Rate for Linear Feedforward Networks," Neural Computation, vol. 3, pp.226-245, 1991.


Online Learning From Finite Training Sets and Robustness to.. - Sollich, Barber (1998)   (2 citations)

....is any kind of input bias. This strongly suggests that online learning should generally be preferred over offline learning in problems where biased inputs cannot be excluded a priori. In the future, we hope to extend our analysis to dynamic ($t$-dependent) optimization of $\eta$; based on the results of (Luo, 1991; Heskes and Wiegerinck, 1996), however, one may suspect that performance improvements over the optimal fixed $\eta$ may be small. More importantly, more complicated network architectures need to be studied in which the crucial question of local minima can be addressed. We speculate that the superiority of ....

Luo, Z. (1991). On the Convergence of the LMS Algorithm with Adaptive Learning Rate for Linear Feedforward Networks. Neural Computation, 3(2):226-245.


Serial And Parallel Backpropagation Convergence Via.. - Mangasarian, Solodov (1994)   (11 citations)

....yields better computational results than pure BP ([16], [6]). We note that, under the same assumptions on the learning rate that we use here (the first two conditions of (27) below), convergence of the sequence of weights for linear feedforward networks without a hidden layer is established in [9]. The paper is organized as follows. In Section 2 we establish the serial and parallel versions of our nonmonotone convergence theorem for unconstrained optimization. We also show that this theorem can be applied to the analysis of a family of optimization methods with perturbed gradients. In ....

Z.-Q. Luo: "On the convergence of the LMS algorithm with adaptive learning rate for linear feedforward networks", Neural Computation 3, 1991, 226-245.


An Efficient Method to Construct a Radial Basis Function Neural .. - Hwang, Bang (1997)   (3 citations)

....large and the dimension of the input is high. Furthermore, the gradient descent method tends to settle in a local minimum and sometimes does not converge at all if the patterns of the outputs of the middle layer are not linearly separable (Monica Bianchini and Gori, 1995). A recent study (Luo, 1991) presented the conditions under which training a network with just one layer of weights always converges under the gradient descent method. This means that the training of an RBFN by gradient descent can be made to converge in every case. However, even when it does converge, it usually takes ....

....rate ($\eta = 0.02$); D) gradient descent with a varying learning rate ($\eta = 0.05/\sqrt{n}$, where $n$ is the iteration number). In this comparison we used the same middle layer, obtained by using APC-III ($\alpha = 1.5$). Methods C) and D) were implemented based on the algorithms described in (Luo, 1991). As seen in Table 4, the recognition rate of method B) was the highest, in spite of the fact that it requires the outputs of the middle layer to be normally distributed with the same covariance. The result of method A) was not as good as that of method B), even though it does not need ....

Luo, Z.-Q. (1991). On the convergence of the LMS algorithm with adaptive learning rate for linear feedforward networks. Neural Computation, 3(2), 226-245.


CiteSeer.IST - Copyright Penn State and NEC