24 citations found.
C. Darken and J. Moody, "Towards faster stochastic gradient search," Neural Information Processing Systems, vol. 4, pp. 1009--1016, 1992.

This paper is cited in the following contexts:
On-Line Transform Domain LMS Algorithm Implemented with PCA.. - Wang, Yen, Principe   (Correct)

....(2) a controlling factor is put before the gain, j = 1, ..., M. This makes the adaptation rule slightly different from the original Kalman filter approach proposed by Bannour [1]. The step-size controlling factor is incorporated so that the search-then-convergence idea is integrated into the learning [2]; namely, where and are constants. Due to the fact that the eigenvectors of the PCA converge sequentially from the [Figure: block diagram with input x(n), tapped delay line, N-by-N orthogonal transform, and power normalization] ....

Darken, C., and Moody, J., "Towards faster stochastic gradient search," NIPS 4, pp. 1009-1016, San Mateo, CA: Morgan Kaufmann, 1992.


Nonmonotone And Perturbed Optimization - Solodov (1995)   (2 citations)  (Correct)

....j : Similarly, lim i 1 x 2i 1 = 1 2 Gamma j : The two accumulation points are distinct (and different from x ) for any fixed j. Note however, that they both tend to x in the limit as j 0. Remark 1.3.2. We note that the learning rate rules that satisfy (1.3.4) were used in practice [10], and are known as the search-then-converge strategy. In the first phase of learning, called the search phase, the learning rate is almost constant, or it decreases slowly. In the second phase of learning, called the converge phase, it decreases to zero. In particular, two possible rules for the ....

....strategy. In the first phase of learning, called the search phase, the learning rate is almost constant, or it decreases slowly. In the second phase of learning, called the converge phase, it decreases to zero. In particular, two possible rules for the learning rate have been suggested [10], [9]: $\eta_i = \eta_0 \frac{1}{1 + i/i_0}$ and $\eta_i = \eta_0 \frac{1 + \frac{c}{\eta_0}\frac{i}{i_0}}{1 + \frac{c}{\eta_0}\frac{i}{i_0} + i_0 \left(\frac{i}{i_0}\right)^2}$, where $\eta_0 > 0$, $c > 0$, $i_0 \gg 1$ are appropriately chosen parameters. Note that for $i \ll i_0$, the learning rate $\eta_i \approx \eta_0$, and for $i \gg i_0$ the learning rate decreases proportional to $1/i$. ....

C. Darken and J. Moody. Towards faster stochastic gradient search. In G. Tesauro J.D. Cowan and J. Alspector, editors, Advances in Neural Information Processing Systems 4, pages 1009--1016, San Francisco, CA, 1991. Morgan Kaufmann Publishers.
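
The two search-then-converge rules quoted in the context above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the parameter values in the demo loop are assumptions made here, and the formulas follow the reading of the garbled context in which eta_0 is the initial rate, c sets the late-time coefficient, and i_0 is the time constant.

```python
def stc_simple(i, eta0, i0):
    """First rule: eta_i = eta0 / (1 + i / i0)."""
    return eta0 / (1.0 + i / i0)


def stc_full(i, eta0, c, i0):
    """Second rule, with an extra parameter c controlling the late-time rate."""
    r = (c / eta0) * (i / i0)  # shared term in numerator and denominator
    return eta0 * (1.0 + r) / (1.0 + r + i0 * (i / i0) ** 2)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    eta0, c, i0 = 0.1, 1.0, 1000.0
    for i in (0, 10, 1000, 100000):
        print(i, stc_simple(i, eta0, i0), stc_full(i, eta0, c, i0))
```

Both rules start at eta_0 when i = 0. For very large i the second rule behaves like c / i, so c rather than eta_0 sets the size of the late steps.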


Using Curvature Information for Fast Stochastic Search - Orr, Leen (1997)   (5 citations)  (Correct)

....extending our earlier work [7] to fully non-linear problems. We demonstrate the performance on several large back-prop networks trained with large datasets. Implementations of stochastic learning typically use a constant learning rate during the early part of training (what Darken and Moody [4] call the search phase) to obtain exponential convergence towards a local optimum, and then switch to annealed learning (called the converge phase). We use Darken and Moody's adaptive search then converge (ASTC) algorithm to determine the point at which to switch to 1/t annealing. ASTC was ....

....adaptive momentum was significantly better than ASTC. In the next section, we examine this problem in more detail. 3.3 Switching on Annealing. A complete algorithm must choose an appropriate point to change from constant search to annealed learning. We use Moody and Darken's ASTC algorithm [4, 14] to accomplish this. ASTC measures the roughness of trajectories, switching to 1/t annealing when the trajectories become very rough, an indication that the noise in the updates is dominating the algorithm's behavior. In an attempt to satisfy 5 Prediction of a 4 × 4 block of image pixels ....

Christian Darken and John Moody. Towards faster stochastic gradient search. In J.E. Moody, S.J. Hanson, and R.P. Lippmann, editors, Advances in Neural Information Processing Systems 4. Morgan Kaufmann Publishers, San Mateo, CA, 1992.


On-Line Stochastic Functional Smoothing Optimization for.. - Wang, Principe (1997)   (1 citation)  (Correct)

....used for neural network training. Hanson added random noise to the weight adaptation and experimentally showed the convergence to a global minimum with constant [Hanson, 1990]. Using exactly the idea given by Eq. (2) but with a damped Darken and Moody proposed the search then convergence scheme [Darken and Moody, 1991, 1992]. As far as the slow search convergence is concerned, it is well known that the convergence speed can be improved greatly if higher order curvature information is employed in the adaptation [Beck and Le Cun, 1988; Baritompa, 1994; Le Cun et al., 1991, 1993]. Theoretically, in a neural ....

Darken, C., and Moody, J., "Towards Faster Stochastic Gradient Search," NIPS 4, 1991.


Adaptive Step Size Techniques For Decorrelation And Blind.. - Douglas, Cichocki (1998)   (Correct)

....of the system. These techniques include asymptotically optimal methods as derived via the theory of stochastic approximation [22], methods based on a statistical analysis of the particular system [23, 24], and heuristic approximations to these methods, commonly known as search and converge [25] or gearshifting. By contrast, adaptive methods are based on on-line measurements of the state of the adaptive system, usually as characterized by the outputs or by the parameter updates of the system. Non-adaptive step size methods usually require more information about the adaptive system and ....

C. Darken and J. Moody, "Towards faster stochastic gradient search," Advances in Neural Information Processing Systems, vol. 4, (San Mateo, CA: Morgan Kaufmann, 1991), pp. 1009-1016.


Scaling EM (Expectation-Maximization) Clustering to Large.. - Bradley, Fayyad, Reina (1999)   (1 citation)  (Correct)

....density function (e.g. BIRCH, DBSCAN [SEKX98], CLARANS, CURE [GRS98], etc.). Three techniques estimating mixture model parameters over large databases are evaluated: scalable EM (SEM), standard or vanilla EM run over random samples of the database (VEM), and an online EM implementation (OEM) [NH99, DM92]. The online EM algorithm is a stochastic gradient descent approach that operates by updating the initial mixture model one record at a time [DM92]. A single record is read and its membership probabilities in each of the k clusters are computed. The cluster parameters are then updated and the ....

.... are evaluated: scalable EM (SEM), standard or vanilla EM run over random samples of the database (VEM), and an online EM implementation (OEM) [NH99, DM92]. The online EM algorithm is a stochastic gradient descent approach that operates by updating the initial mixture model one record at a time [DM92]. A single record is read and its membership probabilities in each of the k clusters are computed. The cluster parameters are then updated and the record is purged from memory. SEM has three major parameters: primary compression factor p (Section 3.3.1), standard tolerance b (Section 3.3.2), and ....

C. Darken and J. Moody. "Towards Faster Stochastic Gradient Search". In Advances in Neural Information Processing Systems 4, Moody, Hanson, and Lippmann, (Eds.), Morgan Kaufmann, Palo Alto, 1992.


Rule Inference for Financial Prediction using Recurrent.. - Giles, Lawrence, Tsoi (1997)   (9 citations)  (Correct)

....All nodes included a bias input which was part of the optimization process. Weights were initialized as shown in Haykin [4]. Target outputs were -0.8 and 0.8 using the tanh output activation function and we used the quadratic cost function. A search then converge learning rate schedule was used [1] with an initial learning rate of 0.5. 2.5 Non-stationarity. The approach we use to handle non-stationarity is to build models based on short time periods only. There is a noise vs. non-stationarity tradeoff as the size of the training set is varied. If the training set is too small, noise makes ....

C. Darken and J.E. Moody. Towards faster stochastic gradient search. In Neural Information Processing Systems


On the Applicability of Neural Network and Machine Learning.. - Lawrence, al. (1996)   (6 citations)  (Correct)

....typically used in order to avoid slow convergence and local minima. However, a constant learning rate results in significant parameter and performance fluctuation during the entire training cycle. Moody and Darken have proposed search then converge learning rate schedules (Darken & Moody 1991, Darken & Moody 1992): $\eta(t) = \frac{\eta_0}{1 + t/\tau}$ (5). We have found that the learning rate during the final epoch still results in considerable parameter fluctuation 10 and hence we have added an additional term to further reduce the learning rate over the final epochs. 3. Alphabetical ordering of the training data. ....

Darken, C. & Moody, J. (1992), Towards faster stochastic gradient search, in `Neural Information Processing Systems 4', Morgan Kaufmann, pp. 1009--1016.


A Classical Algorithm For Avoiding Local Minima - Gorse, Shepherd, Taylor (1994)   (4 citations)  (Correct)

....squashing for both computational layers) improve the percentage success, but there is a clear trend toward worsened performance for the more sophisticated algorithms. Simple on-line BP (without momentum) performs best; this may be due to the method's stochastic features, as discussed in [3]. Trapping in local minima can also be observed for continuous function learning problems. McInerney et al. [4] have discovered (by exhaustive search of the error-weight surface) local minima in a (1-2-1) network (with a linear output node) learning the sine function. This problem was also ....

C Darken and J M Moody, "Towards faster stochastic gradient search", in: Advances in Neural Information Systems 4, Morgan Kaufmann, San Mateo, CA, 1009-1016 (1991).


Exact and Perturbation Solutions for the Ensemble Dynamics - Leen (1998)   (Correct)

....instantaneous cost is the true cost E(w) The latter drives the corresponding deterministic, or batch mode, gradient descent algorithm. The learning rate may be independent of time, or it may follow a specified time dependence, or may change in time in response to the progress of the learning [9, 10, 11]. Constant learning rates are commonly employed during the initial phases (and sometimes through all phases) of stochastic learning algorithms. Constant allows the system to converge on local optima at rates comparable to the equivalent batch algorithm (e.g. exponential convergence in quadratic ....

....(4.23) relates the minimal required 0 to the unknown cost function curvature, one is not guaranteed to achieve the optimal convergence rate for an arbitrarily chosen 0 . Since the actual convergence rate given in (4. 25) can be much slower than optimal, this situation has led many researchers [9, 10, 4, 12, 11] to devise algorithms that attempt to adaptively set 0 . Alternative Annealing Schedules The situation is rather different for p 1. Referring to (4.22) at late times the right hand side is dominated by the terms in 1=s p , and the 1=s and O(1=s p ) 3=2 terms can be neglected. Then ....

Christian Darken and John Moody. Towards faster stochastic gradient search. In J.E. Moody, S.J. Hanson, and R.P. Lippmann, editors, Advances in Neural Information Processing Systems 4. Morgan Kaufmann Publishers, San Mateo, CA, 1992.


Vector Quantization with Complexity Costs - Buhmann, Kühnel (1993)   (27 citations)  (Correct)

.... generalizes conventional topological feature maps which suggest the same learning rate for all clusters [30] The question if there exists any faster learning rate schedule c= N P fl T flff p (N) fl ) with c 1 than the schedule proposed by (47) is still open although numerical simulations [12, 13] suggest that it is advisable to choose c 1 to speed up the convergence. Online optimization of the codebook size K relies on a heuristics for cluster merging and cluster creation. We have explored the following heuristics for cluster creation: A data point x i initializes a new reference ....

C. Darken and J. Moody. Towards faster stochastic gradient search. In Neural Information Processing Systems 4, San Mateo, California, 1992. Morgan Kaufmann.


Natural Language Grammatical Inference with Recurrent.. - Lawrence, Giles, Fong (1998)   (14 citations)  (Correct)

....9 Stochastic update does not generally tolerate as high a learning rate as batch update due to the stochastic nature of the updates. alter significantly from the beginning to the end of the final epoch. Moody and Darken have proposed search then converge learning rate schedules of the form [10, 11]: $\eta(t) = \frac{\eta_0}{1 + t/\tau}$ (1) where $\eta(t)$ is the learning rate at time t, $\eta_0$ is the initial learning rate, and $\tau$ is a constant. We have found that the learning rate during the final epoch still results in considerable parameter fluctuation, 10 and hence we have added an additional term to ....

C. Darken and J.E. Moody. Towards faster stochastic gradient search. In Neural Information Processing Systems 4, pages 1009--1016. Morgan Kaufmann, 1992.


On the Applicability of Neural Network and Machine.. - Lawrence, Giles, Fong (1995)   (6 citations)  (Correct)

....results in significant parameter and performance fluctuation during the entire training cycle such that the performance of the network can alter significantly from the beginning to the end of the final epoch. Moody and Darken have proposed search then converge learning rate schedules of the form [10, 11]: $\eta(t) = \frac{\eta_0}{1 + t/\tau}$ (6) where $\eta(t)$ is the learning rate at time t, $\eta_0$ is the initial learning rate, and $\tau$ is a constant. We have found that the learning rate during the final epoch still results in considerable parameter fluctuation 15 and hence we have added an additional term to further ....

C. Darken and J.E. Moody. Towards faster stochastic gradient search. In Neural Information Processing Systems 4, pages 1009--1016. Morgan Kaufmann, 1992.


Natural Gradient Descent for Training Multi-Layer Perceptrons - Hua, Amari (1996)   (Correct)

....by (24) and (25) and $\mu_i(t)$ are two learning rate schedules defined by $\mu_i(t) = \eta_i \frac{1 + \frac{c_i}{\eta_i} t}{1 + \frac{c_i}{\eta_i} t + t^2}$, $i = 0, 1$. (59) Here t is the iteration index. The learning rate function $\mu_i(t)$ is a special form of the following search then converge schedules proposed in [7]: $\mu(t) = \eta \frac{1 + \frac{c}{\eta}\frac{t}{\tau}}{1 + \frac{c}{\eta}\frac{t}{\tau} + \frac{t^2}{\tau}}$. (60) $t \ll \tau$ is a search phase and $t \gg \tau$ is a converge phase. The learning rate functions $\mu_i(t)$ do not have the search phase but they start learning with a weaker converge phase when $\eta_i$ are small. When t is large, each learning rate ....

C. Darken and J. Moody. Towards faster stochastic gradient search. In Advances in Neural Information Processing Systems, 4, eds. Moody, Hanson, and Lippmann, Morgan Kaufmann, San Mateo, pages 1009--1016, 1992.
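
The late-time behaviour mentioned in the context above can be checked with a short calculation. The sketch below assumes the general search-then-converge form with initial rate \eta, constant c, and time constant \tau; the symbol names are an assumption made here, since the extracted text lost them.

```latex
\mu(t) = \eta\,
  \frac{1 + \frac{c}{\eta}\frac{t}{\tau}}
       {1 + \frac{c}{\eta}\frac{t}{\tau} + \frac{t^{2}}{\tau}},
\qquad \mu(0) = \eta .

% Multiply by t, then divide numerator and denominator by t^2/\tau:
t\,\mu(t)
  = \frac{\frac{\eta\tau}{t} + c}
         {\frac{\tau}{t^{2}} + \frac{c}{\eta}\frac{1}{t} + 1}
  \;\xrightarrow[t \to \infty]{}\; c ,
\qquad\text{so}\qquad \mu(t) \approx \frac{c}{t} \ \text{for large } t .
```

Under this assumed form the schedule starts at \eta and ends as c/t, independent of \eta and \tau.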


On the Applicability of Neural Network and Machine.. - Lawrence, Giles, Fong (1995)   (6 citations)  (Correct)

....in significant parameter and performance fluctuation during the entire training cycle such that the performance of the network can alter significantly from the beginning to the end of the final epoch. Moody and Darken have proposed search then converge learning rate schedules of the form [9] [10]: $\eta(t) = \frac{\eta_0}{1 + t/\tau}$ (4) where $\eta(t)$ is the learning rate at time t, $\eta_0$ is the initial learning rate, and $\tau$ is a constant. We have found that the learning rate during the final epoch still results in considerable parameter fluctuation 15 and hence we have added an additional term to further ....

Christian Darken and John Moody. Towards faster stochastic gradient search. In Neural Information Processing Systems 4, pages 1009--1016. Morgan Kaufmann, 1992.


On-Line Step Size Adaptation - Almeida, Langlois, Amaral (1997)   (1 citation)  (Correct)

....when the training set is very large. In other applications of optimization, one sometimes has access only to a noisy estimate of the gradient, making the use of deterministic gradient impossible. Some proposals of stochastic step size adaptation procedures have appeared in the literature [4], [5], [7]. However, none of them seems to be simple and general enough for widespread use. In this paper we propose a new, simple step size adaptation technique for stochastic gradient optimization, similar in spirit to the deterministic adaptive step sizes technique of [10] (see also [11]). This ....

Christian Darken and John Moody. Towards faster stochastic gradient search. In Moody, Hanson, and Lippmann, editors, Advances in Neural Information Processing Systems 4, Palo Alto, 1992. Morgan Kaufmann.


Learning Rate Schedules For Faster Stochastic Gradient Search - Darken, Chang, Moody (1992)   (18 citations)  Self-citation (Darken Moody)   (Correct)

....LMS, on-line backpropagation, and adaptive k-means clustering as special cases. The standard choices of the learning rate $\eta$ (both adaptive and fixed functions of time) often perform quite poorly. In contrast, our recently proposed class of search then converge (STC) learning rate schedules (Darken and Moody, 1990b, 1991) display the theoretically optimal asymptotic convergence rate and a superior ability to escape from poor local minima. However, the user is responsible for setting a key parameter. We propose here a new methodology for creating the first automatically adapting learning rates that achieve the ....

.... $\eta(t) = c/t$ (the usual choice in the stochastic approximation literature for the last forty years, beginning with the seminal work of Robbins and Monro (1951)) typically results in slow convergence to bad solutions (high-lying local minima) for small c, and parameter blow-up for small t if c is too large (Darken and Moody, 1990b, 1991). The available adaptive schedules (i.e. $\eta$'s depending on the time and on previous exemplars) have problems as well. A schedule developed by Urasiev is proven to converge in principle, but in practice it may converge slowly if at all (see fig. 2). The delta-bar-delta learning rule, which was ....

[Article contains additional citation context not shown here]

C. Darken and J. Moody. (1991) Towards faster stochastic gradient search. Advances in Neural Information Processing Systems 4, Morgan Kaufmann, San Mateo, California. 1009-1016.
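
The "parameter blow-up for small t if c is too large" remark in the context above can be illustrated with a toy run. The sketch below is an assumption-laden simplification, not the authors' experiment: it uses noise-free gradient descent on a one-dimensional quadratic, and the function names, the curvature, and the values c = 10, eta0 = 0.5, tau = 100 are all hypothetical choices made here.

```python
def inverse_time(t, c):
    """Classical schedule eta(t) = c / t."""
    return c / t


def stc(t, eta0, c, tau):
    """Search-then-converge schedule; it never exceeds eta0 for t > 0."""
    r = (c / eta0) * (t / tau)
    return eta0 * (1.0 + r) / (1.0 + r + tau * (t / tau) ** 2)


def peak_excursion(schedule, steps, w0=1.0, curvature=1.0):
    """Run gradient descent on 0.5 * curvature * w**2 and return max |w| seen."""
    w, peak = w0, abs(w0)
    for t in range(1, steps + 1):
        w -= schedule(t) * curvature * w  # gradient of the quadratic is curvature * w
        peak = max(peak, abs(w))
    return peak


# Same late-time coefficient c for both schedules (toy values).
peak_ct = peak_excursion(lambda t: inverse_time(t, c=10.0), steps=50)
peak_stc = peak_excursion(lambda t: stc(t, eta0=0.5, c=10.0, tau=100.0), steps=50)

if __name__ == "__main__":
    print("c/t peak |w|:", peak_ct)
    print("STC peak |w|:", peak_stc)
```

In this toy setting the c/t run overshoots to more than 100 times its starting value during the first few steps, because the early step sizes c/t exceed the stable range for the quadratic. The search-then-converge run never leaves its starting magnitude, since its step size is capped by eta0.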


Nonlinear Modelling Of Air Pollution Time Series - Rob Foxall Igor (2001)   (Correct)

No context found.

C. Darken and J. Moody, "Towards faster stochastic gradient search," Neural Information Processing Systems, vol. 4, pp. 1009--1016, 1992.


Optimizing the Structure of Radial Basis Function Networks by.. - Wienholt (1993)   (4 citations)  (Correct)

No context found.

Christian Darken and John Moody. Towards faster stochastic gradient search. In John Moody, Hanson, and Lippmann, editors, Advances in Neural Information Processing Systems, 4, pages -- , San Mateo, 1992. Morgan Kaufmann. To appear.


Scaling EM (Expectation-Maximization) Clustering to Large.. - Bradley, Fayyad, Reina (1999)   (1 citation)  (Correct)

No context found.

C. Darken and J. Moody. "Towards Faster Stochastic Gradient Search". In Advances in Neural Information Processing Systems 4, Moody, Hanson, and Lippmann, (Eds.), Morgan Kaufmann, Palo Alto, 1992.


Clustering in Massive Data Sets - Murtagh (1999)   (Correct)

No context found.

Darken, C. and Moody, J., "Towards faster stochastic gradient search", Advances in Neural Information Processing Systems 4, Moody, Hanson and Lippmann, Eds., Morgan Kaufmann, San Mateo, 1992.


Optimal Stochastic Search and Adaptive Momentum - Leen, Orr (1994)   (2 citations)  (Correct)

No context found.

Christian Darken and John Moody. (1992) Towards Faster Stochastic Gradient Search. In J.E. Moody, S.J. Hanson, and R.P. Lippmann (eds.) Advances in Neural Information Processing Systems, vol. 4. Morgan Kaufmann Publishers, San Mateo, CA, 1009-1016.


Momentum and Optimal Stochastic Search - Orr, Leen (1993)   (2 citations)  (Correct)

No context found.

Christian Darken and John Moody. Towards faster stochastic gradient search. In J.E. Moody, S.J. Hanson, and R.P. Lippmann, editors, Advances in Neural Information Processing Systems 4. Morgan Kaufmann Publishers, San Mateo, CA, 1992.


NIPS*97 The Efficiency and The Robustness of Natural Gradient.. - Howard Hua   (Correct)

No context found.

C. Darken and J. Moody. Towards faster stochastic gradient search. In Advances in Neural Information Processing Systems, 4, eds. Moody, Hanson, and Lippmann, Morgan Kaufmann, San Mateo, pages 1009--1016, 1992.

CiteSeer.IST - Copyright Penn State and NEC