| Paul J. Werbos. Approximate dynamic programming for real-time control and neural modeling. In D. A. White and D. A. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches. Van Nostrand Reinhold, 1992. |
....constant, T 2.662 pu shaft output ahead of reheater, F 0.322 III. DERIVATIVE ADAPTIVE CRITICS BASED NEURO CONTROLLER Adaptive Critic Designs (ACDs) are neural network designs capable of optimization over time under conditions of noise and uncertainty. A family of ACDs was proposed by Werbos [9] as a new optimization technique combining concepts of reinforcement learning and approximate dynamic programming. For a given series of control actions that must be taken in sequence, and not knowing the quality of these actions until the end of the sequence, it is impossible to design an ....
P. Werbos, "Approximate Dynamic Programming for Real-Time Control and Neural Modeling," in Handbook of Intelligent Control, White and Sofge, Eds., Van Nostrand Reinhold, ISBN 0-442-30857-4, pp. 493-525.
....[equation (3.26) garbled in extraction], where [symbol lost] is the learning rate of the actor. This would result in the following learning rule for updating the parameters of the actor: [equation (3.27) garbled in extraction]. A more detailed discussion of gradient-based methods is given in Werbos [29]. Sofge and White [30] have successfully applied a gradient method to optimizing a manufacturing process. 3.5.3 Q-Learning. This method was first introduced by Watkins [27]. For a more detailed discussion of Q-learning see also Watkins and Dayan [28]. In Q-learning, unlike the adaptive critic ....
P. Werbos. Approximate dynamic programming for real-time control and neural modelling. In Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches. Van Nostrand Reinhold, New York, 1992.
....(DHP). Results are presented, showing that DHP produces the best results. II. ADAPTIVE CRITIC DESIGNS A. Background. Adaptive critic designs (ACDs) are neural network designs capable of optimization over time under conditions of noise and uncertainty. A family of ACDs was proposed by Werbos [16] as a new optimization technique combining concepts of reinforcement learning and approximate dynamic programming. For a given series of control actions that must be taken sequentially, and not knowing the effect of these actions until the end of the sequence, it is impossible to design an optimal ....
....as one of its inputs, directly or indirectly. Different types of critics have been proposed. For example, Watkins [17] developed a system known as Q-learning, explicitly based on dynamic programming. Werbos, on the other hand, developed a family of systems for approximating dynamic programming [16]; his approach subsumes other designs for continuous domains. For example, Q-learning becomes a special case of action-dependent heuristic dynamic programming (ADHDP), which is a critic approximating the function (see Section II B below) in Werbos' family of adaptive critics. A critic which ....
[Article contains additional citation context not shown here]
P. Werbos, "Approximate dynamic programming for real-time control and neural modeling," in Handbook of Intelligent Control, White and Sofge, Eds. New York: Van Nostrand Reinhold, pp. 493--525.
....as Samuel's famous checkers player of the 1950s [61, 60] which, however, made no reference to the DP literature existing at that time. Other early RL research was explicitly motivated by animal behavior and its neural basis [45, 33, 34, 71]. Much of the current interest is attributable to Werbos [85, 86, 87], Watkins [82], and Tesauro's backgammon-playing system TD-Gammon [75, 76]. Additional information about RL can be found in several references (e.g., [2, 5, 32, 72]). Despite the utility of RL methods in many applications, the amount of time they can take to form acceptable approximate solutions ....
P.J. Werbos. Approximate dynamic programming for real-time control and neural modeling. In D. A. White and D. A. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, pages 493--525. Van Nostrand Reinhold, New York, 1992.
....parameter by: [equation (3.26) garbled in extraction], where [symbol lost] is the learning rate of the actor. This would result in the following learning rule for updating the parameters of the actor: [equation (3.27) garbled in extraction]. A more detailed discussion of gradient-based methods is given in Werbos [29]. Sofge and White [30] have successfully applied a gradient method to optimizing a manufacturing process. 3.5.3 Q-Learning. This method was first introduced by Watkins [27]. For a more detailed discussion of Q-learning see also Watkins and Dayan [28]. In Q-learning, unlike the adaptive critic ....
P. Werbos. Approximate dynamic programming for real-time control and neural modelling. In Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches. Van Nostrand Reinhold, New York, 1992.
.... 1 Introduction. General function approximators, such as neural networks, have been proposed for a wide variety of control formalisms [1], [2], [3]. For optimal control tasks involving nonlinear systems, reinforcement learning (RL) or adaptive critic approaches have been successfully proposed [4], [5]. In RL, a controller is optimized on the basis of a cost function representing the expected future costs (or reinforcements). Various applications of RL in control have been described. Sofge and White used RL to obtain a controller for a manufacturing process for thermoplastic structures ....
....has a simple quadratic form and the linear controller can be derived directly from this. For nonlinear continuous systems no standard solution exists. In most cases the Q function is represented as a neural network, and using this network a second network is trained which acts as controller [4], [5]. The control action is not directly computed from the Q function as in the discrete state-action space. In this paper we propose a continuous state and action space approach where only a single function approximator is used to represent the sum of future reinforcement as a function of state ....
[Article contains additional citation context not shown here]
P.J. Werbos, "Approximate dynamic programming for real-time control and neural modeling," in Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, D. A. White and D. A. Sofge, Eds. Van Nostrand Reinhold, 1992.
....is a collection of algorithms that can be used to optimize a control task. Initially it was presented as a trial-and-error method to improve the interaction with a dynamical system [9]. Later it was established that it can also be regarded as a heuristic kind of Dynamic Programming (DP) [80][83][8]. The objective is to find a policy, a function that maps the states of the system to control actions, that optimizes a performance criterion. Compared with control, we can say that the policy represents the feedback function. In RL, the feedback function is optimized during the interaction ....
....an error that is minimized by standard steepest descent methods [83]. (Footnote 6: Note that the term temporal difference was introduced in [68] together with the TD($\lambda$) learning rule for function approximators and not for the discrete case as described in section 2.2.4.) This means the training of the critic becomes minimizing the quadratic temporal difference error $E = \frac{1}{2} \sum_{k=0}^{N-1} \left( r_k + \gamma V(x_{k+1}, w) - V(x_k, w) \right)^2$ (2.24), where $V$ represents the critic with weights $w$ and $\gamma$ is the discount factor. We see here that this error does not completely ....
[Article contains additional citation context not shown here]
P.J. Werbos. Approximate dynamic programming for real-time control and neural modeling. In D. A. White and D. A. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches. Van Nostrand Reinhold, 1992.
....1989] described an experiment in which he used expert preferences to train a neural network to choose between backgammon positions. The approximation of cost-to-go differences rather than just the cost-to-go has been investigated by a number of authors in the context of Q-learning. Werbos [Werbos, 1990; 1992] discusses the merits of approximating the gradient of the Q function, whilst Baird [Baird, 1993] Harmon, and [Table 1: Transition Matrix of the two state System. Current state A or B, action a1: destination state probabilities A = 1/3, B = 2/3; action a2: A = 2/3, B = 1/3. Figure labels: r(A) = 0, r(B) = 1] ....
Paul Werbos. Approximate Dynamic Programming for Real-Time Control and Neural Modeling. In D. A. White and D. A. Sofge, editors, Handbook of Intelligent Control. 1992.
....to make use of a probability calculus for assessing and comparing actions. There has been increasing interest in the temporal credit assignment problem, due principally to the development of learning algorithms based on the theory of dynamic programming (DP) (Barto, Sutton, & Watkins, 1990; Werbos, 1992). Sutton's (1988) TD($\lambda$) algorithm addressed the problem of learning to predict in a Markov environment, utilizing a temporal difference operator to update the predictions. Watkins' (1989) Q-learning algorithm extended Sutton's work to control problems, and also clarified the ties to dynamic ....
Werbos, P. (1992). Approximate dynamic programming for real-time control and neural modeling. In D. A. White and D. A. Sofge, (Eds.), Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, pp. 493-525. New York: Van Nostrand Reinhold.
....the results for qualitative models suggest an avenue for simplifying ongoing system identification in adaptive control applications. Introduction. A variety of Adaptive Critic Design techniques for training neuro-controllers have appeared in the literature in recent years [7], [8], [9], [12], and [13]. These techniques can be divided into model-based methods such as Dual Heuristic Programming (DHP) and non-model-based methods such as Action Dependent Heuristic Dynamic Programming (ADHDP) or Q-learning. While the DHP method has been shown to be much more efficient for training neurocontrollers ....
Werbos, P.J., "Approximate Dynamic Programming for Real-Time Control and Neural Modeling", in D.A. White & D.A. Sofge eds., Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches, Van Nostrand Reinhold, New York, 1992, pp. 493 - 525.
....the actual ball position B(t) and the corresponding desired ball position Bdesired(t). Two ACDs, Heuristic Dynamic Programming (HDP) and Dual Heuristic Programming (DHP), were implemented. HDP outputs the cost-to-go function J(t) while DHP outputs the derivative of J(t). The reader is referred to [11, 12, 19] for the detailed descriptions of these ACDs. As we shall see, DHP works well for this problem, whereas HDP does not. We nevertheless begin by introducing the architecture of HDP, because DHP is best explained by its contrast with HDP, and the shortcomings of HDP for this problem serve to ....
Werbos, P. J. (1992). Approximate Dynamic Programming for Real-Time Control and Neural Modeling. In D. A. White & D. A. Sofge (Eds.), Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches (pp. 493-525). New York, NY: Van Nostrand Reinhold.
.... Critic Design techniques for training neuro-controllers have appeared in the literature recently, falling into model-based methods such as Dual Heuristic Programming (DHP) and non-model-based methods such as Action Dependent Heuristic Dynamic Programming (ADHDP) or Q-learning (Barto et al., 1983; Werbos, 1990, 1992; Santiago & Werbos, 1994; Prokhorov, Santiago & Wunsch, 1995; Prokhorov & Wunsch, 1997). The DHP method has been shown to be much more efficient for neurocontroller training and to produce superior designs to the non-model-based methods. However, its implementation relies on having an explicit ....
.... A useful identity based on the above equation is the Bellman Recursion $J(t) = U(t) + \gamma J(t+1)$. A promising collection of such approximation techniques based on estimating the function J(t) using this identity with neural networks as function approximators was proposed by Werbos (Werbos, 1990, 1992). These networks are often called Adaptive Critics, though this term can be applied more generally to any network that provides learning reinforcement to another entity (Widrow et al., 1973). As a practical matter, any computational structure capable of acting as a universal function approximator ....
Werbos, P.J., (1992), "Approximate Dynamic Programming for Real-Time Control and Neural Modeling", in D.A. White & D.A. Sofge eds., Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches, Van Nostrand Reinhold, New York, pp. 493 - 525.
.... techniques for training neuro-controllers have appeared in the literature recently, falling into model-based methods such as Dual Heuristic Programming (DHP) and non-model-based methods such as Action Dependent Heuristic Dynamic Programming (ADHDP) or Q-learning [1], [5], [6], [10], [11], [12], [15], [16], [19], [20]. Previous applications of Adaptive Critic based reinforcement learning to the tuning of fuzzy controllers have relied on non-model-based temporal differencing schemes [3], [4], [7], [8]. Equivalent neural network based techniques have been shown to be generally less effective than model-based ....
.... equation is the Bellman Recursion $J(t) = U(t) + \gamma J(t+1)$. A promising collection of such approximation techniques based on estimating the function J(t) using this identity with neural networks as function approximators was proposed by Werbos [19][20]. These networks are often called Adaptive Critics, though this term can be applied more generally to any network that provides learning reinforcement to another entity [21]. As a practical matter, any computational structure capable of acting as a universal function approximator can be used in ....
Werbos, P.J., "Approximate Dynamic Programming for Real-Time Control and Neural Modeling", in D.A. White & D.A. Sofge eds., Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches, Van Nostrand Reinhold, New York, 1992, pp. 493 - 525.
....[SuB98]. A more limited textbook discussion is given in the DP textbook by Bertsekas [Ber95]. The 2nd edition of the first volume of this DP text [Ber00] contains a detailed discussion of rollout algorithms. The extensive survey by Barto, Bradtke, and Singh [BBS95] and the overviews by Werbos [Wer92a], [Wer92b] and other papers in the edited volume by White and Sofge [WhS92] point out the connections between the artificial intelligence reinforcement learning viewpoint and the control theory DP viewpoint, and give many references. ....
Werbos, P. J., 1992. "Approximate Dynamic Programming for Real-Time Control and Neural Modeling," in D. A. White and D. A. Sofge, (eds.), Handbook of Intelligent Control, Van Nostrand, N. Y.
....[15] described an experiment in which he used expert preferences to train a neural network to choose between backgammon positions. The approximation of cost-to-go differences rather than just the cost-to-go has been investigated by a number of authors in the context of Q-learning. Werbos [16, 17] discusses the merits of approximating the gradient of the Q function, whilst Baird [18], Harmon, and Klopf [19] introduced advantage updating, which estimates the value of each state and the relative advantage of each action using separate approximation architectures. More recently McGovern and ....
Paul Werbos. Approximate Dynamic Programming for Real-Time Control and Neural Modeling. In D. A. White and D. A. Sofge, editors, Handbook of Intelligent Control. 1992.
....to perform learning while controlling and controlling while learning. This entails the need, at each time step, of a prediction of the time integral cost, based, as for DP, on information taken from the next step prediction. Barto et al.'s adaptive critic [1], [2], Werbos' family of Heuristic DP [21], [22], [23], [24], and Sutton's TD [18] are all methods trying on-line to predict the cost, just looking one step ahead, with the hope to first decrease the computational cost by a spread over the entire run but also to speed up the discovery of a satisfactory controller. Since the cost has to be predicted ....
....the cost (but this time the cost must be a function of the action and in one way or another implicitly captures some knowledge of the process). Backpropagation is performed through the neural predictor in order to discover a state feedback control policy which optimizes the cost. Werbos [15], [21], [22], [23], [24], Barto [1], [2], and Miller and Williams [10] have all developed optimal continuous controllers without the need for an explicit prior process model. This second connectionist implementation of DP, aiming at optimizing a continuous neural controller in the absence of the process model, will be ....
[Article contains additional citation context not shown here]
Werbos, P. 1992 Approximate Dynamic Programming for real-time control and neural modelling. In D. White & D. Sofge (Eds.) - Handbook of intelligent control. New York: Van Nostrand.
....1. Professor, Portland State University, lendaris sysc.pdx.edu 2. Graduate Student, Electrical Engineering, Portland State University, PO Box 751, Portland, OR 97207. This paper discusses strategies for and details of training procedures for the Dual Heuristic Programming (DHP) methodology, defined in [6]. This and other approximate dynamic programming approaches (HDP, DHP, GDHP) have been discussed in some detail in [2], [4], [5], all being members of the Adaptive Critic Design (ACD) family. The example application used is the inverted pendulum problem, as defined in [1]. This plant has been ....
....trained on-line during each epoch, with a faster overall convergence than the older approach. Further, the measures used herein suggest that a better controller design (the actionNN) results. 1. DUAL HEURISTIC PROGRAMMING (DHP). DHP is a neural network approach to solving the Bellman equation [6], [3]. The idea is to maximize a specified (secondary) utility function $J(t)$, where $J(t)$ is defined as: $J(t) = \sum_{k=0}^{\infty} \gamma^k U(t+k)$ (1). The term $\gamma$ is a discount factor ($0 < \gamma \le 1$) and $U(t)$ is the primary utility function, which must be defined by the user for the specific application context. In this paper, $\gamma$ is assumed to be 1. In this case, ....
[Article contains additional citation context not shown here]
Werbos, P. "Approximate Dynamic Programming for Real-Time Control and Neural Modeling", Ch. 13 in Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches, (White, D.A. and Sofge, D.A., eds.), Van Nostrand Reinhold, New York, NY, 1992.
No context found.
Paul J. Werbos. Approximate dynamic programming for real-time control and neural modeling. In D. A. White and D. A. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches. Van Nostrand Reinhold, 1992.
No context found.
Werbos, P. (1992). Approximate dynamic programming for real-time control and neural modeling. In D. A. White and D. A. Sofge, (Eds.), Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, pp. 493-525. New York: Van Nostrand Reinhold.
No context found.
P. Werbos, "Approximate dynamic programming for real-time control and neural modeling", in Handbook of Intelligent Control, D. White and D. Sofge, Eds., 1992.
No context found.
P.J. Werbos. Approximate dynamic programming for real-time control and neural modeling. In D. A. White and D. A. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, pages 493--525. Van Nostrand Reinhold, New York, 1992.
No context found.
P. J. Werbos, Approximate Dynamic Programming for Real-Time Control and Neural Modeling. In D. A. White and D. A. Sofge, editors, Handbook of Intelligent Control, 1992.
No context found.
P. Werbos, "Approximate dynamic programming for real-time control and neural modeling," in Handbook of Intelligent Control, White and Sofge, Eds., Van Nostrand Reinhold, ISBN 0-442-30857-4, pp. 493-525.
No context found.
P. J. Werbos, 1992, Approximate dynamic programming for real-time control and neural modeling. In Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, D. A. White and D. A. Sofge, editors, pages 493-525, Van Nostrand Reinhold, NY, USA.
No context found.
P.J. Werbos, "Approximate dynamic programming for real-time control and neural modeling," Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches, (D.A. White and D.A. Sofge editors), Van Nostrand Reinhold, New York, 1992.