233 citations found.
Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13, 834--846.


This paper is cited in the following contexts:


Application Of Direct-Vision-Based Reinforcement Learning To A Real Mobile Robot - Masaru..

....layer [5]. That is somewhat abstract information emerged by necessity of the task accomplishment. Once the hidden representation is obtained, the learning is done on this global space, and that makes the learning drastically faster. 3. ACTOR CRITIC ARCHITECTURE Here, actor critic architecture [9] is employed, and actor (action command generator) and critic (state evaluator) are composed of one layered neural network. This means that the hidden layer is used by both actor and critic. This architecture is the same as the simulations in [5]. TD (Temporal Difference) is applied for the ....

Barto, A. G., Sutton, R. S. and Anderson, C. W. (1983) "Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems", IEEE Trans. SMC-13, pp.835-846


Hidden Representation after Reinforcement Learning of Hand.. - Shibata, Ito

....critic output generation. This explanation also clarifies that the receptive field expanded around the tool when the monkey used it; also, neurons activate around the hand even though the hand is hidden under the opaque plate. III. REINFORCEMENT LEARNING In this paper, actor critic architecture [14] is employed and implemented in a four layered neural network as shown in Fig. 4. There are three outputs: one for the critic and two for the actor. Temporal smoothing, or TS, learning is employed for learning of the critic [15][9]. It is very similar to temporal difference, or TD, learning; its ....

A. G. Barto, R. S. Sutton & C. W. Anderson (1983) Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems, IEEE Trans. SMC, 13: 835-846


Effect Of Force Load In Hand Reaching Movement Acquired By.. - Shibata, Ito

....It is also known that when the trajectory curves, the stiffness of the hand becomes large by the simultaneous activation of the paired muscles [7]. In other words, the trajectory error becomes less by making the feedback gain large. 3. ACTOR CRITIC AND TS LEARNING Here, actor critic architecture [8] is employed, and implemented in one layered neural network. The output neurons are divided into one critic output and the other actor outputs. For the learning of the critic, TS (temporal smoothing) based learning is employed [9]. It is very similar to TD (temporal difference) learning [8] and the ....

....is employed, and implemented in one layered neural network. The output neurons are divided into one critic output and the other actor outputs. For the learning of the critic, TS (temporal smoothing) based learning is employed [9]. It is very similar to TD (temporal difference) learning [8] and the details can be seen in [9]. The training signal for the critic p s and for the actor m s are p s (t 1) p(t) P range Nmax [i] 1) m s (t 1) m(t 1) rnd(t 1) p(t) p(t 1) 2) where p: the critic output, P range : the value range of the ideal critic output, here, 0.4 ( 0.4) ....

[Article contains additional citation context not shown here]

A. G. Barto, R. S. Sutton & C. W. Anderson (1983) Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems, IEEE Trans. SMC, 13: 835-846


Reinforcement Learning & Artificial Neural Networks - Kavehercy (1996)

....delayed RL tasks involving prediction problems. Now we are going to discuss some popular designs of RL Controllers. 3.5 Reinforcement Learning Methods 3.5.1 Adaptive Critic Method (discrete) Adaptive critic was an early method of reinforcement learning presented by Barto, Sutton and Anderson [4] which they used in their classic task of pole balancing. The adaptive critic learning systems contain two parts. The critic which has the task of predicting future costs. And the actor which produces actions at each state x. It is assumed that the state and action space are both finite, ....

A.G. Barto, R.S. Sutton, and C.J.C.H. Watkins. Neuronlike adaptive elements that can solve difficult learning problems. IEEE Transactions, 13:834 846, 1983.


Learning Implicit Models during Target Pursuit - Gaskett, Brown, Cheng, Zelinsky (2003)

.... movements [34]. Shibata and Schaal's [18] controller included velocity components as well as position components, based on a model of the human vestibulo ocular reflex (VOR) and optokinetic reflex (OKR). The VOR model included a learning component, using FEL with an eligibility trace mechanism [35]. Reinforcement learning was applied to active head control by Piater et al. [36]. The system controlled one degree of freedom, vergence movements, in which both eyes turn inward or outward to look at an object at a particular distance [37]. Five discrete actions were available: changes in ....

A. G. Barto, R. S. Sutton, and C. W. Anderson, "Neuronlike adaptive elements that can solve difficult learning control problems," IEEE Transactions on systems, man and cybernetics, vol. SMC-13:pp. 834--846, 1983.


Reinforcement Learning with Exploration - Reynolds (2002)

....case where n = 1. Updates are performed after each and every step using knowledge only about the immediate reward collected and the next state entered. 3.4.2 TD(0) The temporal difference learning algorithm, TD(0), can be used to evaluate a policy and works through applying the following update [13, 147, 148]: $\hat{V}(s_t) \leftarrow \hat{V}(s_t) + \alpha_t(s_t)\,[r_{t+1} + \gamma \hat{V}(s_{t+1}) - \hat{V}(s_t)]$ (3.16) where $r_{t+1}$ is the reward following $a_t$ taken from $s_t$ and selected according to the policy under evaluation. Note that this has the same form as the RunningAverage update rule where the target is $E[r_{t+1} + \gamma \hat{V}(s_{t+1})]$. Recall from Equation 2.6 that, ....

....variance) even if the evaluation policy is not followed. It is not clear, however, whether this method can be used effectively for control optimisation. Local Input Features Local (i.e. narrow) input features are a common feature in many practical applications of function approximation in RL [13, 163, 71, 1, 68, 167, 150, 95, 140, 141]. Why might this be so? Consider any goal based tasks where bootstrapping estimates are employed. Here the values of states near the goal may completely define the true values of all other states. If broad features are used and the bulk of the updates are made at states away from the goal (as can ....

Andrew G. Barto, Richard S. Sutton, and Charles W. Anderson. Neuronlike adaptive elements that can solve difficult learning problems. IEEE Transactions on Systems, Man and Cybernetics, 13(5):834-846, September 1983.


Improvisation and Learning - Franklin (2001)

....26 human inputs to the hidden layer. Each output unit gets the 100 hidden units outputs as input. The original 50 weights are halved and used as initial values of the two sets of 50 hidden unit weights to the output unit. 3.2 SSR and Critic Algorithms Using actor critic reinforcement learning ([2, 10, 13]) the actor chooses the next note to play. The critic receives a raw reinforcement signal r from the critique made by the rules of Section 1.2. For output j, the SSR (actor) computes mean $\mu_j = \sum_i w_{ij} x_i$. A Gaussian distribution with mean $\mu_j$ and standard deviation $\sigma_j$ chooses the output $y_j$. r is ....

A.G. Barto, R. S. Sutton, and C. W. Anderson. Neuronlike adaptive element that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13:834--846, 1983.


Robust Reinforcement Learning - Morimoto (2001)   (2 citations)

....rate. Note that we do not assume f(x = 0) = 0 because the error output z is generalized as the reward r(x; u) in RRL framework. 3.1 Actor disturber critic We propose actor disturber critic architecture by which we can implement robust RL in a model free fashion as the actor critic architecture [1]. We define the policies of the actor and the disturber implemented as $u(t) = A_u(x(t); v^u) + n_u(t)$ and $w(t) = A_w(x(t); v^w) + n_w(t)$ respectively, where $A_u(x(t); v^u)$ and $A_w(x(t); v^w)$ are function approximators with parameter vectors, $v^u$ and $v^w$, and $n_u(t)$ and $n_w(t)$ are ....

A. G. Barto, R. S. Sutton, and C. W. Anderson. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, 13:834--846, 1983.


Variance Reduction Techniques for Gradient Estimates.. - Greensmith, Bartlett, .. (2001)   (1 citation)

....between the reward and the expected reward). We give bounds on the estimation variance that show that, perhaps surprisingly, this may not be the best choice. The second approach (Section 3) is the use of an approximate value function. Such actor-critic methods have been investigated extensively [3, 1, 14, 10]. Generally the idea is to minimize some notion of distance between the fixed value function and the true value function. In this paper we show that this may not be the best approach: selecting the fixed value function to be equal to the true value function is not always the best choice. Even more ....

A. G. Barto, R. S. Sutton, and C. W. Anderson. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13:834--846, 1983.


Using Free Energies to Represent Q-values in a Multiagent.. - Sallans, Hinton (2000)   (4 citations)

....consuming to extract the policy directly from the Q function because a non linear optimization must be solved for each action choice. One solution, called an actor critic architecture, is to use a separate function approximator to model the policy (i.e. to approximate the non linear optimization) [2, 3]. This has the advantage of being fast, and allows us to explicitly learn a stochastic policy, which can be advantageous if the underlying problem is not strictly Markov [4]. However, a specific parameterized family of policies must be chosen a priori. Instead we present a method where the ....

....algorithm is from [7, 8]. A delta rule update similar to ours was explored by [9] for POMDPs and Q learning. Factored MDPs and function approximators have a long history in the adaptive control and RL literature (see for example [10]). Our method is also closely related to actor critic methods [2, 3]. Normally with an actor-critic method, the actor network can be viewed as a biased scheme for selecting actions according to the value assigned by the critic. The selection is biased by the choice of parameterization. Our method of action selection is unbiased (if the Markov chain is allowed to ....

A. G. Barto, R. S. Sutton, and C. W. Anderson. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man and Cybernetics, 13:835-- 846, 1983.


The Role of Chemical Mechanisms in Neural Computation and Learning - Hiller

....3.3 Summary Traditional computational models of neurons consider only two time scales: the very fast time scale of the cell's electrical responses, and the permanent changes which result from long term learning. A few exceptions can be found in real time models of reinforcement learning (e.g. [2, 62, 63]) where an exponentially decaying internal memory trace is used to model the effect of the ISI on the learning response. The mechanisms found in Aplysia's synapses, however, demonstrate the presence of a wide range of processes acting at different time scales which have a significant impact on ....

Andrew G. Barto, Richard S. Sutton, and Charles W. Anderson. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13(5), 1983.


Journal of Artificial Intelligence Research 15 (2001).. - Jonathan Baxter

No context found.

Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13, 834--846.


Journal of Artificial Intelligence Research 15 (2001).. - Jonathan Baxter

No context found.

Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13, 834--846.


To be published in: IEEE International Conference on - Systems Man And

No context found.

A. G. Barto, R. S. Sutton, and C. W. Anderson. Neuron-like adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, 13:834--846, 1983.


Learning to Trade via Direct Reinforcement - Moody, Saffell (2001)   (6 citations)

No context found.

A. G. Barto, R. S. Sutton, and C. W. Anderson, "Neuronlike adaptive elements that can solve difficult learning control problems," IEEE Trans. Syst., Man, Cybern., vol. 13, no. 5, pp. 835--846, Sept. 1983.


Reinforcement Learning For Robotic Reaching And Grasping - Fagg (1993)   (4 citations)

No context found.

Barto, A. G., Sutton, R. S., and Anderson, C. W., Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems. IEEE Transactions on Systems, Man, and Cybernetics, (1983) SMC-5:834-46.


Cooperation of Categorical and Behavioral Learning in a.. - Graduate School Of

No context found.

Barto, A. G., Sutton, R. S. and Anderson, C. W., "Neuronlike adaptive elements that can solve difficult learning control problems," IEEE Transactions on Systems, Man, and Cybernetics, 13,5, pp. 834--846, 1983.


Unknown -

No context found.

A.G. Barto, R.S. Sutton, and C.W. Anderson. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13(5):834--846, 1983.


Implemented as Well as the Flexibility to Provide Training for.. - Edwards (1991)

No context found.

Barto, A.G., Sutton, R.S. and Anderson, C.W., "Neuronlike Adaptive Elements That Can Solve Difficult Learning Problems", IEEE Trans. on Systems, Man and Cybernetics, SMC-13:834-846, 1983.


A Neural Network Model with Dopamine-Like Reinforcement Signal .. - Suri, Schultz (1999)   (3 citations)

No context found.

Barto A. G., Sutton R. S. and Anderson C. W. (1983) Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans on Systems, Man, and Cybernetics SMC-13, pp. 834--846.


Ibots Learn Genuine Team Solutions - Versino, Gambardella (1997)   (2 citations)

No context found.

A.G. Barto, R.S. Sutton, and C.W. Anderson. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, 13:835--846, 1983.


On Multiagent Q-Learning in a Semi-competitive Domain - Tuomas Sandholm And (1996)   (6 citations)

No context found.

Barto, A. G., Sutton, R., and Anderson, C. W. 1983. Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems. IEEE Trans. Systems, Man, and Cybernetics, 13:834--846.


Statistical Factors in Behaviour Learning - Chris Thornton Cognitive

No context found.

Barto, A., Sutton, R. and Anderson, C. (1988). Neuronlike adaptive elements that can solve difficult learning control problems. Neurocomputing. MIT Press.


Continual Learning for Mobile Robots - Großmann (2001)

No context found.

A. G. Barto, R. S. Sutton, and C. W. Anderson. Neuron-like adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man and Cybernetics, 13:835--846, 1983.


Model-Based Reinforcement Learning for Evolving Soccer .. - Wiering, Salustowicz, .. (2001)

No context found.

Barto, A.G., Sutton, R.S., and Anderson, C.W. (1983), "Neuronlike adaptive elements that can solve difficult learning control problems," IEEE Transactions on Systems, Man, and Cybernetics, SMC-13, pp. 834--846.


CiteSeer.IST - Copyright Penn State and NEC