67 citations found.
Sutton, R. (1991). Reinforcement learning architectures for animats. In J. Meyer and S. Wilson (Eds.), From Animals to Animats, pp. 288--296.


This paper is cited in the following contexts:


Designing Efficient Exploration with MACS: Modules and.. - Gérard, Sigaud

....reward is the usual source of reward found in any reinforcement learning framework. We define it as R_e(s'), corresponding to the immediate reward obtained for visiting the situation s'. The immediate rehearsal reward leads to a policy which is similar to the exploration policy described in [Sut91] or [But02b]. Then it grows until the situation is visited again and then drops to 0. In order to update R_r(s') we use the classical Widrow-Hoff equation: $R_r(s') \leftarrow (1 - \beta_r)\, R_r(s') + \beta_r$. Then, from each immediate reward, whether internal, rehearsal or external, a long term ....

R. S. Sutton. Reinforcement learning architectures for animats. In J.-A. Meyer and S. W. Wilson, editors, From animals to animats: Proceedings of the First International Conference on Simulation of Adaptive Behavior, pages 288--296, Cambridge, MA, 1991. MIT Press.
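
The rehearsal-reward behaviour described in the context above (the value grows while a situation goes unvisited, then drops to 0 on a visit) can be illustrated with a small Widrow-Hoff style update. This is only a sketch under stated assumptions: the function name `update_rehearsal_rewards`, the learning-rate name `beta_r`, and the growth target of 1.0 are hypothetical and are not taken from the citing paper.

```python
def update_rehearsal_rewards(rewards, visited, beta_r=0.1):
    """Return new rehearsal rewards after one time step.

    rewards: dict mapping each known situation to its rehearsal reward.
    visited: the situation visited at this time step.
    beta_r:  assumed learning rate in (0, 1].
    """
    updated = {}
    for situation, value in rewards.items():
        if situation == visited:
            # A visit resets the rehearsal reward to 0.
            updated[situation] = 0.0
        else:
            # Widrow-Hoff step toward an assumed target of 1.0:
            # value <- (1 - beta_r) * value + beta_r * target
            updated[situation] = (1.0 - beta_r) * value + beta_r * 1.0
    return updated
```

Applied repeatedly, the reward of an unvisited situation approaches the target without exceeding it, which gives long-neglected situations the highest exploration value.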


L'apprentissage Par Renforcement Indirect Dans Les.. - Sigaud, Gérard

....moreover turns out to be noticeably more compact. 2.3 Dyna architectures. The idea of a clear separation between learning a model of the transitions and learning a model of the rewards was already known in tabular reinforcement learning, with the Dyna architectures [Sutton, 1991]. Indeed, Sutton proposed this family of architectures to give an agent the ability to update several qualities at the same time, so as to noticeably speed up the learning of the quality function. [Fig. 3 diagram labels: rewards; reward; previous and current situations; ENVIRONMENT; model of ...] ....

Sutton, R. S. (1991). Reinforcement learning architectures for animats. In Meyer, J.-A. & Wilson, S. W., editors, From animals to animats: Proceedings of the First International Conference on Simulation of Adaptive Behavior, pages 288--296, Cambridge, MA. MIT Press.


Combining Latent Learning with Dynamic Programming in the.. - Gérard, Meyer, Sigaud

....BGS00] is a development of the Anticipatory Behavioral Control theory introduced in psychology by Hoffmann [Hof93]. YACS [GS01b, GSS02, GS01a] is a similar approach and both systems call upon explicit condition-action-effect classifiers, noted C A E. This formalism is similar to Sutton's DynaQ [Sut91] approach or to Drescher's context-action-result rules [Dre91] but it affords generalization capabilities. In such classifiers, the E part represents the effects of action A in situations matched by condition C. It records the perceived changes in the environment. In both ACS and YACS, a C part is ....

....the same E and A parts, but its C part matches s_{t-1}. 5 Combining Latent Learning and Dynamic Programming. In section 4, we described how MACS learns a model of the dynamics of the environment with anticipating classifiers. In this section, we show how this model is used in a Dyna architecture [Sut91] to define a policy. In such architectures, as illustrated in figure 5, the latent learning process takes place independently from the reward, and permits the construction of a model of .... [Figure 5 diagram labels: S1 ... Sn; Situation; Action; DECISION; List of perceptions; R(S1) ... R(Sn); List of classifiers (model of ....]

[Article contains additional citation context not shown here]

R.S. Sutton. Reinforcement learning architectures for animats. In J.-A. Meyer and S. W. Wilson, editors, From animals to animats: Proceedings of the First International Conference on Simulation of Adaptive Behavior, Cambridge, MA, 1991. MIT Press.


Generalization and Latent Learning in Learning Classifier Systems - Gérard

....used in ACS [Sto98] is a further development of the anticipatory behavioral control theory introduced in psychology by Hoffmann [Hof93]. YACS [GSS01] uses the same idea and both systems form explicit condition-action-effect classifiers, noted C A E. This formalism is similar to Sutton's DynaQ [Sut91] approach or Drescher's context-action-result rules [Dre91] but with a generalization capability. The E part represents the effects of the considered action in the situations matched by the condition. It records the perceived changes in the environment. In both ACS and YACS, a C part is a ....

R.S. Sutton. Reinforcement learning architectures for animats. In J.-A. Meyer and S. W. Wilson, editors, From animals to animats: Proceedings of the First International Conference on Simulation of Adaptive Behavior, Cambridge, MA, 1991. MIT Press.


Adding a Generalization Mechanism to YACS - Gérard, Sigaud

....its latent learning capability. A more explicit linkage is used in CXCS [TB00]. In contrast with all these approaches, ACS (Anticipatory Classifier System, [Sto98]) and YACS (Yet Another Classifier System, [GSS01]) both form C A E classifiers. This formalism is similar to Sutton's DynaQ [Sut91] approach but draws benefits from the generalization capability of LCSs. ACS and YACS both take advantage of the information provided by the succession of situations in order to drive the classifier discovering process. Therefore, they use heuristics instead of Genetic Algorithms, which are general ....

R.S. Sutton. Reinforcement learning architectures for animats. In J.-A. Meyer and S. W. Wilson, editors, From animals to animats: Proceedings of the First International Conference on Simulation of Adaptive Behavior, Cambridge, MA, 1991. MIT Press.


YACS: a new Learning Classifier System using Anticipation - Gérard, Stolzmann, Sigaud   (1 citation)

....is a further development of the anticipatory behavioral control theory introduced in psychology by Hoffmann [Hoffmann, 1993]. YACS (Yet Another Classifier System) also uses these ideas and both systems form explicit [condition] [action] [effect] classifiers. This formalism is similar to Sutton's DynaQ [Sutton, 1991] approach or Drescher's [context] [action] [result] rules [Drescher, 1991] but with a generalization capability. ACS and YACS both take advantage of the information provided by the succession of situations in order to drive explicitly the classifier discovering process. Therefore, they use ....

....to adjust the qualities for action of each (situation, action) pair. Using one single step of value iteration at each time step makes it possible to find the optimal policy over several time steps and offers a good reactivity-planning tradeoff. Doing so is similar to performing mental actions as in DynaQ [Sutton, 1991]. In this section, we describe how YACS takes advantage of the model of the environment in order to speed up the reinforcement learning process and the computation of a policy. We first introduce the Value Iteration algorithm and the way YACS identifies the reward sources. Then we explain how the ....

[Article contains additional citation context not shown here]

Sutton, R. (1991). Reinforcement learning architectures for animats. In Meyer, J.-A. and Wilson, S. W., (Eds.), From animals to animats: Proceedings of the First International Conference on Simulation of Adaptive Behavior, Cambridge, MA. MIT Press.
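
The idea quoted above, running a single step of value iteration per time step over a learned model, can be sketched roughly as below. This is an illustrative sketch and not the YACS implementation: the deterministic model format (`model[(s, a)] = s_next`), the per-situation reward table, the discount `gamma`, and the function name `value_iteration_sweep` are all assumptions made for the example.

```python
def value_iteration_sweep(q, model, reward, gamma=0.9):
    """Perform one synchronous value-iteration sweep over a learned model.

    q:      dict mapping (situation, action) to its current quality.
    model:  dict mapping (situation, action) to the anticipated next situation
            (assumed deterministic for this sketch).
    reward: dict mapping a situation to the immediate reward for reaching it.
    """
    new_q = dict(q)
    for (s, a), s_next in model.items():
        # Best quality available from the anticipated next situation,
        # or 0.0 if no action is known there yet.
        best_next = max(
            (q.get((s2, b), 0.0) for (s2, b) in model if s2 == s_next),
            default=0.0,
        )
        new_q[(s, a)] = reward.get(s_next, 0.0) + gamma * best_next
    return new_q
```

Calling the function once per time step propagates reward information one transition further back each time, so the policy improves over several steps without a full planning pass at any single step.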


Comparing Robot and Animal Behaviour - Hallam, Hayes (1992)   (5 citations)

.... e.g. a particular light frequency may act as a signal [Mei91]. Learning. Considerable research has been carried out into learning in robots by the inclusion of self-organising nets, e.g. see the review in [NSH90]. Other approaches include the reinforcement learning architectures reviewed in [Sut90] and the chaos-based learning described in [Ver91]. 4 The Proposed Standard. From consideration of the factors discussed above it can be seen that robots already possess most of the different types of ability outlined. However there are very few robots which have all the capabilities mentioned. ....

Richard Sutton. Reinforcement learning architectures for animats. In Jean-Arcady Meyer and Stewart Wilson, editors, From Animals to Animats, pages 288--296, 1990.


ARBIB: an Autonomous Robot Based on Inspirations from Biology - Damper, French, Scutt (1999)

....that state is for the agent. Learning then acts to maximize the reward or return associated with or predicted from the reinforcement signal over time. The many variants of this form of learning (especially the Q-learning paradigm of Watkins [79, 80]) have been very popular in animat studies (e.g. [37, 39, 44, 48, 52, 70]). Reinforcement learning is similar to the S-R theory of conditioning [43, p. 350], which posits a direct link between the conditioned stimulus and response, in contrast to Pavlovian S-S theory, which posits association of two stimuli, the CS and the US. Thus, although we have not done so, a ....

R. S. Sutton. Reinforcement learning architectures for animats. In J.-A. Meyer and S. W. Wilson, editors, From Animals to Animats: Proceedings of the First International Conference on Simulation of Adaptive Behavior, pages 288--296. Bradford Books/MIT Press, Cambridge, MA, 1991.


Introducing a Genetic Generalization Pressure to the.. - Butz, Goldberg.. (2000)   (7 citations)

....a first comparison with the XCS classifier system in different mazes and the multiplexer problem. 1 Introduction. The ACS is not the first approach which forms a complete internal model (i.e. a cognitive map) of the external environment. Reinforcement learning approaches, like Dyna (Sutton, 1991), form an internal model but are not able to generalize and indeed form an identical internal representation of the outside world. Riolo (1991) showed that classifier systems are also able to form a cognitive map and thus are able to learn latently and do lookahead planning. Though, the ....

....map, the ACS generates a cognitive map of the environment. Up to now, the evolving cognitive map in the ACS was not used to increase the learning capabilities of the ACS. For example, this mental representation of the environment could be used to execute hypothetical actions, similar to Dyna (Sutton, 1991). This should result in much less .... [Figure 11: Woods14 especially challenges the bucket brigade algorithm as the reward needs to be back-propagated many steps. Axes: steps to food; number of exploration ....]

Sutton, R. S. (1991). Reinforcement learning architectures for animats. In Meyer, J.-A. (Ed.), Proceedings of the first international conference on simulation of adaptive behavior (pp. 288--296).


Investigating Generalization in the Anticipatory.. - Butz, Goldberg, Stolzmann (2000)   (4 citations)

....requiring the construction of an internal representation of the environment. The enhanced LCS structure combined with the anticipatory learning process (ALP) realizes this need in the ACS. However, the ACS is not the first mechanism that is able to form an internal environmental representation. Sutton (1991) introduced the Dyna-Q algorithm, which uses an environmental representation to speed up the reinforcement learning capabilities. Holland (1990) and Riolo (1991) developed another classifier system which was able to form an environmental representation. Drescher (1991) published a schemata approach ....

Sutton, R. S. (1991). Reinforcement learning architectures for animats. In Meyer, J.-A. (Ed.), Proceedings of the first international conference on simulation of adaptive behavior (pp. 288--296).


Introducing a Genetic Generalization Pressure to the.. - Butz, Goldberg.. (2000)   (7 citations)

....map for doing lookahead planning as presented in Stolzmann (1999) or action planning (Butz & Stolzmann, 1999). The use of other cognitive processes still needs more investigation. The ACS could e.g. use the cognitive map to learn from hypothetical experiences similar to the Dyna architecture (Sutton, 1991). The advantage in the ACS is that it is not forming an identical internal image of the environment (like Dyna) but the environmental representation is more generalized. Up to now, the ACS did not use any type of generalization in the learning process but only specified rules from more general ....

Sutton, R. S. (1991). Reinforcement learning architectures for animats. In Meyer, J.-A. (Ed.), Proceedings of the first international conference on simulation of adaptive behavior (pp. 288--296).


First Cognitive Capabilities in the Anticipatory.. - Stolzmann, Butz.. (2000)   (2 citations)

....BBA. This is the point where ACS needs to use its representation of the environment in order to realize that the one action in the first phase is superior over the other. We implemented two different processes. 3.3.1 The One-Step Mental Acting Algorithm. The first algorithm is derived from Dyna-Q (Sutton, 1991). The Dyna approach forms an environmental representation similar to the ACS. However, it does not generalize any rules but simply stores each S-R-E relation in a table. In addition to the reinforcement learning (Q-learning) and model building, the Dyna-Q algorithm learns from the execution of ....

Sutton, R. S. (1991). Reinforcement learning architectures for animats. In Meyer, J.-A. (Ed.), Proceedings of the first international conference on simulation of adaptive behavior (pp. 288--296).
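
As a rough illustration of the tabular Dyna-Q scheme that the context above refers to (Q-learning, a table-based model, and learning from hypothetical experience replayed from that table), a minimal sketch might look like this. The function name `dyna_q_step`, the parameter names, and the deterministic model format are assumptions chosen for the example; they are not taken from Sutton's paper or from the citing paper.

```python
import random


def dyna_q_step(q, model, s, a, r, s_next, actions,
                alpha=0.5, gamma=0.9, n_planning=5, rng=random):
    """Process one real transition (s, a, r, s_next) in tabular Dyna-Q style.

    q:     dict mapping (state, action) to an action value, updated in place.
    model: dict mapping (state, action) to (reward, next_state), updated in place.
    rng:   any object with a choice(seq) method (assumed; defaults to random).
    """
    def q_update(s0, a0, r0, s1):
        # One-step Q-learning update toward r0 + gamma * max_b Q(s1, b).
        best = max(q.get((s1, b), 0.0) for b in actions)
        old = q.get((s0, a0), 0.0)
        q[(s0, a0)] = old + alpha * (r0 + gamma * best - old)

    q_update(s, a, r, s_next)        # learn from the real experience
    model[(s, a)] = (r, s_next)      # store the transition in the table model
    for _ in range(n_planning):      # learn from hypothetical experience
        (ps, pa), (pr, pn) = rng.choice(list(model.items()))
        q_update(ps, pa, pr, pn)
```

The planning loop is what distinguishes this from plain Q-learning: each real step is followed by `n_planning` replayed transitions drawn from the stored table, so reward information spreads through the value table with fewer real interactions.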


"Programming" By Teaching: Neural Network Control In The.. - Martin, Nehmzow

....control are but three examples that come to mind. If, on the other hand, robots are to be used in unstructured environments, if models of robot and environment are only partially or not at all available, reinforcement learning techniques provide a viable, if not necessary, alternative [Barto 89, Sutton 91]. FortyTwo, the Manchester mobile robot, has successfully demonstrated that such techniques enable robots to learn from reinforcement, successfully and in real time [Nehmzow 94, Nehmzow 95]. Published at Intern. Conf. Intelligent Autonomous Vehicles 95, Helsinki. Names appear in alphabetical ....

R. Sutton, Reinforcement Learning Architectures for Animats, in J.A. Meyer and S. Wilson (eds), From Animals to Animats, MIT Press 1991.


Flexible Control of Mobile Robots through Autonomous Competence.. - Nehmzow (1995)   (3 citations)

....mapping between sensor signals and actuator response. The term reinforcement learning applies to all learning techniques that improve plant performance by learning through trial and error, using performance feedback only (i.e. feedback evaluating behaviour, but not indicating correct behaviour) [Sutton 91]. Reinforcement learning techniques have been used in robot control, for instance to acquire a box-pushing behaviour (Mahadevan and Connell's work using Q-learning and statistical clustering [Mahadevan Connell 91]) or phototaxis [Kaelbling 92]. Millan uses connectionist reinforcement ....

R. Sutton, Reinforcement Learning Architectures for Animats, in J.A. Meyer and S. Wilson (eds), From Animals to Animats, MIT Press 1991.


A Hierarchical Concept Learner - Matthew Mitchell June (1998)

No context found.

Sutton, R. (1991). Reinforcement learning architectures for animats. In J. Meyer and S. Wilson (Eds.), From Animals to Animats, pp. 288--296.


An Evaluation of TRACA's Generalisation Performance - Matthew Mitchell School (2002)

No context found.

Sutton, R. (1991). Reinforcement learning architectures for animats. In J. Meyer and S. Wilson (Eds.), From Animals to Animats, pp. 288--296.


Intelligent Virtual Environment and Camera Control in.. - Ferreira, Gelatti, Musse (2002)

No context found.

R. S. Sutton. "Reinforcement Learning Architectures for Animats." Computer Graphics, 18, (3),1--10, (July 1984).


An Architecture for Behavior-Based Reinforcement Learning - Konidaris, Hayes (2005)

No context found.

Sutton, R. (1990). Reinforcement learning architectures for animats. In J. Meyer, & S. Wilson (Eds.), From Animals to Animats: Proceedings of the International Conference on Simulation of Adaptive Behavior. MIT Press.


Behaviour-Based Reinforcement Learning - Konidaris (2003)

No context found.

Sutton, R. (1990). Reinforcement learning architectures for animats. In J. Meyer and S. Wilson, editors, From Animals to Animats: Proceedings of the International Conference on Simulation of Adaptive Behavior.


Anticipatory Learning for Focusing Search in Reinforcement.. - Konidaris, Hayes (2004)

No context found.

R.S. Sutton. Reinforcement learning architectures for animats. In J. Meyer and S.W. Wilson, editors, From Animals to Animats: Proceedings of the International Conference on Simulation of Adaptive Behavior. MIT Press, 1990.


Estimating Future Reward in Reinforcement Learning Animats.. - Konidaris, Hayes (2004)

No context found.

Sutton, R. (1990). Reinforcement learning architectures for animats. In J. Meyer and S. Wilson, editors, From Animals to Animats: Proceedings of the International Conference on Simulation of Adaptive Behavior. MIT Press.


Adaptive Agents for Information Gathering from Multiple, .. - Information Sources.. (1999)

No context found.

Rich Sutton, "Reinforcement Learning Architectures for Animats," in From Animals to Animats, Proceedings of the First International Conference on Simulation of Adaptive Behavior, edited by J.-A. Meyer & S. W. Wilson, MIT Press/Bradford Books, 1991.


Paying Attention to What's Important: Using Focus of Attention.. - Foner, Maes (1994)   (9 citations)

No context found.

Rich S. Sutton, "Reinforcement Learning Architectures for Animats," in From Animals to Animats, Proceedings of the First International Conference on the Simulation of Adaptive Behavior, edited by Meyer J.-A. & Wilson S.W., MIT Press/Bradford Books, 1991.


The Anticipatory Classifier System and Genetic Generalization - Butz, Goldberg, Stolzmann (2000)

No context found.

Sutton, R. S. (1991b). Reinforcement learning architectures for animats. In Meyer, J.-A., & Wilson, S. W. (Eds.), From Animals to Animats: Proceedings of the First International Conference on Simulation of Adaptive Behavior, pp. 288--296. Cambridge, MA: MIT Press.


The Artificial Life Roots of Artificial Intelligence - Steels (1993)   (59 citations)

No context found.

Sutton, R.S. (1991) Reinforcement Learning Architectures for Animats. In: Meyer, J-A., and S.W. Wilson (1991) From Animals to Animats. Proceedings of the First International Conference on Simulation of Adaptive Behavior. MIT Press/Bradford Books. Cambridge Ma. p. 288-296.



CiteSeer.IST - Copyright Penn State and NEC