7 citations found.
Cichosz, P. (1994). Reinforcement learning algorithms based on the methods of temporal differences. Master's thesis, Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland.


This paper is cited in the following contexts:
Improving The Performance Of Q-Learning With Locally Weighted.. - Aljibury (2001)

....the previous value, then you have a value function that has converged as close as desired to the optimal value function $V^*(s)$. At this point, the optimal policy can easily be calculated from Eq. (2.15). Conversely, an optimal policy can be obtained iteratively from an arbitrary initial policy $\pi(s)$ [3]. Repeat:
Let $\pi(s) = \pi'(s)$ for all $s \in S$ (2.19)
Evaluate $V^\pi(s)$: $Q^\pi(s, a) = R(s, a) + \sum_{s'} T(s, a, s')\, V^\pi(s')$ (2.20)
Improve the policy: $\pi'(s) = \arg\max_a Q^\pi(s, a)$ (2.21)
until the improvement from $\pi$ to $\pi'$ is epsilon-small. At this point, ....
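The policy-iteration loop in the excerpt above (evaluate $V^\pi$, then improve $\pi$ greedily with respect to $Q^\pi$) can be sketched in Python as follows. This is a minimal illustration, not code from either thesis: the table layout `T[s][a][s2]` and `R[s][a]`, the discount factor `gamma` (not visible in the excerpt), the tolerance `eps` standing in for "epsilon-small", and all function names are assumptions made for the example.

```python
# Minimal policy-iteration sketch (illustrative only, not from the cited work).
# Assumed layout: T[s][a][s2] = transition probability, R[s][a] = expected reward.
# gamma < 1 is an added assumption so that iterative evaluation converges.


def evaluate_policy(pi, T, R, gamma, tol=1e-10):
    """Iterative policy evaluation: sweep V(s) <- R(s, pi(s)) + gamma * sum_s2 T * V(s2)."""
    n = len(T)
    V = [0.0] * n
    while True:
        delta = 0.0
        for s in range(n):
            a = pi[s]
            v_new = R[s][a] + gamma * sum(T[s][a][s2] * V[s2] for s2 in range(n))
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new  # in-place (Gauss-Seidel style) update
        if delta < tol:
            return V


def q_values(V, T, R, gamma, s):
    """Q^pi(s, a) = R(s, a) + gamma * sum_s2 T(s, a, s2) * V^pi(s2), for each action a."""
    n = len(T)
    return [R[s][a] + gamma * sum(T[s][a][s2] * V[s2] for s2 in range(n))
            for a in range(len(T[s]))]


def policy_iteration(T, R, gamma=0.9, eps=1e-9):
    pi = [0] * len(T)  # arbitrary initial policy: action 0 in every state
    while True:
        V = evaluate_policy(pi, T, R, gamma)
        stable = True
        for s in range(len(T)):
            q = q_values(V, T, R, gamma, s)
            best = max(range(len(q)), key=q.__getitem__)
            # Switch only on a non-negligible improvement, so ties cannot cause cycling.
            if q[best] > q[pi[s]] + eps:
                pi[s] = best
                stable = False
        if stable:
            return pi, V


# Hypothetical two-state MDP: in state 0, action 1 moves to state 1;
# in state 1, action 0 stays there and pays reward 1.
T = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 0.0],
     [1.0, 0.0]]
pi, V = policy_iteration(T, R, gamma=0.9)
print(pi, V)
```

On this toy MDP the loop switches state 0 to its "move" action after one improvement step and then stops, since no action beats the current policy by more than `eps`.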

Cichosz, P. (1994). Reinforcement learning algorithms based on the methods of temporal differences. Master's thesis, Institute of Computer Science, Warsaw University of Technology.


The Society of Mind Requires an Economy of Mind - Wright (1997)   (1 citation)

....The main design problem to be solved in reinforcement learning is the credit assignment problem, which is the problem of properly assigning credit or blame for overall outcomes to each of the learning system's internal decisions that contributed to those outcomes (R. S. Sutton, quoted in Cichosz, 1994). More precisely, RL involves learning functions defined on the state and action space of a task, driven by a real-valued reinforcement signal. The details of how this is achieved depend on the particular function representation used. Examples of RL algorithms are Q-learning (Watkins & Dayan, ....

Cichosz, P. (1994). Reinforcement learning algorithms based on the methods of temporal differences. Master's thesis, Institute of Computer Science, Warsaw University of Technology.


Truncating Temporal Differences: On the Efficient Implementation.. - Cichosz (1995)   (15 citations)  Self-citation (Cichosz)

....suggesting that using λ > 0 with the TTD procedure allows one to obtain a significant learning speedup at essentially the same cost as usual TD(0) learning. 1. Introduction. Reinforcement learning (RL; e.g., Sutton, 1984; Watkins, 1989; Barto, 1992; Sutton, Barto, & Williams, 1991; Lin, 1992, 1993; Cichosz, 1994) is a machine learning paradigm that relies on evaluative training information. At each step of discrete time, a learning agent observes the current state of its environment and executes an action. Then it receives a reinforcement value, also called a payoff or a reward (punishment), and a state ....

....to TD(0) will be presented and related to Equation 6. The presentation below is limited solely to function update rules; for a more elaborate description of the algorithms, the reader should consult the original publications of their developers or, for AHC and Q-learning, Lin (1993) or Cichosz (1994). They are all closely related to dynamic programming methods (Barto, Sutton, & Watkins, 1990; Watkins, 1989; Baird, 1993), but these relations, though theoretically and practically important and fruitful, are not essential for the subject of this paper and will not be discussed. 2.2.1 The AHC ....

Cichosz, P. (1994). Reinforcement learning algorithms based on the methods of temporal differences. Master's thesis, Institute of Computer Science, Warsaw University of Technology.


GBQL: A Novel Genetics-Based Reinforcement Learning Architecture - Cichosz, Mulawka   Self-citation (Cichosz)

....ones. The implicit bucket brigade algorithm (Wilson, 1985), used for credit assignment in stimulus-response classifier systems, can be shown to be closely related to the form of TD methods used for reinforcement learning. These relations are discussed, e.g., by Dorigo and Bersini (1994) or Cichosz (1994). This paper presents an attempt to integrate TD-based temporal credit assignment with genetics-based knowledge representation, and thus to build some links across the gap between these two related areas. The main reasons why this may be a promising undertaking are based on some potential advantages of ....

....and have a better-developed theory; TD-based algorithms are related to dynamic programming and its strong theoretical foundations. These observations, as well as some other drawbacks of stimulus-response classifier systems described in the literature (e.g., Wilson & Goldberg, 1989; Cichosz, 1994), provide sufficient justification for this work. 2 Q-Learning. Q-learning (Watkins, 1989), chosen as a basis for the proposed novel architecture, is currently the theoretically strongest and most widely used TD-based (DP-related) reinforcement learning algorithm. It learns a Q function, which ....
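The excerpt breaks off just as it begins to describe the Q function. For orientation, the standard one-step tabular Q-learning update (Watkins, 1989) can be sketched as below. This is not the GBQL architecture itself; the dictionary-based table, the step size `beta`, the discount `gamma`, and the state and action names are illustrative assumptions.

```python
from collections import defaultdict


def q_learning_update(Q, x, a, r, x_next, actions, beta=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(x, a) <- Q(x, a) + beta * [r + gamma * max_b Q(x_next, b) - Q(x, a)].
    Q is a mapping from (state, action) pairs to values (an assumed layout)."""
    target = r + gamma * max(Q[(x_next, b)] for b in actions)
    Q[(x, a)] += beta * (target - Q[(x, a)])
    return Q[(x, a)]


# Hypothetical example: unseen (state, action) pairs default to 0.0.
Q = defaultdict(float)
Q[("s1", "right")] = 2.0
new_value = q_learning_update(Q, "s0", "right", 1.0, "s1", ["left", "right"], beta=0.5)
print(new_value)
```

Here the target is 1.0 + 0.9 * 2.0 = 2.8, and with `beta=0.5` the entry for `("s0", "right")` moves halfway from 0.0 toward it.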

[Article contains additional citation context not shown here]

Cichosz, P. (1994). Reinforcement learning algorithms based on the methods of temporal differences. Master's thesis, Institute of Computer Science, Warsaw University of Technology.


Learning Multidimensional Control Actions From Delayed.. - Cichosz (1995)   (3 citations)  Self-citation (Cichosz)

....tasks with vector actions is proposed, called Q V learning. Experimental results are presented where this algorithm clearly outperforms the simple recoding approach while incurring a much lower computational cost. INTRODUCTION The basic scenario of the reinforcement learning (RL) [8, 11, 4, 1] paradigm is as follows. At each time step, the learning agent observes the current state of its environment and performs an action. Then it receives a reinforcement value, also called a payoff or a reward, and a state transition takes place. State transitions and reinforcement values may be, in ....

....tasks. In their approach there are N Q functions, as in Q V learning, but each of them is updated independently by the standard Q-learning algorithm. In a forthcoming paper [3] we argue that it is likely to be less effective than Q V learning and compare these two algorithms experimentally. See [1] for a more elaborate overview of related research. DEMONSTRATIONS This section presents experimental results for a learning control problem which is an extended version of the car parking problem used in [2] for evaluating the performance of the TTD procedure. Car Parking Task The car parking ....

[Article contains additional citation context not shown here]

Cichosz, P. (1994, September). Reinforcement learning algorithms based on the methods of temporal differences. Master's thesis, Institute of Computer Science, Warsaw University of Technology.


Faster Temporal Credit Assignment in Learning Classifier Systems - Cichosz, Mulawka   Self-citation (Cichosz)

....by a recency factor λ ∈ [0, 1], which is written TD(λ). Its reinforcement learning version will be described in Section 2.2. Some close relationships between TD(0)-based algorithms and the bucket brigade algorithm have been observed by several authors (e.g., Dorigo & Bersini, 1994; Wilson, 1994; Cichosz, 1994), but, strangely enough, these observations have not led to clear practical conclusions. To draw such conclusions, in this paper we port the idea of TD(λ) for general λ to the context of classifier systems. Using positive λ has been found to usually yield a considerable learning speedup for AHC and ....

....$\ldots = \beta\,[r_t + \gamma U_t(x_{t+1}) - U_t(x_t)]$ (13); the only difference is that $U_t(x_t)$ replaces $s_t(A_t)$ and $U_t(x_{t+1})$ replaces $s_{t+1}(A_{t+1})$. This basic observation is nothing original; it has been made several times before, e.g., by Dorigo and Bersini (1994), Wilson (1994), or Cichosz (1994). The analogy between the bucket brigade and TD(0) would be perfect if the former used $s_t(A_{t+1})$ instead of $s_{t+1}(A_{t+1})$ in Equation 2. However, the difference should not be very significant in practice, especially for relatively small $\beta$, and we feel free to ignore it. Consequently, we may ....
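Equation (13) in this excerpt is the tabular TD(0) update, with step size β and discount γ. A minimal Python sketch follows, assuming a dictionary `U` of state values in which unseen (and terminal) states default to 0 and an invented three-state chain as data. In the bucket-brigade reading of the excerpt, `U[x]` would play the role of the strength $s(A)$ of the active classifier set; the sketch does not model classifiers at all, and both values in the error term are read from the same time-t table, as in TD(0).

```python
def td0_update(U, x_t, r_t, x_next, beta=0.1, gamma=0.9):
    """TD(0): U(x_t) <- U(x_t) + beta * [r_t + gamma * U(x_{t+1}) - U(x_t)].
    Missing entries (including the terminal state) are treated as 0.0."""
    td_error = r_t + gamma * U.get(x_next, 0.0) - U.get(x_t, 0.0)
    U[x_t] = U.get(x_t, 0.0) + beta * td_error
    return td_error


# Hypothetical episode: a -> b (reward 0), b -> end (reward 1); "end" is terminal.
U = {}
episode = [("a", 0.0, "b"), ("b", 1.0, "end")]
for _ in range(500):
    for x, r, x_next in episode:
        td0_update(U, x, r, x_next, beta=0.1, gamma=0.9)
print(U)
```

With repeated episodes, `U["b"]` approaches the reward 1.0 and `U["a"]` approaches its discounted value 0.9.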

Cichosz, P. (1994). Reinforcement learning algorithms based on the methods of temporal differences. Master's thesis, Institute of Computer Science, Warsaw University of Technology.


Solution of Delayed Reinforcement Learning Problems Having.. - Ravindran (1996)

No context found.

Cichosz, P. (1994). Reinforcement learning algorithms based on the methods of temporal differences. Master's thesis, Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland.


CiteSeer.IST - Copyright Penn State and NEC