4 citations found.
P. Tadepalli and D. Ok, Model-Based Average Reward Reinforcement Learning. Artificial Intelligence, 100:177--224, 1998.

This paper is cited in the following contexts:
R-MAX - A General Polynomial Time Algorithm for.. - Brafman, Tennenholtz (2001)   (1 citation)

.... e.g., Kaelbling's interval exploration method [Kaelbling, 1993], the exploration bonus in Dyna [Sutton, 1990], the curiosity-driven exploration of [Schmidhuber, 1991], and the exploration mechanism in prioritized sweeping [Moore and Atkeson, 1993]. More recently, Tadepalli and Ok [Tadepalli and Ok, 1998] presented a reinforcement learning algorithm that works in the context of the undiscounted average reward model used in this paper. In particular, one variant of their algorithm, called AH-learning, is very similar to R-max. However, as we noted above, none of this work provides theoretical ....

P. Tadepalli and D. Ok. Model-based average reward reinforcement learning. Artificial Intelligence, 100:177--224, 1998.


Reinforcement Learning: Theory and Practice - Szepesvári

....known. Time can be either discrete or continuous. Interestingly, the nice properties of RL methods seem to transfer to these more complicated setups. Of course, in these applications the basic methods are extended with some tools. Some of them are listed next. Factored representations (see, e.g., [16]): Often a priori knowledge is available about the independence of one part of the description of the state from another part. The state of an object distant from the robot does not (usually) change as a result of some action performed by the robot. Factored representations can be exploited e.g. ....

P. Tadepalli and D. Ok. Model-based average reward reinforcement learning. Artificial Intelligence, 1998. to appear.


On Average Versus Discounted Reward Temporal-Difference.. - Tsitsiklis, Van Roy (1999)   (1 citation)

....reward problem. While there has been an emerging literature on the development and deployment of average reward reinforcement learning algorithms (see, e.g., Abounadi, 1998; Mahadevan, 1996; Marbach, Mihatsch, and Tsitsiklis, 1998; Schwartz, 1993; Singh, 1994; Tsitsiklis and Van Roy, 1997b; Tadepalli and Ok, 1998; Van Roy, 1998), the picture is far from complete, as only some of these methods can be adequately justified, mostly for the case of lookup table representations. It is natural to inquire whether the practice of avoiding average reward formulations is necessary and/or sound. Our earlier work ....

.... as the discount factor approaches 1? Could there be an inherent advantage to average reward TD? These questions have been prompted to a large extent by available experience, which suggests that average reward methods may offer computational advantages (Marbach, Mihatsch, and Tsitsiklis, 1998; Tadepalli and Ok, 1998). The analysis provided in the remainder of the paper indicates that discounted TD (with a discount factor approaching 1) is essentially identical to average reward TD. The two methods converge to the same limit. Furthermore, as long as the constant function which is typically used as one of ....

P. Tadepalli & D. Ok (1998) "Model-Based Average Reward Reinforcement Learning," to appear in Artificial Intelligence, 1998.


Title of the Book! - Name Of Author   (Correct)

No context found.

P. Tadepalli and D. Ok, Model--Based Average Reward Reinforcement Learning. Artificial Intelligence, 100:177--224, 1998.
