P. Tadepalli and D. Ok, Model--Based Average Reward Reinforcement Learning. Artificial Intelligence, 100:177--224, 1998.
.... e.g. Kaelbling's interval exploration method [Kaelbling, 1993], the exploration bonus in Dyna [Sutton, 1990], the curiosity-driven exploration of [Schmidhuber, 1991], and the exploration mechanism in prioritized sweeping [Moore and Atkeson, 1993]. More recently, Tadepalli and Ok [Tadepalli and Ok, 1998] presented a reinforcement learning algorithm that works in the context of the undiscounted average reward model used in this paper. In particular, one variant of their algorithm, called AH-learning, is very similar to R-max. However, as we noted above, none of this work provides theoretical ....
P. Tadepalli and D. Ok. Model-based average reward reinforcement learning. Artificial Intelligence, 100:177--224, 1998.
....known. Time can be either discrete or continuous. Interestingly, the nice properties of RL methods seem to transfer to these more complicated setups. Of course, in these applications the basic methods are extended with some tools. Some of them are listed next. Factored representations (see e.g. [16]): Often a priori knowledge is available about the independence of one part of the description of the state from another part. The state of an object distant from the robot does not (usually) change as a result of some action performed by the robot. Factored representations can be exploited e.g. ....
P. Tadepalli and D. Ok. Model-based average reward reinforcement learning. Artificial Intelligence, 1998. to appear.
....reward problem. While there has been an emerging literature on the development and deployment of average reward reinforcement learning algorithms (see, e.g., Abounadi, 1998; Mahadevan, 1996; Marbach, Mihatsch, and Tsitsiklis, 1998; Schwartz, 1993; Singh, 1994; Tsitsiklis and Van Roy, 1997b; Tadepalli and Ok, 1998; Van Roy, 1998), the picture is far from complete as only some of these methods can be adequately justified, mostly for the case of lookup table representations. It is natural to inquire whether the practice of avoiding average reward formulations is necessary and/or sound. Our earlier work ....
.... as the discount factor approaches 1? Could there be an inherent advantage to average reward TD? These questions have been prompted to a large extent by available experience, which suggests that average reward methods may offer computational advantages (Marbach, Mihatsch, and Tsitsiklis, 1998; Tadepalli and Ok, 1998). The analysis provided in the remainder of the paper indicates that discounted TD (with a discount factor approaching 1) is essentially identical to average reward TD. The two methods converge to the same limit. Furthermore, as long as the constant function which is typically used as one of ....
P. Tadepalli & D. Ok (1998) "Model--Based Average Reward Reinforcement Learning," to appear in Artificial Intelligence, 1998.
No context found.
P. Tadepalli and D. Ok, Model--Based Average Reward Reinforcement Learning. Artificial Intelligence, 100:177--224, 1998.