Csaba Szepesvari. Convergent reinforcement learning with value function interpolation. Technical Report TR-2001-02, Mindmaker Ltd., Budapest 1121, Konkoly Th. M. u. 29-33, HUNGARY, 2001.
.... theories exist are: Q-learning and TD(0) with stationary exploration policies and state-aggregation representations [12]; value iteration where the function approximator update can be shown to be a non-expansion [4]; Q-learning with stationary exploration policies and kernel-based averaging [19]; and some value-iteration-based adaptive linear representations [6, 7], which follow from [4]. The value-iteration-based methods assume that a model of the environment is available, and are also deterministic algorithms and so easier to analyse. Figure 1 (adapted from an example in [18] ....