Ormoneit, D., and Sen, S. 1999. Kernel-based reinforcement learning. Technical Report 1999-8, Department of Statistics, Stanford University.
....selection and approximate policy iteration entirely in value function space, without any need for a model. In contrast to other approaches to approximate policy iteration, it does not require any sort of approximate policy function. In comparison to the memory-based approach of Ormoneit and Sen [11], our method makes stronger use of function approximation. Rather than using our samples to implicitly construct an approximate model using kernels, we operate entirely in value function space and use our samples directly in the value function projection step. As noted by Boyan [3], the A matrix ....
D. Ormoneit and S. Sen. Kernel-based reinforcement learning. To appear, Machine Learning, 2001.
....certain kernel-based learners (such as RBF methods with fixed centres and basis widths), piecewise and barycentric interpolation, and table lookup. All of these methods differ only by their choice of input mapping, which is often normalised. Many of these methods are already employed in RL (see [12, 13, 8, 6, 7] for recent examples). Some work draws distinctions between parameterised linear neural networks performing gradient descent and kernel-based averaging. This distinction seems to be unjustified, as the methods can be seen to be the same except for their choice of cost function and normalisation ....
.... policies and state aggregation representations [12], value iteration where the function approximator update can be shown to be a non-expansion [4], Q-learning with stationary exploration policies and kernel-based averaging [19], and some value-iteration-based adaptive linear representations [6, 7] (these follow from [4]). The value-iteration-based methods assume that a model of the environment is available, and are also deterministic algorithms and so easier to analyse. Figure 1 (adapted from an example in [18]) compares updates (1) and (2) in a standard supervised learning setting. In ....
[Article contains additional citation context not shown here]
D. Ormoneit and S. Sen. Kernel-based reinforcement learning. Machine Learning, 42:241--267, 2001.
....approaches to approximate policy iteration, it does not require any sort of approximate policy function. It is natural to compare our algorithm to other memory-based methods as well as the recent flourish of policy search methods [8, 7, 1, 14, 6]. In comparison to the memory-based approach of [10], our method makes stronger use of function approximation. Rather than using our samples to implicitly construct an approximate model using kernels, we operate entirely in value function space and use our samples directly in the value function projection step. As noted in [3], the A matrix used by ....
D. Ormoneit and S. Sen. Kernel-based reinforcement learning. To appear, Machine Learning, 2001.
....and uses Monte Carlo sampling to approximate the KL divergence between them. This algorithm is fairly generic: an extension of nearest neighbors to function approximation in density space, where densities are represented by samples. Space limitations preclude us from providing further detail (see [11, 12]). 3 Experimental Results. Preliminary results have been obtained in a world shown in two domains, one synthetic and one using a simulator of a RWI B21 robot. In the synthetic environment (Figure 2a) the agent starts at the lower left corner. Its objective is to reach heaven, which is either ....
D. Ormoneit and S. Sen. Kernel-based reinforcement learning. TR 1999-8, Statistics, Stanford University, 1999.
No context found.
D. Ormoneit and S. Sen, "Kernel-based reinforcement learning," Machine Learning, 2001.
....kernel-based reinforcement learning only relies on sample trajectories of the MDP and it is therefore much more widely applicable in practice. While our method addresses both discounted and average cost problems, we focus on average costs here and refer the reader interested in discounted costs to [OS00]. For brevity, we also defer technical details and proofs to an accompanying paper [OG00]. Note that average-cost reinforcement learning has been discussed by several authors (e.g. [TR99]). The remainder of this work is organized as follows. In Section 2 we provide basic definitions and we ....
....(i; j) 1 we can further exploit the analogy between (9) and the usual value iteration in an artificial MDP with transition probabilities Phi a to prove the following theorem: Theorem 1 The relative value iteration (9) converges to a unique fixed point. For details, the reader is referred to [OS00, OG00]. Note that Theorem 1 illustrates a rather unique property of kernel based reinforcement learning by comparison to alternative approaches. In addition, we can show that under suitable regularity conditions kernel based reinforcement learning is consistent in the following sense: Theorem 2 ....
D. Ormoneit and S. Sen. Kernel-based reinforcement learning. Machine Learning, 2001. To appear.
No context found.
Ormoneit, D., and Sen, S. 1999. Kernel-based reinforcement learning. Technical Report 1999-8, Department of Statistics, Stanford University.
No context found.
D. Ormoneit and S. Sen. Kernel-based reinforcement learning. To appear, Machine Learning, 2001.
No context found.
D. Ormoneit and S. Sen. Kernel-based reinforcement learning. Machine Learning, 49:161--178, 2002.
CiteSeer.IST - Copyright Penn State and NEC