Benjamin Van Roy. Learning and Value Function Approximation in Complex Decision Processes. PhD thesis, Massachusetts Institute of Technology, 1998.
....value function rather than calculate it exactly. Numerous schemes have been investigated for approximating optimal value functions and policies in a compact representational framework, including: hierarchical decompositions [5], decision trees and diagrams [3, 12], generalized linear functions [1, 13, 4, 7, 8, 6], neural networks [2], and products of experts [11]. However, the simplest of these is generalized linear functions, which is the form we investigate below. In this case, we consider functions of the form $f(x) = \sum_{j=1}^{k} w_j b_j(x_j)$, where $b_1, \ldots, b_k$ are a fixed set of basis functions, and $x_j$ ....
....solutions to MDPs with up to n = 40 state variables (1 trillion states) to be generated in under 7.5 hours using approximate policy iteration [6]. It turns out that approximate value iteration is less effective because it takes more iterations to converge, and in fact can diverge in theory [6, 13]. Our main observation is that if one has to solve linear programs to conduct the approximate iterations anyway, then it might be much simpler and more efficient to approximate the linear programming approach directly. 4 Approximate linear programming. Our first idea is simply to observe that a ....
B. Van Roy. Learning and value function approximation in complex decision processes. PhD thesis, MIT, EECS, 1998.
....reinforcement learning has consistently focused on developing algorithms that effectively manage the exploration-exploitation tradeoff [1, 11] and attempting to scale up to large real-world problems [8, 7]. Most research in these directions has focused on the direct value approximation approach [4, 5, 25] rather than the indirect model-based approach, because the model-based approach has not been amenable to scaling up to large problems [24]. However, two recent developments in model-based reinforcement learning and Markov decision process (MDP) planning have created new opportunities for ....
B. Van Roy. Learning and Value Function Approximation in Complex Decision Processes. PhD thesis, M.I.T, 1998.
....value function rather than calculate it exactly. Numerous schemes have been investigated for approximating optimal value functions and policies in a compact representational framework, including: hierarchical decompositions [5], decision trees and diagrams [3, 12], generalized linear functions [1, 13, 4, 7, 8, 6], neural networks [2], and products of experts [11]. However, the simplest of these is generalized linear functions, which is the form we investigate below. In this case, we consider functions of the form $f(x) = \sum_{j=1}^{k} w_j b_j(x_j)$, where $b_1, \ldots, b_k$ are a fixed set of basis functions, and ....
....solutions to MDPs with up to n = 40 state variables (1 trillion states) to be generated in under 7.5 hours using approximate policy iteration [6]. (Footnote 1: It turns out that approximate value iteration is less effective because it takes more iterations to converge, and in fact can diverge in theory [6, 13].) Our main observation is that if one has to solve linear programs to conduct the approximate iterations anyway, then it might be much simpler and more efficient to approximate the linear programming approach directly. 4 Approximate linear programming. Our first idea is simply to observe that a ....
B. Van Roy. Learning and value function approximation in complex decision processes. PhD thesis, MIT, EECS, 1998.
....examples are given where the use of RL with function approximators can fail. In [70] the same examples were used and the experimental setup was modified to make them work. Proofs of convergence for function approximation only exist for approximators linear in the weights when applied to MDPs [77][79]. For systems with a continuous state space there are no proofs when general function approximators are used. Also, there are no general recipes to make the use of function approximators successful. It still might require .... (Footnote 5: Note that it is also possible to have a critic with only state x as ....)
B. Van Roy. Learning and Value Function Approximation in Complex Decision Processes. PhD thesis, Massachusetts Institute of Technology, 1998.
....value function rather than calculate it exactly. Numerous schemes have been investigated for approximating optimal value functions and policies in a compact representational framework, including: hierarchical decompositions [5], decision trees and diagrams [3, 12], generalized linear functions [1, 13, 4, 7, 8, 6], neural networks [2], and products of experts [11]. However, the simplest of these is generalized linear functions, which is the form we investigate below. In this case, we consider functions of the form $f(x) = \sum_{j=1}^{k} w_j b_j(x_j)$, where $b_1, \ldots, b_k$ are a fixed set of basis functions, and ....
....well known [1, 2] that this yields a linear program in the basis weights w. However, what had not been previously shown is that given a factored MDP, an equivalent linear program of feasible size could be formulated. (Footnote 1: It turns out that approximate value iteration is less effective because it takes more iterations to converge, and in fact can diverge in theory [6, 13].) Given the results of [6] outlined above this is now easy to do. First, one can show that the minimization objective can be encoded compactly: $\sum_{x} f(x) = \sum_{x} \sum_{j=1}^{k} w_j b_j(x_j) = \sum_{j=1}^{k} w_j y_j$, where $y_j = 2^{n - |x_j|}$ ....
B. Van Roy. Learning and value function approximation in complex decision processes. PhD thesis, MIT, EECS, 1998.
....the introduction, the key idea behind our approach is the restriction of our algorithms to the use of value functions in a limited class. This idea is best known under the name value function approximation, which is used frequently in the context of reinforcement learning [Tadepalli and Ok, 1996; Van Roy, 1998]. We use this idea in the context of maintaining full value functions and propagating them through the DP equation (1) [Gordon, 1995; Tsitsiklis and Van Roy, 1996]. However, unlike other methods, which deal with large state spaces by considering only a restricted set of representative states, ....
....a weight vector w = A 1 w 0 , we know that $\sum_{j=1}^{m} w_j h_j$ is the least d projection of V into V [Strang, 1980]. We define P to be the operator that projects into V. The contraction of the T operator in d distance is established in Nelson [1958] and is combined with projection in Van Roy [1998], yielding: Theorem 3.1: a) The operator T is a contraction in d distance with rate 1; hence, T has a unique fixed point V. b) Let T = P T. Then T is a contraction in d distance with rate ; hence, T has a unique fixed point V. Furthermore, V ....
B. Van Roy. Learning and Value Function Approximation in Complex Decision Problems. PhD thesis, Massachusetts Institute of Technology, 1998.
No context found.
B. Van Roy, "Learning and Value Function Approximation in Complex Decision Processes," Ph.D. dissertation, Massachusetts Inst. Technol., Cambridge, 1998.
....be uniformly bounded in the number of states and state variables in certain queueing problems. Our analysis also led to some guidelines in the choice of the so-called state relevance weights for the approximate LP. Alternatives to the approximate LP are temporal difference learning (TD) methods [2, 6, 7, 19, 21, 22, 23]. In such methods, one tries to find a fixed point for an approximate dynamic programming operator by simulating the system and learning from the observed costs and state transitions. Experimentation is necessary to determine when TD can offer better results than the approximate LP. However, it ....
Van Roy, B., Learning and Value Function Approximation in Complex Decision Processes, Ph.D. Thesis, Massachusetts Institute of Technology, May 1998.
No context found.
Benjamin Van Roy. Learning and Value Function Approximation in Complex Decision Processes. PhD thesis, Massachusetts Institute of Technology, 1998.
No context found.
B. Van Roy. Learning and value function approximation in complex decision problems. PhD thesis, MIT, 1998.
No context found.
B. Van Roy. Learning and value function approximation in complex decision problems. PhD thesis, Massachusetts Institute of Technology, 1998.
No context found.
B. Van Roy. Learning and value function approximation in complex decision problems. PhD thesis, MIT, 1998.
No context found.
B. Van Roy. Learning and value function approximation in complex decision problems. PhD thesis, Massachusetts Institute of Technology, 1998.
No context found.
B. Van Roy. Learning and value function approximation in complex decision problems. PhD thesis, MIT, 1998.
No context found.
B. Van Roy. Learning and value function approximation in complex decision problems. PhD thesis, MIT, 1998.
No context found.
Benjamin Van Roy, Learning and value function approximation in complex decision processes, Ph.D. thesis, MIT, Cambridge, MA, 1998.
No context found.
B. Van Roy. Learning and value function approximation in complex decision problems. PhD thesis, MIT, 1998.
No context found.
B. Van Roy. Learning and Value Function Approximation in Complex Decision Problems. PhD thesis, Massachusetts Institute of Technology, 1998.
No context found.
B. Van Roy, Learning and value function approximation in complex decision processes, Ph.D. Thesis, MIT (1998).
No context found.
Van Roy, B., Learning and Value Function Approximation in Complex Decision Processes, Ph.D. Thesis, Massachusetts Institute of Technology, May 1998.
No context found.
B. Van Roy, Learning and Value Function Approximation in Complex Decision Processes. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, 1998.