21 citations found.
Benjamin Van Roy. Learning and Value Function Approximation in Complex Decision Processes. PhD thesis, Massachusetts Institute of Technology, 1998.

This paper is cited in the following contexts:
Direct value-approximation for factored MDPs - Schuurmans, Patrascu (2001)   (14 citations)  (Correct)

....value function rather than calculate it exactly. Numerous schemes have been investigated for approximating optimal value functions and policies in a compact representational framework, including: hierarchical decompositions [5], decision trees and diagrams [3, 12], generalized linear functions [1, 13, 4, 7, 8, 6], neural networks [2], and products of experts [11]. However, the simplest of these is generalized linear functions, which is the form we investigate below. In this case, we consider functions of the form $f(x) = \sum_{j=1}^{k} w_j b_j(x_j)$ where $b_1, \ldots, b_k$ are a fixed set of basis functions, and $x_j$ ....

....solutions to MDPs with up to n = 40 state variables (1 trillion states) to be generated in under 7.5 hours using approximate policy iteration [6]. It turns out that approximate value iteration is less effective because it takes more iterations to converge, and in fact can diverge in theory [6, 13]. Our main observation is that if one has to solve linear programs to conduct the approximate iterations anyway, then it might be much simpler and more efficient to approximate the linear programming approach directly. 4 Approximate linear programming Our first idea is simply to observe that a ....

B. Van Roy. Learning and value function approximation in complex decision processes. PhD thesis, MIT, EECS, 1998.


Algorithm-Directed Exploration for Model-Based.. - Guestrin, Patrascu.. (2002)   (1 citation)  (Correct)

.... reinforcement learning has consistently focused on developing algorithms that effectively manage the exploration-exploitation tradeoff [1, 11] and attempting to scale up to large real-world problems [8, 7]. Most research in these directions has focused on the direct value approximation approach [4, 5, 25] rather than the indirect model-based approach, because the model-based approach has not been amenable to scaling up to large problems [24]. However, two recent developments in model-based reinforcement learning and Markov decision process (MDP) planning have created new opportunities for ....

B. Van Roy. Learning and Value Function Approximation in Complex Decision Processes. PhD thesis, M.I.T, 1998.


Continuous State Space Q-Learning for Control of Nonlinear Systems - Hagen (2001)   (3 citations)  (Correct)

....examples are given where the use of RL with function approximators can fail. In [70] the same examples were used and the experimental setup was modified to make them work. Proofs of convergence for function approximation only exist for approximators linear in the weight when applied to MDPs [77][79]. For systems with a continuous state space there are no proofs when general function approximators are used. Also there are no general recipes to make the use of function approximators successful. It still might require .... Footnote 5: Note that it is also possible to have a critic with only state x as ....

B. Van Roy. Learning and Value Function Approximation in Complex Decision Processes. PhD thesis, Massachusetts Institute of Technology, 1998.


Direct value-approximation for factored MDPs - Schuurmans, Patrascu (2001)   (14 citations)  (Correct)

....well known [1, 2] that this yields a linear program in the basis weights w. However, what had not been previously shown is that given a factored MDP, an equivalent linear program of feasible size could be formulated. Given the results of [6] outlined above this is now easy to do. First, one can show that the minimization objective can be encoded compactly: $\sum_x f(x) = \sum_x \sum_{j=1}^{k} w_j b_j(x_j) = \sum_{j=1}^{k} w_j y_j$ where $y_j = 2^{n-|x_j|}$ .... Footnote 1: It turns out that approximate value iteration is less effective because it takes more iterations to converge, and in fact can diverge in theory [6, 13].

B. Van Roy. Learning and value function approximation in complex decision processes. PhD thesis, MIT, EECS, 1998.


Computing factored value functions for policies in.. - Daphne Koller Computer (1999)   (26 citations)  (Correct)

....the introduction, the key idea behind our approach is the restriction of our algorithms to the use of value functions in a limited class. This idea is best known under the name value function approximation, which is used frequently in the context of reinforcement learning [Tadepalli and Ok, 1996; Van Roy, 1998]. We use this idea in the context of maintaining full value functions and propagating them through the DP equation (1) [Gordon, 1995; Tsitsiklis and Van Roy, 1996]. However, unlike other methods, which deal with large state spaces by considering only a restricted set of representative states, ....

....a weight vector $w = A^{-1} w'$, we know that $\sum_{j=1}^{m} w_j h_j$ is the least d projection of V into V [Strang, 1980]. We define P to be the operator that projects into V . The contraction of the T operator in d distance is established in Nelson [1958] and is combined with projection in Van Roy [1998], yielding: Theorem 3.1: a) The operator T is a contraction in d distance with rate 1; hence, T has a unique fixed point V . b) Let T = P T . Then T is a contraction in d distance with rate ; hence, T has a unique fixed point V . Furthermore, V ....

B. Van Roy. Learning and Value Function Approximation in Complex Decision Problems. PhD thesis, Massachusetts Institute of Technology, 1998.


Regression Methods for Pricing Complex American-Style Options - John   Self-citation (Van roy)   (Correct)

No context found.

B. Van Roy, "Learning and Value Function Approximation in Complex Decision Processes," Ph.D. dissertation, Massachusetts Inst. Technol., Cambridge, 1998.


The Linear Programming Approach to Approximate Dynamic.. - de Farias, Van Roy (2001)   (18 citations)  Self-citation (Van roy)   (Correct)

....be uniformly bounded in the number of states and state variables in certain queueing problems. Our analysis also led to some guidelines in the choice of the so-called state relevance weights for the approximate LP. An alternative to the approximate LP are temporal difference learning (TD) methods [2, 6, 7, 19, 21, 22, 23]. In such methods, one tries to find a fixed point for an approximate dynamic programming operator by simulating the system and learning from the observed costs and state transitions. Experimentation is necessary to determine when TD can offer better results than the approximate LP. However, it ....

Van Roy, B., Learning and Value Function Approximation in Complex Decision Processes, Ph.D. Thesis, Massachusetts Institute of Technology, May 1998.


PEGASUS: A policy search method for large MDPs and POMDPs - Andrew Ng Uc (2000)   (35 citations)  (Correct)

No context found.

Benjamin Van Roy. Learning and Value Function Approximation in Complex Decision Processes. PhD thesis, Massachusetts Institute of Technology, 1998.


Appeared in the Twentieth Conference on Uncertainty in.. - Solving Factored Mdps (2004)   (Correct)

No context found.

B. Van Roy. Learning and value function approximation in complex decision problems. PhD thesis, MIT, 1998.


Linear Program Approximations for Factored Continuous-State .. - Hauskrecht, Kveton (2003)   (Correct)

No context found.

B. Van Roy. Learning and value function approximation in complex decision problems. PhD thesis, Massachusetts Institute of Technology, 1998.


Solving Factored MDPs with Continuous and Discrete Variables - Guestrin, Hauskrecht.. (2004)   (Correct)

No context found.

B. Van Roy. Learning and value function approximation in complex decision problems. PhD thesis, MIT, 1998.


Reinforcement Learning by Policy Search - Peshkin (2001)   (7 citations)  (Correct)

No context found.

Benjamin Van Roy, Learning and value function approximation in complex decision processes, Ph.D. thesis, MIT, Cambridge, MA, 1998.


Policy Iteration for Factored MDPs - Koller, Parr (2000)   (17 citations)  (Correct)

No context found.

B. Van Roy. Learning and Value Function Approximation in Complex Decision Problems. PhD thesis, Massachusetts Institute of Technology, 1998.


A New Complexity Result on Solving the Markov Decision Problem - Ye (2003)   (Correct)

No context found.

B. Van Roy, Learning and value function approximation in complex decision processes, Ph.D. Thesis, MIT (1998).


Implementing Heterogeneous Agents in Dynamic Environments.. - Jafar Habibi Mazda   (Correct)

No context found.

Van Roy, B., Learning and Value Function Approximation in Complex Decision Processes, Ph.D. Thesis, Massachusetts Institute of Technology, May 1998.


Title of the Book! - Name Of Author   (Correct)

No context found.

B. Van Roy, Learning and Value Function Approximation in Complex Decision Processes. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, 1998.

CiteSeer.IST - Copyright Penn State and NEC