22 citations found.
Geoffrey Gordon. Approximate Solutions to Markov Decision Processes. PhD thesis, Carnegie Mellon University, 1999.

This paper is cited in the following contexts:
Effective Reinforcement Learning for Mobile Robots - Smart, Kaelbling (2002)   (6 citations)  (Correct)

....shown that this will not work for general function approximators, even in seemingly benign situations [4]. In previous work [5, 6] we presented an algorithm, Hedger, that addresses the problems associated with value function approximation. The algorithm is based on the observation by Gordon [7] that a function approximator can safely be used to replace the tabular value function representation if it never extrapolates from its training data. He showed that locally weighted averaging (LWA) is such a function approximator. Hedger uses a more powerful function approximator, locally ....

Geoffrey J. Gordon, Approximate Solutions to Markov Decision Processes, Ph.D. thesis, School of Computer Science, Carnegie Mellon University, June 1999. Also available as technical report CMU-CS-99-143.


Kernel-Based Reinforcement Learning in Average-Cost Problems: .. - Ormoneit, Glynn (2000)   (2 citations)  (Correct)

....in a continuous space framework. Specifically, the recently advocated direct policy search or perturbation methods can by construction at most be optimal in a local sense [SMSM00, VRK00]. Relevant earlier work on local averaging in the context of reinforcement learning includes [Rus97] and [Gor99]. While these papers pursue related ideas, their approaches differ fundamentally from ours in the assumption that the transition probabilities of the MDP are known and can be used for learning. By contrast, kernel-based reinforcement learning only relies on sample trajectories of the MDP and it is ....

G. Gordon. Approximate Solutions to Markov Decision Processes. PhD thesis, Computer Science Department, Carnegie Mellon University, 1999.


Improving The Performance Of Q-Learning With Locally Weighted.. - Aljibury (2001)   (Correct)

....massage. 6.2 Future Directions The next step to be taken with this method is to apply the function approximation for control during learning. Though convergence has not yet been proven, the possibility exists that good results could be obtained experimentally on this class of problem. Gordon [40] has done some investigation of online fitted approximations, but his results, while encouraging, show that large classes of approximators, including LWR, are demonstratively divergent under some situations. This is somewhat discouraging, as LWR seems to be the most fitting type of approximator ....

G. J. Gordon, Approximate Solutions to Markov Decision Processes. Ph.D. Thesis, Carnegie Mellon University, 1999.


Relative Expected Instantaneous Loss Bounds - Forster, Warmuth   (8 citations)  (Correct)

....to prove relative expected instantaneous loss bounds. Our work builds on the recent successes in proving relative total loss bounds for on-line algorithms. These bounds hold for worst case sequences and they grow as O(ln T). See Foster [7], Vovk [22], Azoury and Warmuth [3], Forster [5], Gordon [9], Yamanishi [24, 25]. There are standard conversions of on-line algorithms to off-line algorithms (see Helmbold and Warmuth [13], Kivinen and Warmuth [14]). These conversions would produce complicated algorithms and their relative expected instantaneous loss bounds would have the form O( ln T ....

Gordon, G. J. (1999). Approximate Solutions to Markov Decision Processes. Ph. D. thesis, Department of Computer Science, Carnegie Mellon University, Pittsburgh. Technical report CMU-CS-99-143.


Off-Policy Temporal-Difference Learning with Function.. - Precup, Sutton, Dasgupta (2001)   (4 citations)  (Correct)

....independent feature vectors for which an exact solution exists, yet for which the approximate values found by Q-learning diverge to infinity. This problem prompted the development of residual gradient methods (Baird, 1995) which are stable but much slower than Q-learning, and fitted value iteration (Gordon, 1995, 1999), which is also stable but limited to restricted, weaker than linear function approximators. Of course, Q-learning has been used with linear function approximation since its invention (Watkins, 1989), often with good results, but the soundness of this approach is no longer an open question. There ....

Gordon, G. J. (1999). Approximate Solutions to Markov Decision Processes. Doctoral Thesis. Dept.


Variable Resolution Discretization in Optimal Control - Munos, Moore (2001)   (8 citations)  (Correct)

....of the influence is thus cheap: equivalent to computing the value function of a discounted Markov chain. Remark. As pointed out by Geoffrey Gordon, the influence is closely related to the dual variables (or shadow prices in economics) of the Linear Program equivalent to the Bellman equation (Gordon, 1999). This property has already been used in (Trick & Zin, 1993) to derive an efficient adaptive grid generation. Remark. A possible extension is to define the influence of a MDP as the infinitesimal change in the value function of a state resulting from an infinitesimal modification of the reward at ....

Gordon, G. J. (1999). Approximate solutions to Markov Decision Processes. Ph.D. thesis, CS department, Carnegie Mellon University, Pittsburgh, PA.


Kernel-Based Reinforcement Learning - Ormoneit, Sen (1999)   (7 citations)  (Correct)

....This bias is larger than in a comparable regression problem. We also provide an asymptotic formula for the bias increase which could help understand this issue in a more general framework. In the context of reinforcement learning, local averaging has been suggested in work by Rust [23] and Gordon [11], making the assumption that the transition probabilities of the MDP are known and can be used for learning. Our approach is fundamentally different in that kernel-based reinforcement learning only relies on the sample trajectories of the MDP. Therefore it is more widely applicable. Other related ....

G. Gordon. Approximate Solutions to Markov Decision Processes. PhD thesis, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, 1999.


Off-Policy Temporal-Difference Learning with Function.. - Precup, Sutton (2001)   (4 citations)  (Correct)

....feature vectors, for which an exact solution exists, yet for which the approximate values found by Q-learning diverge to infinity. This problem prompted the development of residual gradient methods (Baird, 1995) which are stable but much slower than Q-learning, and fitted value iteration (Gordon, 1995, 1999), which is also stable but limited to restricted, weaker than linear function approximators. Of course, Q-learning has been used with linear function approximation since its invention (Watkins, 1989), often with good results, but the soundness of this approach is no longer an open question. There ....

Gordon, G. J. (1999). Approximate Solutions to Markov Decision Processes. Doctoral Thesis. Dept.


The Linear Programming Approach to Approximate Dynamic.. - de Farias, Van Roy (2001)   (18 citations)  (Correct)

....in the number of basis functions employed, independently of the number of states or the number of state variables. It is also worth noting that although our bounds were developed under the assumption that all constraints are satisfied, this might not be necessary. Indeed, as pointed out in [11], when using the approximate LP method, one might actually benefit from allowing some of the constraints to be violated; in particular, note that if the constraints for a given state x are violated for a given vector J, that means (TJ)(x) < J(x), in which case J is possibly assigning a high ....

Gordon, G., Approximate Solutions to Markov Decision Processes, Ph.D. Thesis, Carnegie Mellon University, 1999.


Relative Expected Instantaneous Loss Bounds - Forster, Warmuth (2000)   (8 citations)  (Correct)

....(See, e.g., Anthony and Bartlett [1].) Instead we want to build on the recent successes in proving relative total loss bounds for on-line algorithms. These bounds hold for worst case sequences and they grow as O(ln T). See Foster [5], Vovk [14], Azoury and Warmuth [2], Forster [3], Gordon [6], Yamanishi [16, 17]. There are standard conversions of on-line algorithms to off-line algorithms (see Helmbold and Warmuth [9] and Kivinen and Warmuth [10]). These conversions would produce complicated algorithms and their relative expected instantaneous loss bounds would have the form O( ln ....

Gordon, G. J. (1999). Approximate Solutions to Markov Decision Processes. Ph. D. thesis, Department of Computer Science, Carnegie Mellon University, Pittsburgh. Technical report CMU-CS-99-143.


Relative Loss Bounds for On-line Density Estimation with the.. - Azoury, Warmuth (2000)   (23 citations)  (Correct)

....An important insight we gained from this research is that O(log T) relative loss bounds seem to require the use of variable learning rates. In this paper, the learning rate applied in trial t is O(1/t). The use of O(1/t) learning rates for the exponential family was also suggested by Gordon [Gor99] as a possible strategy for leading to better bounds. However, no specific examples were worked out. In the case of linear regression, the O(1/t) learning rates become inverses of the covariance matrix of the past examples. General frameworks of on-line learning algorithms were developed in ....

....for leading to better bounds. However, no specific examples were worked out. In the case of linear regression, the O(1/t) learning rates become inverses of the covariance matrix of the past examples. General frameworks of on-line learning algorithms were developed in [GLS97, KW97, KW98, Gor99]. We follow the philosophy of Kivinen and Warmuth [KW97] of starting with a divergence function. From the divergence function we derive the on-line update and then use the same divergence as a potential in the amortized analysis. A similar method was developed in [GLS97] for the case when the ....

[Article contains additional citation context not shown here]

Geoffrey J. Gordon. Approximate solutions to Markov decision processes. Ph. D. thesis, Department of Computer Science, Carnegie Mellon University, Pittsburgh. Technical report CMU-CS-99-143, June 1999.


The Minimax Strategy for Gaussian Density Estimation - Takimoto, Warmuth (2000)   (Correct)

....with $\theta_t = \frac{a x_0 + \sum_{q=1}^{t-1} x_q}{a + t - 1}$. Here $a \ge 0$ is the multiplicity of the initial instance. The initial instance is chosen to be zero for Gaussian density estimation. This prediction algorithm is the forward algorithm of [1]. The same algorithm was investigated in parallel work by Gordon [4]. The forward algorithm was inspired by a similar related algorithm of Vovk for linear regression [11]. We show that the regret of the forward algorithm is larger than $\frac{1}{2} X^2 (\ln T - O(1))$ regardless of the choice of a. This holds even if the constant a is allowed to depend on the horizon T. ....

....the forward algorithm. By Lemma 3 the optimal shrinkage factor $c_t$ is roughly $1/(t + \ln T - \ln t)$. A good approximation to $c_t$ might be to use shrinkage factors of the form $1/(t - 1 + a)$ for some universal constant $a \ge 0$. This learner is called the forward algorithm. The constant a parameterizes a prior [1, 4]. In particular, Azoury and Warmuth [1] showed that the forward algorithm with a = 1 has the worst case regret of $\frac{1}{2} X^2 (1 + \ln T)$. More precisely the forward algorithm is the Bayes optimal algorithm that minimizes the expected regret under the following probabilistic setup: The adversary ....

G. J. Gordon. Approximate solutions to Markov decision processes. Ph. D. thesis, Department of Computer Science, Carnegie Mellon University, Pittsburgh. Technical report CMU-CS-99-143, June 1999.


Influence and Variance of a Markov Chain: Application to.. - Munos, Moore (1999)   (1 citation)  (Correct)

.... convergence of the iterated $I_n(x_i \mid x)$ to the unique solution (the fixed point) $I(x_i \mid x)$ of (4). Remark 1. As pointed out by Geoffrey Gordon, the influence is closely related to the dual variables (or shadow prices in economics) of the Linear Program equivalent to the Bellman equation (see [2]). This property has already been used in [11] to derive an efficient adaptive grid generation. Remark 2. A possible extension is to define the influence of a MDP as the infinitesimal change in the value function of a state resulting from an infinitesimal modification of the reward at another ....

Geoffrey J. Gordon. Approximate solutions to Markov Decision Processes. PhD thesis, Carnegie Mellon University, 1999.


Generalized Linear Models - Gordon   Self-citation (Geoffrey Gordon)   (Correct)

....a matching pair of link and loss functions. The loss function which corresponds to F is $D_F(z \mid y) = \sum_i [F(z_i) - y_i z_i + F^*(y_i)]$ (2), where $F^*(y)$ is defined so that $\min_z D_F(z \mid y) = 0$. $F^*$ is the convex dual of F [8] and $D_F$ is the generalized Bregman divergence from z to y [9]. Expression (2) is nonnegative, and it is globally convex in all of the $z_i$'s (and therefore also in the parameters, since each $z_i$ is a linear function of them). If we write f for the gradient of F, the derivative of (2) with respect to $z_i$ is $f(z_i) - y_i$. So, (2) will be zero if and only if $y_i = f(z_i)$ ....

....so that UV reconstructs X with the smallest possible sum of squared errors. 5 Algorithms for fitting (GL) We could solve equations (4-5) with any of several different algorithms. For example, we could use gradient descent on either U, V or A, B. Or, we could use the generalized gradient descent [9] update rule (with learning rate ) A (X f(UV ) V B U (X f(UV ) The advantage of these algorithms is that they are simple to implement and don't require additional assumptions on F, G, and H. They can even work when F, G, and H are nondifferentiable by using subgradients. In this ....

Geoffrey J. Gordon. Approximate Solutions to Markov Decision Processes. PhD thesis, Carnegie Mellon University, 1999.


Regret Bounds for Prediction Problems - Gordon (1999)   (2 citations)  Self-citation (Gordon)   (Correct)

....descent algorithms, and Sections 8 and 9 cover generalized linear regression algorithms including linear regression and exponentiated gradient. Another interesting problem is inference of the natural parameter in an exponential family. We will not have space to cover this problem here, but see [Gor99] for more detail. 4 Weighted Majority One of the simplest MAP algorithms is Weighted Majority, described in [LW92]. For a more detailed analysis of WM in our framework, see [Gor99]; here we will only give a brief review. The legal predictions for WM are the vectors in the unit simplex P = ....

....of the natural parameter in an exponential family. We will not have space to cover this problem here, but see [Gor99] for more detail. 4 Weighted Majority One of the simplest MAP algorithms is Weighted Majority, described in [LW92]. For a more detailed analysis of WM in our framework, see [Gor99]; here we will only give a brief review. The legal predictions for WM are the vectors in the unit simplex $P = \{w \mid w \ge 0, \sum_i w_i = 1\}$. The ith component of $w_t$ is interpreted as the weight given to the ith expert on step t. The loss functions for $t \ge 1$ are $l_t(w) = x_t \cdot w$, where $x_{t,i}$ is the ....

Geoffrey J. Gordon. Approximate Solutions to Markov Decision Processes. PhD thesis, Carnegie Mellon University, 1999.


Planning In Hybrid Structured Stochastic Domains - Comenius University   (Correct)

No context found.

Geoffrey Gordon. Approximate Solutions to Markov Decision Processes. PhD thesis, Carnegie Mellon University, 1999.


Heuristic Refinements of Approximate Linear Programming for .. - Kveton, Hauskrecht (2004)   (Correct)

No context found.

Gordon, G. 1999. Approximate Solutions to Markov Decision Processes. Ph.D. Dissertation, Carnegie Mellon University.


A Convergent Form of Approximate Policy Iteration - Theodore Perkins   (Correct)

No context found.

G. Gordon. Approximate Solutions to Markov Decision Processes. PhD thesis, Carnegie Mellon University, 1999.


Explicit Manifold Representations for Value-Function Approximation In Reinforcement   (Correct)

No context found.

Geoffrey J. Gordon. Approximate Solutions to Markov Decision Processes. PhD thesis, School of Computer Science, Carnegie Mellon University, June 1999. Also available as technical report CMU-CS-99-143.

