Results 1–10 of 27
Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes
Journal of Machine Learning Research, 2006
Abstract

Cited by 92 (11 self)
This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework described in this paper include: (i) a general scheme for constructing representations or basis functions by diagonalizing symmetric diffusion operators; (ii) a specific instantiation of this approach where global basis functions called proto-value functions (PVFs) are formed using the eigenvectors of the graph Laplacian on an undirected graph formed from state transitions induced by the MDP; (iii) a three-phased procedure called representation policy iteration (RPI), comprising a sample collection phase, a representation learning phase that constructs basis functions from samples, and a final parameter estimation phase that determines an (approximately) optimal policy within the (linear) subspace spanned by the (current) basis functions; (iv) a specific instantiation of the RPI framework using least-squares policy iteration (LSPI) as the parameter estimation method; (v) several strategies for scaling the proposed approach to large discrete and continuous state spaces, including the Nyström extension for out-of-sample interpolation of eigenfunctions, and the use of Kronecker sum factorization to construct compact eigenfunctions in product spaces such as factored MDPs; and (vi) a series of illustrative discrete and continuous control tasks, which both illustrate the concepts and provide a benchmark for evaluating the proposed approach. Many challenges remain to be addressed in scaling the proposed framework to large MDPs, and several elaborations of the proposed framework are briefly summarized at the end.
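The core construction in component (ii) can be sketched in a few lines: build an undirected state graph, form its Laplacian, and take the smoothest eigenvectors as basis functions. A minimal sketch on a hypothetical 10-state chain MDP (the graph and the choice of k are illustrative assumptions, not the paper's experiments):

```python
import numpy as np

# Minimal proto-value function sketch on a 10-state chain MDP,
# assuming an undirected state graph with adjacency matrix W.
n = 10
W = np.zeros((n, n))
for i in range(n - 1):          # chain: state i <-> state i+1
    W[i, i + 1] = W[i + 1, i] = 1.0

D = np.diag(W.sum(axis=1))      # degree matrix
L = D - W                       # combinatorial graph Laplacian

# Eigenvectors ordered by eigenvalue give the smoothest functions
# first; the leading k form the PVF basis.
eigvals, eigvecs = np.linalg.eigh(L)
k = 4
pvf_basis = eigvecs[:, :k]      # n x k matrix of basis functions

# A value function is then approximated as V ~ pvf_basis @ w, with
# w fit in the parameter estimation phase (e.g. by LSPI).
print(pvf_basis.shape)          # (10, 4)
```

The smallest eigenvalue is zero and its eigenvector is constant, so the first basis function is flat; subsequent ones oscillate at increasing frequency over the state graph.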
Value function approximation with diffusion wavelets and Laplacian eigenfunctions
In Advances in Neural Information Processing Systems 18, 2006
Abstract

Cited by 39 (14 self)
We investigate the problem of automatically constructing efficient representations or basis functions for approximating value functions based on analyzing the structure and topology of the state space. In particular, two novel approaches to value function approximation are explored based on automatically constructing basis functions on state spaces that can be represented as graphs or manifolds: one approach uses the eigenfunctions of the Laplacian, in effect performing a global Fourier analysis on the graph; the second approach is based on diffusion wavelets, which generalize classical wavelets to graphs using multiscale dilations induced by powers of a diffusion operator or random walk on the graph. Together, these approaches form the foundation of a new generation of methods for solving large Markov decision processes, in which the underlying representation and policies are simultaneously learned.
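The "multiscale dilations induced by powers of a diffusion operator" can be illustrated directly. A minimal sketch on a hypothetical chain graph, showing only the dyadic powers (the full diffusion-wavelet construction also orthogonalizes and compresses the range of each power, which is omitted here):

```python
import numpy as np

# Sketch of the multiscale ingredient behind diffusion wavelets:
# dyadic powers of a random-walk operator P act at coarser and
# coarser scales on the graph.
n = 8
W = np.zeros((n, n))
for i in range(n - 1):                 # chain graph: i <-> i+1
    W[i, i + 1] = W[i + 1, i] = 1.0

P = W / W.sum(axis=1, keepdims=True)   # random-walk operator D^-1 W

scales = [P]
for _ in range(3):                     # P, P^2, P^4, P^8
    scales.append(scales[-1] @ scales[-1])

# A point mass spreads over a wider support at each scale.
delta = np.zeros(n)
delta[0] = 1.0
supports = [int((delta @ T > 1e-9).sum()) for T in scales]
print(supports)
```

Each squaring doubles the diffusion time, so the operator at scale j captures behavior over 2^j random-walk steps, which is what lets the decomposition represent medium- and long-range structure compactly.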
Fast direct policy evaluation using multiscale analysis of Markov diffusion processes
In Proceedings of the 23rd International Conference on Machine Learning, 601–608, 2005
Abstract

Cited by 24 (15 self)
Policy evaluation is a critical step in the approximate solution of large Markov decision processes (MDPs), typically requiring O(|S|^3) time to directly solve the Bellman system of |S| linear equations (where |S| is the state space size in the discrete case, and the sample size in the continuous case). In this paper we apply a recently introduced multiscale framework for analysis on graphs to design a faster algorithm for policy evaluation. For a fixed policy π, this framework efficiently constructs a multiscale decomposition of the random walk P^π associated with the policy π. This enables efficiently computing medium- and long-term state distributions, approximation of value functions, and the direct computation of the potential operator (I − γP^π)^−1 needed to solve Bellman's equation. We show that even a preliminary non-optimized version of the solver competes with highly optimized iterative techniques, requiring in many cases a complexity of O(|S|).
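For concreteness, the quantity the solver computes is the solution of the Bellman linear system V = r + γP^π V. A dense sketch on a small random MDP (the transition matrix and rewards are arbitrary illustrative data); this is exactly the O(|S|^3) direct solve the paper's multiscale method accelerates:

```python
import numpy as np

# Direct policy evaluation: V = (I - gamma * P_pi)^-1 r, i.e. the
# potential operator applied to the reward vector. A dense solve
# is O(|S|^3); the multiscale decomposition targets O(|S|).
n, gamma = 6, 0.9
rng = np.random.default_rng(0)

P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)   # row-stochastic transitions under pi
r = rng.random(n)                   # per-state reward under pi

V = np.linalg.solve(np.eye(n) - gamma * P, r)   # Bellman system

# Check the fixed point: V = r + gamma * P V
assert np.allclose(V, r + gamma * P @ V)
```

Iterative methods approximate this fixed point by repeated application of the Bellman operator; the multiscale approach instead builds a compressed representation of (I − γP^π)^−1 itself.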
Samuel meets Amarel: Automating Value Function Approximation using Global State Space Analysis
, 2005
Abstract

Cited by 20 (1 self)
Most work on value function approximation adheres to Samuel’s original design: agents learn a task-specific value function using parameter estimation, where the approximation architecture (e.g., polynomials) is specified by a human designer. This paper proposes a novel framework generalizing Samuel’s paradigm using a coordinate-free approach to value function approximation. Agents learn both representations and value functions by constructing geometrically customized task-independent basis functions that form an orthonormal set for the Hilbert space of smooth functions on the underlying state space manifold. The approach rests on a technical result showing that the space of smooth functions on a (compact) Riemannian manifold has a discrete spectrum associated with the Laplace-Beltrami operator. In the discrete setting, spectral analysis of the graph Laplacian yields a set of geometrically customized basis functions for approximating and decomposing value functions. The proposed framework generalizes Samuel’s value function approximation paradigm by combining it with a formalization of Saul Amarel’s paradigm of representation learning through global state space analysis.
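The discrete-spectrum result has a concrete finite analogue that is easy to verify: on a cycle graph (the discrete circle), the graph Laplacian's eigenfunctions are exactly the discrete Fourier modes, mirroring the Laplace-Beltrami spectrum on S^1. A small numerical check:

```python
import numpy as np

# On the cycle graph C_n, the combinatorial Laplacian is circulant,
# so its eigenvalues are 2 - 2*cos(2*pi*k/n) and its eigenvectors
# are discrete Fourier modes -- the graph analogue of the discrete
# Laplace-Beltrami spectrum on the circle.
n = 16
W = np.zeros((n, n))
for i in range(n):
    W[i, (i + 1) % n] = W[i, (i - 1) % n] = 1.0
L = np.diag(W.sum(axis=1)) - W

eigvals = np.sort(np.linalg.eigvalsh(L))
expected = np.sort([2 - 2 * np.cos(2 * np.pi * k / n) for k in range(n)])
print(np.allclose(eigvals, expected))   # True
```

This is the sense in which the graph Laplacian's eigenvectors give a "geometrically customized" orthonormal basis: on a symmetric space they recover classical Fourier analysis, and on an irregular state graph they adapt to its actual geometry.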
Constructing basis functions from directed graphs for value function approximation
In Proceedings of the International Conference on Machine Learning (ICML), 2007
Abstract

Cited by 18 (7 self)
Basis functions derived from an undirected graph connecting nearby samples from a Markov decision process (MDP) have proven useful for approximating value functions. The success of this technique is attributed to the smoothness of the basis functions with respect to the state space geometry. This paper explores the properties of bases created from directed graphs, which are a more natural fit for expressing state connectivity. Digraphs capture the effect of non-reversible MDPs, whose value functions may not be smooth across adjacent states. We provide an analysis using the Dirichlet sum of the directed graph Laplacian to show how the smoothness of the basis functions is affected by the graph’s invariant distribution. Experiments in discrete and continuous MDPs with non-reversible actions demonstrate a significant improvement in the policies learned using directed graph bases.
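A directed graph Laplacian of the kind this line of work builds on (Chung's construction) symmetrizes the random walk with its invariant distribution before taking eigenvectors. A hedged sketch on an arbitrary small digraph; the random transition matrix is illustrative, not from the paper:

```python
import numpy as np

# Sketch of Chung's directed graph Laplacian: symmetrize the random
# walk P using its invariant distribution phi, then take eigenvectors
# of the resulting symmetric operator as basis functions.
n = 5
rng = np.random.default_rng(1)
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)        # random walk on the digraph

# Invariant distribution: left eigenvector of P with eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
phi = np.real(evecs[:, np.argmax(np.real(evals))])
phi = phi / phi.sum()                    # normalize to a distribution

Phi_h = np.diag(np.sqrt(phi))
Phi_hinv = np.diag(1.0 / np.sqrt(phi))
L = np.eye(n) - 0.5 * (Phi_h @ P @ Phi_hinv + Phi_hinv @ P.T @ Phi_h)

# L is symmetric, so its eigenvectors form a real orthonormal basis
# even though the underlying walk is non-reversible.
assert np.allclose(L, L.T)
```

The dependence on phi is exactly why the abstract's Dirichlet-sum analysis ties basis smoothness to the graph's invariant distribution: states that the walk visits rarely are down-weighted in the symmetrization.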
LSTD with random projections
In Advances in Neural Information Processing Systems, 2010
Abstract

Cited by 17 (5 self)
We consider the problem of reinforcement learning in high-dimensional spaces when the number of features exceeds the number of samples. In particular, we study the least-squares temporal difference (LSTD) learning algorithm when a space of low dimension is generated with a random projection from a high-dimensional space. We provide a thorough theoretical analysis of LSTD with random projections and derive performance bounds for the resulting algorithm. We also show how the error of LSTD with random projections is propagated through the iterations of a policy iteration algorithm and provide a performance bound for the resulting least-squares policy iteration (LSPI) algorithm.
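The mechanics are straightforward to sketch: project the high-dimensional features through a random Gaussian matrix, then run standard LSTD(0) in the low-dimensional space. The dimensions, features, and rewards below are arbitrary illustrative data, not the paper's setting:

```python
import numpy as np

# LSTD(0) after a random projection: features Phi have d dimensions
# with d larger than the number of transitions n, so LSTD is run on
# randomly projected k-dimensional features instead.
rng = np.random.default_rng(2)
n, d, k, gamma = 20, 100, 5, 0.95

Phi = rng.random((n + 1, d))       # features at states s_0 .. s_n
rewards = rng.random(n)            # rewards along the trajectory

A = rng.normal(0, 1.0 / np.sqrt(k), size=(d, k))  # random projection
Psi = Phi @ A                      # projected features, (n+1) x k

# Standard LSTD(0) on the projected features.
X = Psi[:-1]                       # psi(s_t)
Xn = Psi[1:]                       # psi(s_{t+1})
A_lstd = X.T @ (X - gamma * Xn)
b_lstd = X.T @ rewards
w = np.linalg.solve(A_lstd, b_lstd)

values = X @ w                     # value estimates at visited states
print(values.shape)                # (20,)
```

With d > n the unprojected LSTD matrix is rank-deficient; the projection makes the k-by-k system well-posed at the cost of a controlled approximation error, which is what the paper's bounds quantify.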
Hybrid least-squares algorithms for approximate policy evaluation
 Machine Learning
Basis Expansion in Natural Actor Critic Methods
Abstract

Cited by 4 (2 self)
In reinforcement learning, the aim of the agent is to find a policy that maximizes its expected return. Policy gradient methods try to accomplish this goal by directly approximating the policy using a parametric function approximator; the expected return of the current policy is estimated and its parameters are updated by steepest ascent in the direction of the gradient of the expected return with respect to the policy parameters. In general, the policy is defined in terms of a set of basis functions that capture important features of the problem. Since the quality of the resulting policies directly depends on the set of basis functions, and defining them becomes harder as the complexity of the problem increases, it is important to be able to find them automatically. In this paper, we propose a new approach which uses the cascade-correlation learning architecture for automatically constructing a set of basis functions within the context of Natural Actor-Critic (NAC) algorithms. Such basis functions allow more complex policies to be represented, and consequently improve the performance of the resulting policies. We also demonstrate the effectiveness of the method empirically.
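The cascade-correlation idea can be sketched outside the full NAC setting: repeatedly fit a linear model on the current basis, then add a new nonlinear unit chosen to correlate with the remaining residual. Everything below is a simplified illustration (plain least-squares regression instead of a critic, random candidate search instead of gradient-trained candidates):

```python
import numpy as np

# Simplified cascade-correlation basis expansion: grow the basis by
# adding, at each step, a sigmoid unit that correlates well with the
# residual of the current least-squares fit.
rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(200, 2))          # raw state features
y = np.sin(3 * X[:, 0]) * X[:, 1]              # target to approximate

basis = [np.ones(len(X))]                      # start with a bias term
for _ in range(10):
    B = np.stack(basis, axis=1)
    w, *_ = np.linalg.lstsq(B, y, rcond=None)
    resid = y - B @ w
    # Crude candidate search: random tanh units, keep the one most
    # correlated (in absolute value) with the residual.
    cands = [np.tanh(X @ rng.normal(size=2) + rng.normal())
             for _ in range(50)]
    best = max(cands, key=lambda h: abs(np.corrcoef(h, resid)[0, 1]))
    basis.append(best)

B = np.stack(basis, axis=1)
w, *_ = np.linalg.lstsq(B, y, rcond=None)
final_err = np.mean((y - B @ w) ** 2)
print(final_err)
```

In the paper's setting the grown basis parameterizes the critic/policy rather than a regression target, but the expansion loop has the same shape: fit, measure residual, recruit a unit, repeat.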
Incremental basis function expansion in reinforcement learning using cascade-correlation networks
In Proc. ECAI workshop, 2008