Results 1 - 5 of 5
BASIS CONSTRUCTION AND UTILIZATION FOR MARKOV DECISION PROCESSES USING GRAPHS, 2010
Abstract

Cited by 2 (0 self)
The ease or difficulty in solving a problem strongly depends on the way it is represented.
For example, consider the task of multiplying the numbers 12 and 24. Now imagine multiplying
XII and XXIV. Both tasks can be solved, but it is clearly more difficult to use
the Roman numeral representations of twelve and twenty-four. Humans excel at finding
appropriate representations for solving complex problems. This is not true for artificial
systems, which have largely relied on humans to provide appropriate representations. The
ability to autonomously construct useful representations and to efficiently exploit them is
an important challenge for artificial intelligence.
This dissertation builds on a recently introduced graph-based approach to learning representations
for sequential decision-making problems modeled as Markov decision processes
(MDPs). Representations, or basis functions, for MDPs are abstractions of the problem’s
state space and are used to approximate value functions, which quantify the expected
long-term utility obtained by following a policy. The graph-based approach generates basis
functions capturing the structure of the environment. Handling large environments requires
efficiently constructing and utilizing these functions. We address two issues with
this approach: (1) scaling basis construction and value function approximation to large
graphs/data sets, and (2) tailoring the approximation to a specific policy’s value function.
We introduce two algorithms for computing basis functions from large graphs. Both
algorithms work by decomposing the basis construction problem into smaller, more manageable
subproblems. One method determines the subproblems by enforcing block structure,
or groupings of states. The other method uses recursion to solve subproblems which
are then used for approximating the original problem. Both algorithms result in a set of basis
functions from which we employ basis selection algorithms. The selection algorithms
represent the value function with as few basis functions as possible, thereby reducing the
computational complexity of value function approximation and preventing overfitting.
The use of basis selection algorithms not only addresses the scaling problem but also
allows for tailoring the approximation to a specific policy. This results in a more accurate
representation than obtained when using the same subset of basis functions irrespective of
the policy being evaluated. To make effective use of the data, we develop a hybrid least-squares
algorithm for setting basis function coefficients. This algorithm is a parametric
combination of two common least-squares methods used for MDPs. We provide a geometric
and analytical interpretation of these methods and demonstrate the hybrid algorithm’s
ability to discover improved policies. We also show how the algorithm can include graph-based
regularization to help with sparse samples from stochastic environments.
This work investigates all aspects of linear value function approximation: constructing
a dictionary of basis functions, selecting a subset of basis functions from the dictionary,
and setting the coefficients on the selected basis functions. We empirically evaluate each
of these contributions in isolation and in one combined architecture.
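The three stages named above (constructing a dictionary, selecting a subset, setting coefficients) can be illustrated in miniature. The sketch below is an illustrative toy, not the dissertation's algorithms: it assumes a small chain MDP, uses eigenvectors of the combinatorial graph Laplacian as the basis dictionary (in the spirit of graph-based basis construction), keeps the k smoothest as the selected subset, and fits coefficients by ordinary least squares against the exact value function of a fixed policy.

```python
import numpy as np

# Toy pipeline: dictionary construction -> basis selection -> coefficient fitting.
# All sizes and the policy are illustrative assumptions.

n = 10                      # number of states in a chain graph
gamma = 0.9                 # discount factor

# Adjacency matrix of an undirected chain graph over the states.
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

# Dictionary: eigenvectors of the combinatorial graph Laplacian L = D - A.
D = np.diag(A.sum(axis=1))
L = D - A
_, eigvecs = np.linalg.eigh(L)   # eigenvectors sorted by increasing eigenvalue

# Selection: keep the k smoothest basis functions.
k = 4
Phi = eigvecs[:, :k]             # basis matrix (n states x k features)

# A simple policy: always move right; last state is absorbing; reward near goal.
P = np.zeros((n, n))
for i in range(n - 1):
    P[i, i + 1] = 1.0
P[n - 1, n - 1] = 1.0
R = np.zeros(n)
R[n - 2] = 1.0

# Exact value function for comparison: V = (I - gamma * P)^{-1} R.
V = np.linalg.solve(np.eye(n) - gamma * P, R)

# Coefficient setting: ordinary least squares against the exact values
# (in practice one would estimate from samples, e.g. with LSTD-style methods).
w, *_ = np.linalg.lstsq(Phi, V, rcond=None)
V_hat = Phi @ w
```

The least-squares fit can only shrink the residual relative to the zero-coefficient baseline, so the approximation error is bounded by the norm of the exact value function.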
An Algorithmic Survey of Parametric Value Function Approximation
Abstract

Cited by 2 (1 self)
Reinforcement learning is a machine learning answer to the optimal control problem. It consists of learning an optimal control policy through interactions with the system to be controlled, the quality of this policy being quantified by the so-called value function. A recurrent subtopic of reinforcement learning is computing an approximation of this value function when the system is too large for an exact representation. This survey reviews state-of-the-art methods for (parametric) value function approximation by grouping them into three main categories: bootstrapping, residual, and projected fixed-point approaches. Related algorithms are derived by considering one of the associated cost functions and a specific minimization method, generally a stochastic gradient descent or a recursive least-squares approach.
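The three categories can be summarized by the cost function each minimizes. A hedged sketch in standard notation (T is the Bellman operator, Π the projection onto the span of the features, V̂_θ the parametric approximation; the exact norms and weightings vary by algorithm):

```latex
% Bootstrapping: fit the current approximation to a bootstrapped target
J_{\text{boot}}(\theta) = \big\| \hat{V}_\theta - T \hat{V}_{\theta_{\text{old}}} \big\|^2
% Residual: minimize the Bellman residual directly
J_{\text{res}}(\theta) = \big\| T \hat{V}_\theta - \hat{V}_\theta \big\|^2
% Projected fixed point: minimize the distance to the projected Bellman image
J_{\text{proj}}(\theta) = \big\| \hat{V}_\theta - \Pi T \hat{V}_\theta \big\|^2
```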
Addendum
Abstract
In [1], we proposed two hybrid least-squares algorithms for approximate policy evaluation. These algorithms, which were referred to as H1 and H2, combine the optimization criteria of two least-squares algorithms: the Bellman residual method (which minimizes the Bellman residual) and the fixed point method (which minimizes the projection of the Bellman residual). Algorithm H1 was well motivated, but we proposed H2 in a more ad hoc manner. Here we show that H2 actually has a much more principled foundation. The derivation of H2 is inspired by Kolter and Ng’s recent paper [2].

Derivation of Hybrid Algorithm H2. We use the following terminology: P^π is a transition matrix encoding the effects of policy π, R^π is the reward function, γ ∈ [0, 1) is a discount factor, T^π(x) ≡ R^π + γP^π x is the Bellman operator, ρ is a distribution over states, Φ is a basis function matrix, and β ∈ [0, 1] is a parameter trading off between the Bellman residual method (β = 1) and the fixed point method (β = 0). An approximate value function V̂ = Φw linearly combines the basis functions in Φ with an adjustable vector of coefficients, w. For a complete description of the terminology, please see the full paper [1]. Following [2], we introduce the function f(w) = argmin_u …
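Using the terminology above, the two endpoint methods (and one way to interpolate between them) can be sketched as linear systems. The β-interpolation scheme below, which swaps the left factor of the normal equations from Φ (fixed point) toward Φ − γP^πΦ (Bellman residual), is an illustrative assumption for exposition, not necessarily the exact H1/H2 construction of the paper.

```python
import numpy as np

def solve_hybrid(Phi, P, R, gamma, rho, beta):
    """Solve the linear system A w = b whose left factor interpolates
    between the fixed point method (beta=0) and the Bellman residual
    method (beta=1). rho weights states via the diagonal matrix D_rho.
    NOTE: this interpolation is an illustrative assumption."""
    Drho = np.diag(rho)
    C = Phi - gamma * P @ Phi            # (Phi - gamma P^pi Phi)
    left = Phi - beta * gamma * P @ Phi  # Phi at beta=0, C at beta=1
    A = left.T @ Drho @ C
    b = left.T @ Drho @ R
    return np.linalg.solve(A, b)

# Tiny 3-state example policy: deterministic chain into an absorbing state.
gamma = 0.9
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
R = np.array([0.0, 0.0, 1.0])
rho = np.full(3, 1.0 / 3.0)              # uniform state distribution
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0]])             # two simple basis functions

w_fp = solve_hybrid(Phi, P, R, gamma, rho, beta=0.0)  # fixed point solution
w_br = solve_hybrid(Phi, P, R, gamma, rho, beta=1.0)  # Bellman residual solution
w_h = solve_hybrid(Phi, P, R, gamma, rho, beta=0.5)   # hybrid in between
```

At β = 0 the solution makes the Bellman residual ρ-orthogonal to the basis functions (the projected fixed point condition); at β = 1 it minimizes the ρ-weighted Bellman residual itself.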
Compressive Reinforcement Learning with Oblique Random Projections
, 2011
Abstract
Compressive sensing has been rapidly growing as a non-adaptive dimensionality reduction framework, wherein high-dimensional data is projected onto a randomly generated subspace. In this paper we explore a paradigm called compressive reinforcement learning, where approximately optimal policies are computed in a low-dimensional subspace generated from a high-dimensional feature space through random projections. We use the framework of oblique projections, which unifies two popular methods for approximately solving MDPs, the fixed point (FP) and Bellman residual (BR) methods, and derive error bounds on the quality of approximations obtained from combining random projections and oblique projections on a finite set of samples. We investigate the effectiveness of fixed point, Bellman residual, and hybrid least-squares methods in feature spaces generated by random projections. Finally, we present simulation results in various continuous MDPs, which show both gains in computation time and effectiveness in problems with large feature spaces and small sample sets.
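The core compressive step can be illustrated in a few lines: compress a high-dimensional feature matrix with a Gaussian random projection, then solve the fixed point (FP) least-squares system in the compressed space. All dimensions, the Gaussian projection, and the randomly generated MDP below are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 200, 10          # states, original features, compressed features
gamma = 0.95

Phi = rng.standard_normal((n, d))      # high-dimensional feature matrix
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)      # row-normalize into a transition matrix
R = rng.standard_normal(n)             # reward vector

# Gaussian random projection: compress d features down to k.
Proj = rng.standard_normal((d, k)) / np.sqrt(k)
Phi_c = Phi @ Proj                     # compressed basis (n x k)

# Fixed point least-squares solution in the compressed feature space.
A = Phi_c.T @ (Phi_c - gamma * P @ Phi_c)
b = Phi_c.T @ R
w = np.linalg.solve(A, b)
V_hat = Phi_c @ w                      # approximate value function
```

Only a k-dimensional linear system is solved rather than a d-dimensional one; the paper's error bounds quantify what is lost by working in the randomly projected subspace.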