Linear Complementarity for Regularized Policy Evaluation and Improvement
, 2010
Cited by 8 (0 self)
Recent work in reinforcement learning has emphasized the power of L1 regularization to perform feature selection and prevent overfitting. We propose formulating the L1 regularized linear fixed point problem as a linear complementarity problem (LCP). This formulation offers several advantages over the LARS-inspired formulation, LARS-TD. The LCP formulation allows the use of efficient off-the-shelf solvers, leads to a new uniqueness result, and can be initialized with starting points from similar problems (warm starts). We demonstrate that warm starts, as well as the efficiency of LCP solvers, can speed up policy iteration. Moreover, warm starts permit a form of modified policy iteration that can be used to approximate a “greedy” homotopy path, a generalization of the LARS-TD homotopy path that combines policy evaluation and optimization.
Representation Discovery in Sequential Decision Making
Cited by 1 (0 self)
Automatically constructing novel representations of tasks from analysis of state spaces is a longstanding fundamental challenge in AI. I review recent progress on this problem for sequential decision making tasks modeled as Markov decision processes. Specifically, I discuss three classes of representation discovery problems: finding functional, state, and temporal abstractions. I describe solution techniques varying along several dimensions: diagonalization or dilation methods using approximate or exact transition models; reward-specific vs. reward-invariant methods; global vs. local representation construction methods; multiscale vs. flat discovery methods; and finally, orthogonal vs. redundant representation discovery methods. I conclude by describing a number of open problems for future work.
BASIS CONSTRUCTION AND UTILIZATION FOR MARKOV DECISION PROCESSES USING GRAPHS
, 2010
Cited by 1 (0 self)
The ease or difficulty in solving a problem strongly depends on the way it is represented.
For example, consider the task of multiplying the numbers 12 and 24. Now imagine multiplying
XII and XXIV. Both tasks can be solved, but it is clearly more difficult to use
the Roman numeral representations of twelve and twenty-four. Humans excel at finding
appropriate representations for solving complex problems. This is not true for artificial
systems, which have largely relied on humans to provide appropriate representations. The
ability to autonomously construct useful representations and to efficiently exploit them is
an important challenge for artificial intelligence.
This dissertation builds on a recently introduced graph-based approach to learning representations
for sequential decision-making problems modeled as Markov decision processes
(MDPs). Representations, or basis functions, for MDPs are abstractions of the problem’s
state space and are used to approximate value functions, which quantify the expected
long-term utility obtained by following a policy. The graph-based approach generates basis
functions capturing the structure of the environment. Handling large environments requires
efficiently constructing and utilizing these functions. We address two issues with
this approach: (1) scaling basis construction and value function approximation to large
graphs/data sets, and (2) tailoring the approximation to a specific policy’s value function.
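
As a rough illustration of what a graph-based basis can look like, the sketch below builds a small chain graph over states and takes the smoothest eigenvectors of its graph Laplacian as basis functions. The abstract does not say which construction the dissertation uses; Laplacian eigenvectors are assumed here as one common choice, and the function name `chain_laplacian_basis`, the chain topology, and the sizes are all hypothetical.

```python
import numpy as np

def chain_laplacian_basis(n_states, k):
    """Return the k smallest Laplacian eigenvalues and eigenvectors of a chain graph.

    Assumption: basis functions are eigenvectors of the combinatorial
    graph Laplacian L = D - W. This is an illustrative choice only.
    """
    # Adjacency matrix of an undirected chain: state i is linked to state i + 1.
    W = np.zeros((n_states, n_states))
    idx = np.arange(n_states - 1)
    W[idx, idx + 1] = 1.0
    W[idx + 1, idx] = 1.0

    # Degree matrix and combinatorial Laplacian.
    D = np.diag(W.sum(axis=1))
    L = D - W

    # eigh handles symmetric matrices and returns eigenvalues in ascending order,
    # so the first k columns are the smoothest functions on the graph.
    eigvals, eigvecs = np.linalg.eigh(L)
    return eigvals[:k], eigvecs[:, :k]

# Hypothetical sizes: 20 states, 5 basis functions.
eigvals, basis = chain_laplacian_basis(n_states=20, k=5)
```

Each column of `basis` assigns one feature value per state, so a value function can be approximated as a linear combination of the columns. The dense eigendecomposition used here is exactly the step that becomes expensive on large graphs.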
We introduce two algorithms for computing basis functions from large graphs. Both
algorithms work by decomposing the basis construction problem into smaller, more manageable
subproblems. One method determines the subproblems by enforcing block structure,
or groupings of states. The other method uses recursion to solve subproblems which
are then used for approximating the original problem. Both algorithms result in a set of basis
functions from which we employ basis selection algorithms. The selection algorithms
represent the value function with as few basis functions as possible, thereby reducing the
computational complexity of value function approximation and preventing overfitting.
The use of basis selection algorithms not only addresses the scaling problem but also
allows for tailoring the approximation to a specific policy. This results in a more accurate
representation than obtained when using the same subset of basis functions irrespective of
the policy being evaluated. To make effective use of the data, we develop a hybrid least-squares
algorithm for setting basis function coefficients. This algorithm is a parametric
combination of two common least-squares methods used for MDPs. We provide a geometric
and analytical interpretation of these methods and demonstrate the hybrid algorithm’s
ability to discover improved policies. We also show how the algorithm can include graph-based
regularization to help with sparse samples from stochastic environments.
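
The idea of a parametric combination of two least-squares methods can be sketched as follows. The abstract does not name the two methods; this sketch assumes they are the fixed-point (LSTD-style) solution and Bellman residual minimization, blended by a hypothetical parameter `xi` in [0, 1]. The function name, the sample layout, and the three-state example are illustrative assumptions, not the dissertation's implementation.

```python
import numpy as np

def hybrid_least_squares(phi, phi_next, rewards, gamma, xi):
    """Solve for basis coefficients with an assumed hybrid least-squares rule.

    phi, phi_next: feature matrices for sampled states and their successors.
    xi = 0.0 gives the fixed-point (LSTD-style) solution;
    xi = 1.0 gives Bellman residual minimization (assumed endpoints).
    """
    residual = phi - gamma * phi_next      # Bellman residual features
    weights = phi - xi * gamma * phi_next  # interpolated left-hand factor
    A = weights.T @ residual
    b = weights.T @ rewards
    return np.linalg.solve(A, b)

# Hypothetical example: deterministic cycle 0 -> 1 -> 2 -> 0, tabular features,
# reward 1 on leaving state 2.
phi = np.eye(3)
phi_next = phi[[1, 2, 0]]  # row i holds the features of the successor of state i
rewards = np.array([0.0, 0.0, 1.0])
gamma = 0.9

w_fp = hybrid_least_squares(phi, phi_next, rewards, gamma, xi=0.0)
w_br = hybrid_least_squares(phi, phi_next, rewards, gamma, xi=1.0)
```

With tabular features and deterministic transitions both endpoints recover the exact value function, so `w_fp` and `w_br` coincide here. The two solutions generally differ once the basis cannot represent the value function exactly, which is where an intermediate `xi` becomes a meaningful choice.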
This work investigates all aspects of linear value function approximation: constructing
a dictionary of basis functions, selecting a subset of basis functions from the dictionary,
and setting the coefficients on the selected basis functions. We empirically evaluate each
of these contributions in isolation and in one combined architecture.

