Results

**1 - 4** of **4**

### Learning Task-Specific State Representations by Maximizing Slowness and Predictability

Abstract. The success of reinforcement learning in robotic tasks is highly dependent on the state representation: a mapping from high-dimensional sensory observations of the robot to states that can be used for reinforcement learning. Even though many methods have been proposed to learn state representations, it remains an important open problem. Identifying the characteristics that existing methods optimize to find good state representations, combining them, and adding new characteristics will lead to a more robust method for state representation learning. We define a new characteristic, predictability, and combine it with slowness. We implement these characteristics in a neural network and show that this approach can find good state representations from visual input in simulated robotic tasks.
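To make the two characteristics concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give their loss functions or network, so both definitions below are assumptions. Slowness is taken as the mean squared change between consecutive states, and predictability as the residual of a least-squares linear model that predicts the next state from the current state and action. The function names and the toy data are hypothetical.

```python
import numpy as np

def slowness_loss(states):
    """Mean squared change between consecutive states (smaller means slower)."""
    diffs = np.diff(states, axis=0)
    return float(np.mean(np.sum(diffs ** 2, axis=1)))

def predictability_loss(states, actions):
    """Residual of a linear least-squares model s[t+1] ~ (s[t], a[t], 1).

    Assumed definition for illustration only; smaller means more predictable.
    """
    n = len(states) - 1
    inputs = np.hstack([states[:-1], actions[:-1], np.ones((n, 1))])
    targets = states[1:]
    weights, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
    residual = targets - inputs @ weights
    return float(np.mean(np.sum(residual ** 2, axis=1)))

rng = np.random.default_rng(0)
T = 200
actions = rng.normal(size=(T, 1))

# Toy state that integrates small action steps: s[t+1] = s[t] + 0.1 * a[t].
smooth = np.concatenate([[0.0], np.cumsum(0.1 * actions[:-1, 0])])[:, None]
# Toy state made of unstructured noise, unrelated to the actions.
noisy = rng.normal(size=(T, 1))

for name, s in [("smooth", smooth), ("noisy", noisy)]:
    print(name, slowness_loss(s), predictability_loss(s, actions))
```

Under these assumed definitions, the integrated state scores lower on both losses than the noise state, which is the behaviour one would want from a representation that is both slow and predictable.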

### Explore to See, Learn to Perceive, Get the Actions for Free: SKILLABILITY

2014

How can a humanoid robot autonomously learn and refine multiple sensorimotor skills as a byproduct of curiosity-driven exploration, from its high-dimensional unprocessed visual input? We present SKILLABILITY, which makes this possible. It combines the recently introduced Curiosity Driven Modular Incremental Slow Feature Analysis (Curious Dr. MISFA) with the well-known options framework. Curious Dr. MISFA’s objective is to acquire abstractions as quickly as possible. These abstractions map high-dimensional pixel-level vision to a low-dimensional manifold. We find that each learnable abstraction augments the robot’s state space (a set of poses) with new information about the environment, for example, when the robot is grasping a cup. The abstraction is a function on an image, called a slow feature, which can effectively discretize a high-dimensional visual sequence. For example, it maps the sequence of the robot watching its arm as it moves around, grasping randomly, then grasping a cup, and moving around some more while holding the cup, into a step function having two outputs: when the cup is or is not currently grasped. The new state space includes this grasped/not grasped information. Each abstraction is coupled with an option. The reward function for the option’s policy (learned through Least Squares Policy Iteration) is high for transitions that produce a large change in the step-function-like slow features. This corresponds to finding bottleneck states, which are known to be good subgoals for hierarchical reinforcement learning; in the example, the subgoal corresponds to grasping the cup. The final skill includes both the learned policy and the learned abstraction. SKILLABILITY makes our iCub the first humanoid robot to learn complex skills, such as toppling or grasping an object, from raw high-dimensional video input, driven purely by its intrinsic motivations.
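The idea that a slow feature can turn a sensory sequence into a step function can be illustrated with plain batch linear slow feature analysis. This sketch is not Curious Dr. MISFA, which is incremental, modular, and curiosity-driven; it only shows the underlying SFA principle on hypothetical two-channel data in which a hidden "grasped" step signal is mixed with a fast oscillation. The mixing matrix and all names are assumptions.

```python
import numpy as np

def linear_sfa(x, n_features=1):
    """Batch linear SFA: whiten the input, then keep the directions
    whose temporal differences have the smallest variance."""
    x = x - x.mean(axis=0)
    # Whitening: rotate and rescale so the covariance becomes the identity.
    eigval, eigvec = np.linalg.eigh(np.cov(x, rowvar=False))
    z = x @ (eigvec / np.sqrt(eigval))
    # eigh returns eigenvalues in ascending order, so the first
    # eigenvectors of the difference covariance are the slowest directions.
    _, dvec = np.linalg.eigh(np.cov(np.diff(z, axis=0), rowvar=False))
    return z @ dvec[:, :n_features]

T = 600
t = np.arange(T)
step = (t >= 300).astype(float)   # hidden "cup grasped" indicator
fast = np.sin(0.9 * t)            # fast-varying distractor (e.g. arm motion)

# Observations are an invertible linear mix of the two hidden sources.
mixing = np.array([[1.0, 0.5],
                   [0.7, -1.0]])
x = np.stack([step, fast], axis=1) @ mixing.T

slow = linear_sfa(x, n_features=1)

# The sign of an SFA feature is arbitrary, so compare by absolute correlation.
corr = abs(np.corrcoef(slow[:, 0], step)[0, 1])
print("abs. correlation between slowest feature and step signal:", corr)
```

Thresholding the slowest feature at its midpoint would then give a binary grasped/not-grasped state variable of the kind the abstract describes.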

### Construction of Approximation Spaces for Reinforcement Learning

2013

Linear reinforcement learning (RL) algorithms like least-squares temporal difference learning (LSTD) require basis functions that span approximation spaces of potential value functions. This article investigates methods to construct these bases from samples. We hypothesize that an ideal approximation space should encode diffusion distances and that slow feature analysis (SFA) constructs such spaces. To validate our hypothesis, we provide theoretical statements about the LSTD value approximation error and induced metric of approximation spaces constructed by SFA and the state-of-the-art methods Krylov bases and proto-value functions (PVF). In particular, we prove that SFA minimizes the average (over all tasks in the same environment) bound on the above approximation error. Compared to other methods, SFA is very sensitive to sampling and can sometimes fail to encode the whole state space. We derive a novel importance sampling modification to compensate for this effect. Finally, the LSTD and least squares policy iteration (LSPI) performance of approximation spaces constructed by Krylov bases, PVF, SFA and PCA is compared in benchmark tasks and a visual robot navigation experiment (both in a realistic simulation and with a robot). The results support our hypothesis and suggest that (i) SFA provides subspace-invariant features for