Value function approximation in reinforcement learning using the Fourier basis, 2008
Cited by 45 (14 self)
We describe the Fourier Basis, a linear value function approximation scheme based on the Fourier Series. We empirically evaluate its properties, and demonstrate that it performs well compared to Radial Basis Functions and the Polynomial Basis, the two most popular fixed bases for linear value function approximation, and is competitive with learned Proto-Value Functions even though no extra experience or computation is required.
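As a rough illustration of what a fixed Fourier basis for linear value function approximation can look like, the sketch below uses the commonly stated form phi_c(x) = cos(pi * c . x), with integer coefficient vectors c in {0, ..., n}^d and the state x scaled to [0, 1]^d. The abstract does not give this formula, so the form, the function names, and the example order are assumptions made for illustration only.

```python
import itertools
import math


def fourier_coefficients(state_dim, order):
    """Enumerate all integer coefficient vectors c in {0, ..., order}^state_dim."""
    return list(itertools.product(range(order + 1), repeat=state_dim))


def fourier_features(state, coefficients):
    """Compute cos(pi * c . x) for each coefficient vector c.

    Assumes every component of `state` has already been scaled to [0, 1].
    """
    return [
        math.cos(math.pi * sum(c_k * x_k for c_k, x_k in zip(c, state)))
        for c in coefficients
    ]


def linear_value(weights, features):
    """Linear value estimate: the dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, features))


# Hypothetical example: a 2-dimensional state with an order-3 basis.
coeffs = fourier_coefficients(state_dim=2, order=3)
phi = fourier_features([0.25, 0.5], coeffs)
weights = [0.0] * len(coeffs)  # weights would be learned, e.g. by a TD method
value = linear_value(weights, phi)
```

Under these assumptions the number of features is (order + 1)^state_dim, so the basis grows quickly with the state dimension, and the weights are the only learned quantities.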
Robot Learning from Demonstration by Constructing Skill Trees
Cited by 34 (5 self)
We describe CST, an online algorithm for constructing skill trees from demonstration trajectories. CST segments a demonstration trajectory into a chain of component skills, where each skill has a goal and is assigned a suitable abstraction from an abstraction library. These properties permit skills to be improved efficiently using a policy learning algorithm. Chains from multiple demonstration trajectories are merged into a skill tree. We show that CST can be used to acquire skills from human demonstration in a dynamic continuous domain, and from both expert demonstration and learned control sequences on the uBot-5 mobile manipulator.
M.: Finite-sample analysis of Lasso-TD. In: Proceedings of the International Conference on Machine Learning, 2011
Cited by 10 (3 self)
In this paper, we analyze the performance of Lasso-TD, a modification of LSTD in which the projection operator is defined as a Lasso problem. We first show that Lasso-TD is guaranteed to have a unique fixed point and its algorithmic implementation coincides with the recently presented LARS-TD and LC-TD methods. We then derive two bounds on the prediction error of Lasso-TD in the Markov design setting, i.e., when the performance is evaluated on the same states used by the method. The first bound makes no assumption, but has a slow rate w.r.t. the number of samples. The second bound is under an assumption on the empirical Gram matrix, called the compatibility condition, but has an improved rate and directly relates the prediction error to the sparsity of the value function in the feature space at hand. For the full
Greedy algorithms for sparse reinforcement learning. In International Conference on Machine Learning, 2012
Cited by 9 (1 self)
Feature selection and regularization are becoming increasingly prominent tools in the efforts of the reinforcement learning (RL) community to expand the reach and applicability of RL. One approach to the problem of feature selection is to impose a sparsity-inducing form of regularization on the learning method. Recent work on L1 regularization has adapted techniques from the supervised learning literature for use with RL. Another approach that has received renewed attention in the supervised learning community is that of using a simple algorithm that greedily adds new features. Such algorithms have many of the good properties of the L1 regularization methods, while also being extremely efficient and, in some cases, allowing theoretical guarantees on recovery of the true form of a sparse target function from sampled data. This paper considers variants of orthogonal matching pursuit (OMP) applied to reinforcement learning. The resulting algorithms are analyzed and compared experimentally with existing L1-regularized approaches. We demonstrate that perhaps the most natural scenario in which one might hope to achieve sparse recovery fails; however, one variant, OMP-BRM, provides promising theoretical guarantees under certain assumptions on the feature dictionary. Another variant, OMP-TD, empirically outperforms prior methods both in approximation accuracy and efficiency on several benchmark problems.
Regularized Off-Policy TD-Learning, 2012
Cited by 8 (5 self)
We present a novel l1-regularized off-policy convergent TD-learning method (termed RO-TD), which is able to learn sparse representations of value functions with low computational complexity. The algorithmic framework underlying RO-TD integrates two key ideas: off-policy convergent gradient TD methods, such as TDC, and a convex-concave saddle-point formulation of nonsmooth convex optimization, which enables first-order solvers and feature selection using online convex regularization. A detailed theoretical and experimental analysis of RO-TD is presented. A variety of experiments are presented to illustrate the off-policy convergence, sparse feature selection capability and low computational cost of the RO-TD algorithm.
Bellman Error Based Feature Generation Using Random Projections
Cited by 5 (3 self)
The accuracy of parametrized policy evaluation depends on the quality of the features used for estimating the value function. Hence, feature generation/selection in reinforcement learning (RL) has received a lot of attention (Di Castro and Mannor, 2010). We focus on methods that aim to generate features in the direction of the Bellman error of the current value estimates (Bellman Error
R.: Regularized least squares temporal difference learning with nested ℓ2 and ℓ1 penalization. In: European Workshop on Reinforcement Learning (2011)
Cited by 4 (0 self)
The construction of a suitable set of features to approximate value functions is a central problem in reinforcement learning (RL). A popular approach to this problem is to use high-dimensional feature spaces together with least-squares temporal difference learning (LSTD). Although this combination allows for very accurate approximations, it often exhibits poor prediction performance because of overfitting when the number of samples is small compared to the number of features in the approximation space. In the linear regression setting, regularization is commonly used to overcome this problem. In this paper, we review some regularized approaches to policy evaluation and we introduce a novel scheme (L21) which uses ℓ2 regularization in the projection operator and an ℓ1 penalty in the fixed-point step. We show that such a formulation reduces to a standard Lasso problem. As a result, any off-the-shelf solver can be used to compute its solution and standardization techniques can be applied to the data. We report experimental results showing that L21 is effective in avoiding overfitting and that it compares favorably to existing ℓ1-regularized methods.
Value Function Approximation in Noisy Environments Using Locally Smoothed Regularized Approximate Linear Programs
Cited by 3 (1 self)
Recently, Petrik et al. demonstrated that L1-Regularized Approximate Linear Programming (RALP) could produce value functions and policies which compared favorably to established linear value function approximation techniques like LSPI. RALP’s success primarily stems from the ability to solve the feature selection and value function approximation steps simultaneously. RALP’s performance guarantees become looser if sampled next states are used. For very noisy domains, RALP requires an accurate model rather than samples, which can be unrealistic in some practical scenarios. In this paper, we demonstrate this weakness, and then introduce Locally Smoothed L1-Regularized Approximate Linear Programming (LS-RALP). We demonstrate that LS-RALP mitigates inaccuracies stemming from noise even without an accurate model. We show that, given some smoothness assumptions, as the number of samples increases, error from noise approaches zero, and provide experimental examples of LS-RALP’s success on common reinforcement learning benchmark problems.
An Algorithmic Survey of Parametric Value Function Approximation
Cited by 2 (1 self)
Reinforcement learning is a machine learning answer to the optimal control problem. It consists of learning an optimal control policy through interactions with the system to be controlled, the quality of this policy being quantified by the so-called value function. A recurrent subtopic of reinforcement learning is to compute an approximation of this value function when the system is too large for an exact representation. This survey reviews state-of-the-art methods for (parametric) value function approximation by grouping them into three main categories: bootstrapping, residual and projected fixed-point approaches. Related algorithms are derived by considering one of the associated cost functions and a specific minimization method, generally a stochastic gradient descent or a recursive least-squares approach.