Results 1-6 of 6
Sample Efficient Reinforcement Learning with Gaussian Processes
Abstract

Cited by 4 (3 self)
This paper derives sample complexity results for using Gaussian Processes (GPs) in both model-based and model-free reinforcement learning (RL). We show that GPs are KWIK learnable, proving for the first time that a model-based RL approach using GPs, GP-Rmax, is sample efficient (PAC-MDP). However, we then show that previous approaches to model-free RL using GPs take an exponential number of steps to find an optimal policy, and are therefore not sample efficient. The third and main contribution is the introduction of a model-free RL algorithm using GPs, DGPQ, which is sample efficient and, in contrast to model-based algorithms, capable of acting in real time, as demonstrated on a five-dimensional aircraft simulator.
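The KWIK ("knows what it knows") property hinges on the GP reporting when its own prediction is untrustworthy: it answers only when the posterior variance is below a tolerance, and otherwise says "I don't know". A minimal sketch of that idea with a toy pure-Python 1-D GP and an RBF kernel (the length-scale, noise level, training points, and `var_tol` are illustrative choices, not values from the paper):

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def rbf(a, b, ell=0.5):
    """Squared-exponential kernel on scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * ell ** 2))

def gp_predict(X, y, x, noise=0.01):
    """Posterior mean and variance of a 1-D GP at query point x."""
    K = [[rbf(xi, xj) + (noise if i == j else 0.0)
          for j, xj in enumerate(X)] for i, xi in enumerate(X)]
    k_star = [rbf(xi, x) for xi in X]
    alpha = solve(K, y)      # (K + noise*I)^{-1} y
    v = solve(K, k_star)     # (K + noise*I)^{-1} k_*
    mean = sum(ks * a for ks, a in zip(k_star, alpha))
    var = rbf(x, x) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, var

def kwik_predict(X, y, x, var_tol=0.05):
    """KWIK-style prediction: answer only when confident, else None."""
    mean, var = gp_predict(X, y, x)
    return mean if var <= var_tol else None

X = [0.0, 0.2, 0.4]
y = [math.sin(v) for v in X]
near = kwik_predict(X, y, 0.1)   # low posterior variance: returns a value
far = kwik_predict(X, y, 3.0)    # variance reverts to the prior: returns None
```

Near the training data the posterior variance is small and a prediction is returned; far away the variance climbs back toward the prior and the learner declines to answer, which is exactly the self-awareness the KWIK analysis counts on.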
Computationally efficient Gaussian process changepoint detection and regression
, 2014
Abstract

Cited by 2 (0 self)
Most existing GP regression algorithms assume a single generative model, leading to poor performance when data are non-stationary, i.e. generated from multiple switching processes. Existing methods for GP regression over non-stationary data include clustering and changepoint detection algorithms. However, these methods require significant computation, do not come with provable guarantees on correctness and speed, and most algorithms only work in batch settings. This thesis presents an efficient online GP framework, GP-NBC, that leverages the generalized likelihood ratio test to detect changepoints and learn multiple Gaussian Process models from streaming data. Furthermore, GP-NBC can quickly recognize and reuse previously seen models. The algorithm is shown to be theoretically sample efficient in terms of limiting mistaken predictions. Our empirical results on two real-world datasets and one synthetic dataset show GP-NBC outperforms state-of-the-art methods for non-stationary regression in terms of regression error and computational efficiency. The second part of the thesis introduces a Reinforcement Learning (RL) algorithm, UCRL-GP-CPD, for multi-task Reinforcement Learning when the reward function is non …
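The changepoint test in this line of work compares how well the current model explains recent data against an alternative model refit to a sliding window, and declares a changepoint when the likelihood ratio is large. A stripped-down caricature of that generalized likelihood ratio test, with scalar Gaussian models standing in for full GPs (the window size, threshold, and synthetic stream are illustrative, not the thesis's setup):

```python
import math
import random

def gauss_loglik(y, mu, var):
    """Log density of y under a Gaussian with mean mu and variance var."""
    return -0.5 * math.log(2 * math.pi * var) - (y - mu) ** 2 / (2 * var)

def glr_changepoint(stream, mu0, var0, window=20, threshold=2.0):
    """Return the index where a sliding-window GLR statistic first exceeds
    the threshold, or None. The statistic is the per-point log-likelihood
    gain of a model refit to the window over the current model (mu0, var0)."""
    buf = []
    for i, y in enumerate(stream):
        buf.append(y)
        if len(buf) > window:
            buf.pop(0)
        if len(buf) == window:
            mu1 = sum(buf) / window
            var1 = sum((v - mu1) ** 2 for v in buf) / window + 1e-6
            glr = sum(gauss_loglik(v, mu1, var1) - gauss_loglik(v, mu0, var0)
                      for v in buf) / window
            if glr > threshold:
                return i
    return None

# synthetic stream: the generating process switches at index 100
random.seed(0)
stream = ([random.gauss(0.0, 0.1) for _ in range(100)] +
          [random.gauss(3.0, 0.1) for _ in range(100)])
cp = glr_changepoint(stream, mu0=0.0, var0=0.01)
```

On stationary data the refit model gains almost nothing, so the statistic stays near zero; once points from the new regime enter the window, the old model's likelihood collapses and the test fires within a few samples of the true switch.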
Sample Complexity and Performance Bounds for Nonparametric Approximate Linear Programming
Abstract

Cited by 1 (0 self)
One of the most difficult tasks in value function approximation for Markov Decision Processes is finding an approximation architecture that is expressive enough to capture the important structure in the value function, while at the same time not overfitting the training samples. Recent results in non-parametric approximate linear programming (NP-ALP) have demonstrated that this can be done effectively using nothing more than a smoothness assumption on the value function. In this paper we extend these results to the case where samples come from real-world transitions instead of the full Bellman equation, adding robustness to noise. In addition, we provide the first max-norm, finite-sample performance guarantees for any form of ALP. NP-ALP is amenable to problems with large (multi-dimensional) or even infinite (continuous) action spaces, and does not require a model to select actions using the resulting approximate solution.
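The smoothness assumption behind NP-ALP-style methods is that the value function is Lipschitz, so every sampled value induces a pointwise upper bound everywhere, and the nonparametric estimate is the tightest such bound. A minimal sketch of that construction (the paper's LP adds Bellman constraints on top of this; the states, values, and Lipschitz constant below are made up for illustration):

```python
def lipschitz_upper_bound(samples, L, dist):
    """Given sampled (state, value) pairs and Lipschitz constant L, return
    the tightest upper bound on V consistent with the samples:
        V(s) = min_i ( v_i + L * d(s, s_i) ).
    No parametric architecture is chosen; smoothness alone does the work."""
    def V(s):
        return min(v + L * dist(s, si) for si, v in samples)
    return V

# toy 1-D example with three sampled values and L = 2
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 2.0)]
V = lipschitz_upper_bound(samples, L=2.0, dist=lambda a, b: abs(a - b))
```

At a sampled state the bound returns the sample itself when no neighbor gives a tighter cap (e.g. `V(1.0)` is 3.0), while between samples the estimate is shaped by whichever sample-plus-cone is lowest (e.g. `V(0.5)` is capped at 2.0 by the sample at 0.0).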
Optimistic planning for continuous-action deterministic systems.
 In 2013 IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL13),
, 2013
Abstract

Cited by 1 (0 self)
We consider the optimal control of systems with deterministic dynamics, continuous, possibly large-scale state spaces, and continuous, low-dimensional action spaces. We describe an online planning algorithm called SOOP, which like other algorithms in its class has no direct dependence on the state space structure. Unlike previous algorithms, SOOP explores the true solution space, consisting of infinite sequences of continuous actions, without requiring knowledge about the smoothness of the system. To this end, it borrows the principle of the simultaneous optimistic optimization method, and develops a non-trivial adaptation of this principle to the planning problem. Experiments on four problems show SOOP reliably ranks among the best algorithms, fully dominating competing methods when the problem requires both long horizons and fine discretization.
δ1
Abstract
Lemma 1. Consider a GP trained on samples y⃗ = [y_1, ..., y_t] which are drawn from p(y | x) at input locations X = [x_1, ..., x_t], with E[y | x] = f(x) and V_m = y_max − y_min. If the predictive variance of the GP at x_i ∈ X is σ²(x_i) ≤ σ²_tol = …
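The quantity the lemma bounds, the GP's predictive variance at a training input, can be computed directly from the kernel matrix: σ²(x_i) = k(x_i, x_i) − k_*ᵀ (K + ω²I)⁻¹ k_*. A small numerical check with two training points and a unit-variance RBF kernel (the kernel, the noise level ω² = 0.1, and the inputs are illustrative choices, not the lemma's setting):

```python
import math

# two training inputs and an RBF kernel k(a, b) = exp(-(a - b)^2 / 2)
x1, x2, omega_sq = 0.0, 1.0, 0.1
k12 = math.exp(-((x1 - x2) ** 2) / 2.0)

# noisy kernel matrix K + omega^2 I and its explicit 2x2 inverse
a, b = 1.0 + omega_sq, k12
det = a * a - b * b
inv = [[a / det, -b / det], [-b / det, a / det]]

# predictive variance at the training input x1:
# sigma^2(x1) = k(x1, x1) - k_*^T (K + omega^2 I)^{-1} k_*
k_star = [1.0, k12]
quad = sum(k_star[i] * inv[i][j] * k_star[j]
           for i in range(2) for j in range(2))
var_at_x1 = 1.0 - quad
```

After conditioning on the two observations, the variance at the observed input shrinks from the prior value of 1 down to roughly the noise floor, which is the kind of contraction a tolerance σ²_tol is meant to certify.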