Results 1–4 of 4
Distributed dialogue policies for multi-domain statistical dialogue management
In ICASSP, 2015
"... Statistical dialogue systems offer the potential to reduce costs by learning policies automatically online, but are not designed to scale to large opendomains. This paper proposes a hierarchical distributed dialogue architecture in which policies are organised in a class hierarchy aligned to an un ..."
Abstract

Cited by 5 (5 self)
Statistical dialogue systems offer the potential to reduce costs by learning policies automatically online, but are not designed to scale to large open domains. This paper proposes a hierarchical distributed dialogue architecture in which policies are organised in a class hierarchy aligned to an underlying knowledge graph. This allows a system to be deployed using a modest amount of data to train a small set of generic policies. As further data is collected, generic policies can be adapted to give in-domain performance. Using Gaussian process-based reinforcement learning, it is shown that within this framework generic policies can be constructed which provide acceptable user performance, and better performance than can be obtained using under-trained domain-specific policies. It is also shown that as sufficient in-domain data becomes available, it is possible to seamlessly improve performance, without subjecting users to unacceptable behaviour during the adaptation period and without limiting the final performance compared to policies trained from scratch.
Index Terms — open-domain, multi-domain, dialogue systems, POMDP, Gaussian process
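The Gaussian process machinery behind this kind of policy learning is ordinary GP regression; the sketch below shows the standard posterior mean/variance computation with an RBF kernel. It is an illustration only, not the paper's implementation: the kernel choice, lengthscale, noise level, and the toy 1-D targets are all assumptions.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between row-vector sets a and b."""
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def gp_posterior(X_train, y_train, X_test, noise=1e-3):
    """Standard GP posterior mean and variance via a Cholesky factorisation."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf_kernel(X_test, X_test)) - np.sum(v**2, axis=0)
    return mean, var

# Fit a toy 1-D function; in GP-based RL the targets would be returns, and
# the posterior variance indicates how uncertain the value estimates still are.
X = np.linspace(0, 3, 10)[:, None]
y = np.sin(X).ravel()
mean, var = gp_posterior(X, y, X)
```

In the paper's setting this uncertainty is what lets a generic policy hand over to an in-domain one only once enough in-domain data has accumulated.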
Computationally efficient Gaussian process changepoint detection and regression, 2014
"... Most existing GP regression algorithms assume a single generative model, leading to poor performance when data are nonstationary, i.e. generated from multiple switching processes. Existing methods for GP regression over nonstationary data include clustering and changepoint detection algorithms. Ho ..."
Abstract

Cited by 2 (0 self)
Most existing GP regression algorithms assume a single generative model, leading to poor performance when data are non-stationary, i.e. generated from multiple switching processes. Existing methods for GP regression over non-stationary data include clustering and changepoint detection algorithms. However, these methods require significant computation, do not come with provable guarantees on correctness and speed, and most algorithms only work in batch settings. This thesis presents an efficient online GP framework, GP-NBC, that leverages the generalized likelihood ratio test to detect changepoints and learn multiple Gaussian process models from streaming data. Furthermore, GP-NBC can quickly recognize and reuse previously seen models. The algorithm is shown to be theoretically sample efficient in terms of limiting mistaken predictions. Our empirical results on two real-world datasets and one synthetic dataset show GP-NBC outperforms state-of-the-art methods for non-stationary regression in terms of regression error and computational efficiency. The second part of the thesis introduces a Reinforcement Learning (RL) algorithm, UCRL-GP-CPD, for multi-task reinforcement learning when the reward function is non ...
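The core test in GP-NBC is a generalized likelihood ratio between the current model and a fresh fit of the most recent data. The sketch below is a heavily simplified illustration of that idea with plain Gaussians standing in for GPs; the window size, threshold, and synthetic stream are arbitrary assumptions, not the thesis's settings.

```python
import numpy as np

def gaussian_loglik(x, mu, var):
    """Log-likelihood of the samples in x under N(mu, var)."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def detect_changes(stream, window=30, threshold=20.0):
    """Flag times where the generalized likelihood ratio between a fresh fit
    of the latest window and the current reference model exceeds threshold."""
    detections, start = [], 0
    for t in range(len(stream) + 1):
        if t - window - start < window:  # wait for a full reference segment
            continue
        ref = stream[start:t - window]   # data attributed to the current model
        recent = stream[t - window:t]    # candidate new-regime window
        mu0, var0 = ref.mean(), ref.var() + 1e-6
        mu1, var1 = recent.mean(), recent.var() + 1e-6
        glr = (gaussian_loglik(recent, mu1, var1)
               - gaussian_loglik(recent, mu0, var0))
        if glr > threshold:
            detections.append(t)
            start = t - window           # start a new model from the recent window
    return detections

# Synthetic stream with a mean shift halfway through.
rng = np.random.default_rng(0)
stream = np.concatenate([rng.normal(0, 1, 100), rng.normal(5, 1, 100)])
detections = detect_changes(stream)
```

GP-NBC additionally keeps a library of previously learned models and reverts to one of them when it explains the new regime; this sketch omits that reuse step.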
Real-World Reinforcement Learning via Multi-Fidelity Simulators
"... Abstract—Reinforcement learning (RL) can be a tool for designing policies and controllers for robotic systems. However, the cost of realworld samples remains prohibitive as many RL algorithms require a large number of samples before learning useful policies. Simulators are one way to decrease the n ..."
Abstract
Reinforcement learning (RL) can be a tool for designing policies and controllers for robotic systems. However, the cost of real-world samples remains prohibitive as many RL algorithms require a large number of samples before learning useful policies. Simulators are one way to decrease the number of required real-world samples, but imperfect models make deciding when and how to trust samples from a simulator difficult. We present a framework for efficient RL in a scenario where multiple simulators of a target task are available, each with varying levels of fidelity. The framework is designed to limit the number of samples used in each successively higher-fidelity/cost simulator by allowing a learning agent to choose to run trajectories at the lowest-level simulator that will still provide it with useful information. Theoretical proofs of the framework’s sample complexity are given and empirical results are demonstrated on a remote-controlled car with multiple simulators. The approach enables RL algorithms to find near-optimal policies in a physical robot domain with fewer expensive real-world samples than previous transfer approaches or learning without simulators.
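The escalation idea — learn what a cheap low-fidelity simulator can teach before spending expensive real-world samples — can be sketched with tabular Q-learning warm-started across two fidelity levels. The chain MDP, slip probabilities, and episode budgets below are invented for illustration and are not the paper's algorithm or experimental setup.

```python
import random
from collections import defaultdict

class ChainSim:
    """Toy chain MDP: states 0..n-1, reward 1 for reaching the right end.
    The slip probability (chance the action is inverted) is the fidelity knob."""
    def __init__(self, n=6, slip=0.0):
        self.n, self.slip = n, slip

    def step(self, s, a):  # a is -1 (left) or +1 (right)
        if random.random() < self.slip:
            a = -a
        s2 = min(max(s + a, 0), self.n - 1)
        return s2, 1.0 if s2 == self.n - 1 else 0.0

def q_learn(sim, Q, episodes, alpha, gamma=0.9, horizon=50):
    """Off-policy tabular Q-learning with a uniform-random behaviour policy.
    Updates Q in place and returns the number of samples drawn from sim."""
    samples = 0
    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            a = random.choice([-1, 1])
            s2, r = sim.step(s, a)
            target = r + gamma * max(Q[(s2, -1)], Q[(s2, 1)])
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            samples += 1
            if r:  # goal reached, end the episode
                break
            s = s2
    return samples

random.seed(0)
Q = defaultdict(float)
# Learn the bulk of the policy in the free, deterministic simulator...
cheap = q_learn(ChainSim(slip=0.0), Q, episodes=200, alpha=0.5)
# ...then fine-tune briefly on the noisier "real" system.
real = q_learn(ChainSim(slip=0.1), Q, episodes=30, alpha=0.1)
```

Because the low-fidelity phase already orders the Q-values correctly, the noisy high-fidelity phase only needs a short, low-learning-rate fine-tune, mirroring the paper's claim of needing fewer expensive samples than learning from scratch.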