Results 1 - 8 of 8
Distributed dialogue policies for multi-domain statistical dialogue management
In ICASSP, 2015
Cited by 5 (5 self)
Abstract:
Statistical dialogue systems offer the potential to reduce costs by learning policies automatically on-line, but are not designed to scale to large open-domains. This paper proposes a hierarchical distributed dialogue architecture in which policies are organised in a class hierarchy aligned to an underlying knowledge graph. This allows a system to be deployed using a modest amount of data to train a small set of generic policies. As further data is collected, generic policies can be adapted to give in-domain performance. Using Gaussian process-based reinforcement learning, it is shown that within this framework generic policies can be constructed which provide acceptable user performance, and better performance than can be obtained using under-trained domain specific policies. It is also shown that as sufficient in-domain data becomes available, it is possible to seamlessly improve performance, without subjecting users to unacceptable behaviour during the adaptation period and without limiting the final performance compared to policies trained from scratch.
Index Terms — open-domain, multi-domain, dialogue systems, POMDP, Gaussian process
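The Gaussian process-based reinforcement learning mentioned in this abstract rests on plain GP regression over belief–action features. The sketch below shows only that posterior computation, with a toy RBF kernel and random feature vectors standing in for real belief states; the paper's actual GP-SARSA machinery and kernels differ.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the row vectors of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_posterior(X_train, y_train, X_test, noise=0.1):
    """GP regression posterior mean and variance at the test points."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf_kernel(X_test, X_train)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks @ alpha
    # diag(Kss) - diag(Ks K^{-1} Ks^T), computed without forming the full matrix
    var = rbf_kernel(X_test, X_test).diagonal() - np.einsum(
        'ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mean, var

# Toy "belief-action" feature points with noisy observed returns (Q targets)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=20)
mu, var = gp_posterior(X, y, X[:5])
```

The posterior variance is the part that matters for safe on-line adaptation: actions whose Q estimates are uncertain can be explored deliberately rather than at random.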
Learning from real users: Rating dialogue success with neural networks for reinforcement learning in spoken dialogue systems
In Proceedings of Interspeech, 2015
Cited by 3 (3 self)
Abstract:
To train a statistical spoken dialogue system (SDS) it is essential that an accurate method for measuring task success is available. To date training has relied on presenting a task to either simulated or paid users and inferring the dialogue's success by observing whether this presented task was achieved or not. Our aim however is to be able to learn from real users acting under their own volition, in which case it is non-trivial to rate the success as any prior knowledge of the task is simply unavailable. User feedback may be utilised but has been found to be inconsistent. Hence, here we present two neural network models that evaluate a sequence of turn-level features to rate the success of a dialogue. Importantly these models make no use of any prior knowledge of the user's task. The models are trained on dialogues generated by a simulated user and the best model is then used to train a policy on-line which is shown to perform at least as well as a baseline system using prior knowledge of the user's task. We note that the models should also be of interest for evaluating SDS and for monitoring a dialogue in rule-based SDS.
Index Terms: spoken dialogue systems, real users, reward prediction, dialogue success classification, neural network
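The turn-level rating idea above can be pictured as a recurrent network that consumes one feature vector per dialogue turn and emits a single success probability. This is a minimal, untrained vanilla-RNN sketch; the dimensions, weights, and architecture are arbitrary assumptions, not the paper's models.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rate_dialogue(turn_feats, Wx, Wh, Wo, bh, bo):
    """Run a vanilla RNN over per-turn feature vectors and return a
    success probability for the whole dialogue."""
    h = np.zeros(Wh.shape[0])
    for x in turn_feats:              # one recurrent step per turn
        h = np.tanh(Wx @ x + Wh @ h + bh)
    return sigmoid(Wo @ h + bo)       # scalar in (0, 1)

rng = np.random.default_rng(1)
d_in, d_h = 8, 16                     # hypothetical feature / hidden sizes
Wx = rng.normal(scale=0.1, size=(d_h, d_in))
Wh = rng.normal(scale=0.1, size=(d_h, d_h))
Wo = rng.normal(scale=0.1, size=d_h)
p = rate_dialogue(rng.normal(size=(12, d_in)), Wx, Wh, Wo, np.zeros(d_h), 0.0)
```

Crucially, nothing in the input refers to the user's task: the rater sees only observable turn-level features, which is what lets it score dialogues with real users acting under their own volition.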
On-line adaptation of POMDP-based dialogue managers to extended ...
Cited by 1 (1 self)
Learning Domain-Independent Dialogue Policies via Ontology
Abstract:
This paper introduces a novel approach to eliminate the domain dependence of dialogue state and action representations, such that dialogue policies trained based on proposed representations can be transferred across different domains. The experimental results show that the policy optimised in a restaurant search domain using our domain-independent representations can be deployed to a laptop sale domain, achieving a performance very close to that of the policy optimised directly using ...
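One way to make state features domain-independent, in the spirit of the abstract above, is to summarise each slot's belief with quantities that mention no slot names or values. The feature set below (top probabilities, entropy, value-set size) is a hypothetical illustration of that general idea, not the paper's exact parameterisation.

```python
import math

def domain_independent_features(slot_belief):
    """Summarise a slot's belief distribution with slot-name-independent
    features, so a policy never sees domain-specific slots or values."""
    probs = sorted(slot_belief.values(), reverse=True)
    top1 = probs[0] if probs else 0.0
    top2 = probs[1] if len(probs) > 1 else 0.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return [top1, top2, entropy, len(probs)]

# The same feature-vector shape comes out of a restaurant-domain slot ...
food = domain_independent_features({"italian": 0.6, "thai": 0.3, "none": 0.1})
# ... and a laptop-domain slot, so one policy can read states from both.
weight = domain_independent_features({"light": 0.8, "heavy": 0.2})
```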
Reward Shaping with Recurrent Neural Networks for Speeding up On-Line Policy Learning in Spoken Dialogue Systems
Abstract:
Statistical spoken dialogue systems have the attractive property of being able to be optimised from data via interactions with real users. However in the reinforcement learning paradigm the dialogue manager (agent) often requires significant time to explore the state-action space to learn to behave in a desirable manner. This is a critical issue when the system is trained on-line with real users where learning costs are expensive. Reward shaping is one promising technique for addressing these concerns. Here we examine three recurrent neural network (RNN) approaches for providing reward shaping information in addition to the primary (task-orientated) environmental feedback. These RNNs are trained on returns from dialogues generated by a simulated user and attempt to diffuse the overall evaluation of the dialogue back down to the turn level to guide the agent towards good behaviour faster. In both simulated and real user scenarios these RNNs are shown to increase policy learning speed. Importantly, they do not require prior knowledge of the user's goal.
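Diffusing a dialogue-level evaluation down to individual turns can be cast as potential-based reward shaping, where a per-turn score (such as an RNN's running estimate of eventual success) acts as the potential. The sketch below uses made-up numbers, and the paper's three RNN variants differ in detail; this only illustrates the shaping mechanism itself.

```python
def shaped_rewards(env_rewards, phi, gamma=0.99):
    """Add potential-based shaping F_t = gamma * phi_{t+1} - phi_t to the
    sparse environment reward. phi gives a per-turn score; the terminal
    potential is taken to be 0, so shaping cannot change the optimum."""
    phi = list(phi) + [0.0]
    return [r + gamma * phi[t + 1] - phi[t]
            for t, r in enumerate(env_rewards)]

# Sparse task reward: nothing per turn, +20 at the end if the task succeeds.
env = [0, 0, 0, 20]
phi = [0.2, 0.5, 0.8, 0.9]   # hypothetical per-turn success estimates
r = shaped_rewards(env, phi)
```

Because the shaping terms telescope, the agent receives denser per-turn feedback early in training without the final optimal policy being altered, which is exactly what speeds up on-line learning with real users.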
MULTI-DOMAIN DIALOGUE SUCCESS CLASSIFIERS FOR POLICY TRAINING
Abstract:
We propose a method for constructing dialogue success classifiers that are capable of making accurate predictions in domains unseen during training. Pooling and adaptation are also investigated for constructing multi-domain models when data is available in the new domain. This is achieved by reformulating the features input to the recurrent neural network models introduced in [1]. Importantly, on our task of main interest, this enables policy training in a new domain without the dialogue success classifier (which forms the reinforcement learning reward function) ever having seen data from that domain before. This occurs whilst incurring only a small reduction in performance relative to developing and using an in-domain dialogue success classifier. Finally, given that the motivation for these dialogue success classifiers is to enable policy training with real users, we demonstrate that the initial policy training results obtained with a simulated user carry over to learning from paid human users.
Index Terms — statistical spoken dialogue systems, dialogue success, multi-domain, policy training
POLICY COMMITTEE FOR ADAPTATION IN MULTI-DOMAIN SPOKEN DIALOGUE SYSTEMS
Abstract:
Moving from limited-domain dialogue systems to open domain dialogue systems raises a number of challenges. One of them is the ability of the system to utilise small amounts of data from disparate domains to build a dialogue manager policy. Previous work has focused on using data from different domains to adapt a generic policy to work for a specific domain. Inspired by Bayesian Committee Machines, this paper proposes the use of a committee of dialogue policies. The results show that such a model is particularly beneficial for adaptation in multi-domain dialogue systems and significantly improves performance compared to a single policy baseline.
Index Terms — Bayesian committee machines, Gaussian processes, reinforcement learning
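A Bayesian Committee Machine fuses the Gaussian predictions of several members (here imagined as GP policies estimating the same Q-value) by precision-weighting the means and subtracting the prior that would otherwise be counted once per member. A minimal sketch with invented numbers, not the paper's policy committee itself:

```python
import numpy as np

def bcm_combine(means, variances, prior_var=1.0):
    """Bayesian committee machine: fuse M Gaussian predictions of the
    same quantity, removing the (M - 1) redundant copies of the prior."""
    means, variances = np.asarray(means), np.asarray(variances)
    M = len(means)
    inv_var = (1.0 / variances).sum() - (M - 1) / prior_var
    mean = (means / variances).sum() / inv_var
    return mean, 1.0 / inv_var

# Three committee members agree roughly; the most confident one (smallest
# variance) pulls the fused estimate hardest.
m, v = bcm_combine([1.0, 1.2, 0.9], [0.1, 0.2, 0.15], prior_var=1.0)
```

The appeal for multi-domain adaptation is that each member can be trained on a small, disparate dataset, while the committee's fused estimate is both more accurate and better calibrated than any single under-trained policy.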
THE USE OF DISCRIMINATIVE BELIEF TRACKING IN POMDP-BASED DIALOGUE SYSTEMS
Abstract:
Statistical spoken dialogue systems based on Partially Observable Markov Decision Processes (POMDPs) have been shown to be more robust to speech recognition errors by maintaining a belief distribution over multiple dialogue states and making policy decisions based on the entire distribution rather than the single most likely hypothesis. To date most POMDP-based systems have used generative trackers. However, concerns about modelling accuracy have created interest in discriminative methods, and recent results from the second Dialog State Tracking Challenge (DSTC2) have shown that discriminative trackers can significantly outperform generative models in terms of tracking accuracy. The aim of this paper is to investigate the extent to which these improvements translate into improved task completion rates when incorporated into a spoken dialogue system. To do this, the Recurrent Neural Network (RNN) tracker described by Henderson et al. in DSTC2 was integrated into the Cambridge statistical dialogue system and compared with the existing generative Bayesian network tracker. Using a Gaussian Process (GP) based policy, the experimental results indicate that the system using the RNN tracker performs significantly better than the system with the original Bayesian network tracker.
Index Terms — dialogue management, spoken dialogue systems, recurrent neural networks, belief tracking, POMDP
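Whether generative or discriminative, a tracker's job reduces to maintaining a distribution over dialogue-state hypotheses. The toy Bayesian update below (with invented likelihoods, standing in for either tracker family) shows why acting on the entire distribution matters: after two observations the belief favours a hypothesis that was not the top one after the first, which a system keeping only the single best hypothesis would have discarded.

```python
import numpy as np

def belief_update(belief, likelihood):
    """One Bayesian update of the belief over dialogue-state hypotheses:
    scale the prior belief by the observation likelihood and renormalise."""
    post = belief * likelihood
    return post / post.sum()

# Uniform prior over three user-goal hypotheses; two noisy ASR observations.
b = np.array([1/3, 1/3, 1/3])
for like in ([0.5, 0.3, 0.2],    # first observation weakly favours goal 0
             [0.2, 0.6, 0.2]):   # second observation favours goal 1
    b = belief_update(b, np.array(like))
```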