## Bayesian update of dialogue state: A POMDP framework for spoken dialogue systems

Citations: 77 (33 self)

### Citations

5462 | Reinforcement Learning: An Introduction
- Sutton, R, et al.
- 1998
Citation Context ...itic is one of several algorithms that could be used to learn the policy. Other popular learning algorithms include TD-learning, SARSA-learning and various improvements of point-based value iteration (Sutton and Barto, 1998; Shani et al., 2008; Williams and Young, 2006a). One particularly important characteristic of the NAC algorithm is that it is model-free rather than model-based. This means that the algorithm learns ... |

1756 | Factor graphs and the sum-product algorithm
- Kschischang, Frey, et al.
- 2001
Citation Context ...nt state, an efficient algorithm is required to update the beliefs. When performing calculations, many algorithms are better expressed using a different graphical representation called a factor graph (Kschischang et al., 2001). An example factor graph is shown in Figure 4 below. Factor graphs are undirected bipartite graphs, with two types of node. One type represents random variables (drawn as a circle), while the other ... |
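The bipartite structure described in this context can be illustrated with the smallest possible factor graph: a single factor connecting two discrete variables. A minimal sum-product sketch (the potential values below are invented for illustration, not taken from the paper):

```python
import numpy as np

# Smallest factor graph: variable x -- factor f -- variable y.
# f[x, y] is the single potential linking the two variables.
f = np.array([[0.4, 0.1],
              [0.2, 0.3]])

# Message from factor f to variable y: sum out x, with a uniform
# incoming message from x's side of the bipartite graph.
msg_f_to_y = f.sum(axis=0)              # unnormalised message
marginal_y = msg_f_to_y / msg_f_to_y.sum()
```

With more factors, each variable would multiply together the messages from its neighbouring factor nodes before normalising, which is exactly the sum-product recursion.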

1076 | Planning and acting in partially observable stochastic domains
- Kaelbling, Littman, et al.
- 1998
Citation Context ...ely related to the original POMDP. The internal state space of this MDP is identical to the belief state of the POMDP and policies will optimise the MDP only if they also optimise the original POMDP (Kaelbling et al., 1998). Throughout this paper, optimisation is performed on this belief state MDP instead of the original POMDP. This allows the techniques to be generalised for use in both MDP and POMDP frameworks, or ev... |
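The belief-state MDP rests on the standard POMDP belief update, which this context takes as given. A minimal sketch of that update, with illustrative array names and shapes (not the paper's code):

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """One exact POMDP belief update (illustrative names).

    b : current belief over states, shape (S,)
    a : action index
    o : observation index
    T : transition model, T[a, s, s'] = P(s' | s, a)
    Z : observation model, Z[a, s', o] = P(o | s', a)
    """
    # Predict: propagate the belief through the transition model.
    predicted = b @ T[a]                # shape (S,)
    # Correct: weight by the observation likelihood and renormalise.
    unnorm = Z[a, :, o] * predicted
    return unnorm / unnorm.sum()
```

Treating this updated belief as the "state" of an MDP is what lets policies be optimised over beliefs rather than over the hidden states directly.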

755 | Dynamic Bayesian Networks: Representation, Inference and Learning - Murphy - 2002 |

468 | Generalized belief propagation - Yedidia, Weiss - 2000 |

429 | Natural gradient works efficiently in learning
- Amari
- 1998
Citation Context ...act G_θ^{-1}. The optimal metric tensor to use in a statistical model is typically the Fisher Information Matrix, which has been shown to give distances that are invariant to the scale of the parameters (Amari, 1998). Given a probability distribution p(x|θ), the Fisher Information is the matrix G_θ such that (G_θ)_ij = E[ (∂ log p(x|θ)/∂θ_i) (∂ log p(x|θ)/∂θ_j) ]. Peters et al. (2005) show that the Fisher Information ma... |
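The definition above can be checked on the simplest case: for a Bernoulli(p) distribution the Fisher information reduces to 1/(p(1−p)). A short sketch computing it directly as the expectation of the squared score (function name is illustrative):

```python
def bernoulli_fisher(p):
    """Fisher information of Bernoulli(p), from the definition
    G = E[(d/dp log p(x|p))^2], computed exactly by summing over x."""
    score = {1: 1.0 / p, 0: -1.0 / (1.0 - p)}   # d/dp log p(x|p)
    prob = {1: p, 0: 1.0 - p}                   # p(x|p)
    return sum(prob[x] * score[x] ** 2 for x in (0, 1))
```

The analytic value 1/(p(1−p)) grows as p approaches 0 or 1, which is why dividing the gradient by G rescales updates to be invariant to the parameterisation.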

427 | Policy gradient methods for reinforcement learning with function approximation - Sutton, McAllester, et al. - 2000 |

358 | A family of algorithms for approximate bayesian inference
- Minka
- 2001
Citation Context ...mplex graphs, the fixed points of LBP may give slight variations from the true marginal distribution. One can show that LBP is a special case of expectation propagation applied to discrete variables (Minka, 2001) and as such, fixed points of the algorithm minimise a set of local KL divergences. Yedidia et al. (2001) and Heskes (2003) have shown that the stable fixed points also represent minima of the Bethe ... |

298 | Tractable inference in complex stochastic processes
- Boyen, Koller
- 1998
Citation Context ...used in storing them can be released except for the messages that are passed to the current time step. Similar approximations have been suggested in the Boyen-Koller and Factored Frontier algorithms (Boyen and Koller, 1998; Murphy, 2002) and experiments indicate that this two-time slice approximation is sufficient. Loopy belief propagation may require several iterations before convergence. Indeed, there is no guarantee... |
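The Boyen-Koller approximation mentioned here projects the joint belief onto a product of marginals at each time step, so that only the marginals need to be stored and passed forward. A two-variable sketch of that projection (the paper's networks are larger; this is purely illustrative):

```python
import numpy as np

def bk_project(joint):
    """Boyen-Koller style projection for a two-variable slice:
    replace the joint belief by the product of its marginals."""
    m0 = joint.sum(axis=1)     # marginal of the first variable
    m1 = joint.sum(axis=0)     # marginal of the second variable
    return np.outer(m0, m1)    # factored approximation of the joint
```

The projection preserves each marginal exactly while discarding the correlation between the variables, which is the source of the approximation error that the cited work bounds.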

208 | Partially Observable Markov Decision Processes for Spoken Dialog Systems
- Williams, Young
- 2007
Citation Context ... to combine the use of statistical policy learning and statistical models of uncertainty. The resulting framework is called the Partially Observable Markov Decision Process (POMDP) (Roy et al., 2000; Williams and Young, 2006b; Bui et al., 2007). Figure 1 shows how this framework compares to those suggested previously. The attention of this paper is focussed on the lower half of this figure. ... |

205 | A stochastic model of humanmachine interaction for learning dialog strategies - Levin, Pieraccini, et al. - 2000 |

108 | Spoken Dialogue Management Using Probabilistic Reasoning
- Roy, Pineau, et al.
Citation Context ...n several attempts to combine the use of statistical policy learning and statistical models of uncertainty. The resulting framework is called the Partially Observable Markov Decision Process (POMDP) (Roy et al., 2000; Williams and Young, 2006b; Bui et al., 2007). Figure 1 shows how this framework compares to those suggested previously. The attention of this paper is focussed on the lower half of this figure. ... |

89 | Natural actor-critic
- Peters, Vijayakumar, et al.
- 2005
Citation Context ...expected future reward. There are various possible approaches to performing this optimisation but one that has worked well in the framework presented here is the Natural Actor Critic (NAC) algorithm (Peters et al., 2005), which performs policy optimisation using a modified form of gradient descent. Traditional gradient descent iteratively adds a multiple of the gradient to the parameters being estimated. This is not... |
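The modification that NAC-style methods make to gradient descent is to premultiply the gradient by the inverse Fisher matrix. A minimal sketch of one natural-gradient step (parameter names and the learning rate are illustrative, not from the paper):

```python
import numpy as np

def natural_gradient_step(theta, grad, fisher, lr=0.1):
    """One natural-gradient update: theta <- theta + lr * G^{-1} grad,
    where G is the Fisher information matrix. Solving the linear
    system avoids forming the inverse explicitly."""
    return theta + lr * np.linalg.solve(fisher, grad)
```

With `fisher` set to the identity this reduces to ordinary gradient ascent; a non-trivial Fisher matrix rescales the step so it is invariant to how the policy is parameterised.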

85 | A Computational Architecture for Conversation
- Horvitz, Paek
- 1999
Citation Context .... Unless significant assumptions and approximations are taken, the result is intractable for any real-world system. Many authors suggest using Bayesian network algorithms as a solution (Pulman, 1996; Horvitz and Paek, 1999; Meng et al., 2003; Bui et al., 2007). Young et al. (2007) take a different approach, grouping the state space into partitions where states have similar characteristics and pruning unlikely cases. An... |

79 | An application of reinforcement learning to dialogue strategy selection in a spoken dialogue system for email - Walker - 2000 |

71 | Stable fixed points of loopy belief propagation are minima of the Bethe free energy - Heskes - 2002 |

57 | Agenda-Based User Simulation for Bootstrapping a POMDP Dialogue System - Schatzmann, Thomson, et al. - 2007 |

52 | A Framework for Unsupervised Learning of Dialogue Strategies, ser - Pietquin - 2004 |

46 | Factored Partially Observable Markov Decision Processes for Dialogue Management
- Williams, Poupart, et al.
Citation Context ...useful factorisation is to separate the environment state into three components: st = (gt, ut, ht), where gt is the long term goal of the user, ut is the true user act and ht is the dialogue history (Williams et al., 2005). The observed user utterance is then conditionally dependent only on the true user act. In many systems, further structuring is possible by separating the state into concepts, c ∈ C. In a tourist... |
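The factorisation s_t = (g_t, u_t, h_t) and the conditional independence of the observation can be sketched as follows (field and function names are assumptions for illustration, not taken from the paper):

```python
from dataclasses import dataclass

@dataclass
class DialogueState:
    """Illustrative container for the factored state s_t = (g_t, u_t, h_t)."""
    goal: str        # g_t: the user's long-term goal
    user_act: str    # u_t: the true (unobserved) user act
    history: tuple   # h_t: the dialogue history

def obs_likelihood(state, obs, p_obs_given_act):
    # Conditional independence: the observed utterance depends only on
    # the true user act, i.e. P(o | g, u, h) = P(o | u).
    return p_obs_given_act[state.user_act][obs]
```

Because the observation model only ever indexes on `user_act`, belief updates can ignore the goal and history dimensions when incorporating the speech-understanding evidence, which is what makes the factored update cheaper.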

46 | An isu dialogue system exhibiting reinforcement learning of dialogue policies: generic slot-filling in the talk in-car system - Lemon, Georgila, et al. - 2006 |

34 | The use of belief networks for Mixed-Initiative dialog modeling - Meng, Wai, et al. - 2003 |

32 | Scaling up POMDPs for dialog management: The “summary POMDP” method - Williams, Young - 2005 |

27 | 2007b. Scaling POMDPs for spoken dialog management
- Williams, Young
Citation Context ...s attempts to build POMDP-based spoken dialogue systems have relied on mapping the state space into a much more compact summary space in order to ensure tractable policy representation and optimisation (Williams and Young, 2007; Thomson et al., 2007). A further contribution of this paper is to show how a component-based policy can be defined over the full state space and optimised using the Natural Actor Critic (NAC) algori... |

21 | Conversational Games, Belief Revision and Bayesian Networks
- Pulman
- 1996
Citation Context ...he environment. Unless significant assumptions and approximations are taken, the result is intractable for any real-world system. Many authors suggest using Bayesian network algorithms as a solution (Pulman, 1996; Horvitz and Paek, 1999; Meng et al., 2003; Bui et al., 2007). Young et al. (2007) take a different approach, grouping the state space into partitions where states have similar characteristics and pr... |

19 | Efficient ADD operations for point-based algorithms
- Shani, Poupart, et al.
- 2008
Citation Context ...lgorithms that could be used to learn the policy. Other popular learning algorithms include TD-learning, SARSA-learning and various improvements of point-based value iteration (Sutton and Barto, 1998; Shani et al., 2008; Williams and Young, 2006a). One particularly important characteristic of the NAC algorithm is that it is model-free rather than model-based. This means that the algorithm learns from sample dialogue... |

16 | Scaling POMDPs for dialog management with composite summary and point-based value iteration (CSPBVI
- Williams, Young
- 2006
Citation Context ... be used to learn the policy. Other popular learning algorithms include TD-learning, SARSA-learning and various improvements of point-based value iteration (Sutton and Barto, 1998; Shani et al., 2008; Williams and Young, 2006a). One particularly important characteristic of the NAC algorithm is that it is model-free rather than model-based. This means that the algorithm learns from sample dialogues rather than using the mod... |

15 | Automatic Design of Spoken Dialogue Systems - Scheffler - 2002 |

14 | A tractable DDNPOMDP approach to affective dialogue modeling for general probabilistic frame-based dialogue systems
- Bui, Poel, et al.
- 2007
Citation Context ...istical policy learning and statistical models of uncertainty. The resulting framework is called the Partially Observable Markov Decision Process (POMDP) (Roy et al., 2000; Williams and Young, 2006b; Bui et al., 2007). Figure 1 shows how this framework compares to those suggested previously. The attention of this paper is focussed on the lower half of this figure. ... |

14 | Evaluating Semantic-level Confidence Scores with Multiple Hypotheses - Thomson, Yu, et al. - 2008 |

14 | Applying POMDPs to Dialog Systems in the Troubleshooting Domain
- Williams
- 2007
Citation Context ...also include testing a user's internet connectivity or checking that their password is correct. Observations can include responses from these tests or any other perceptual input from the environment (Williams, 2007a). Where time dependence is insignificant, the t is omitted and a prime symbol is used to denote the next time step (e.g. o′ = ot+1). The definition of the belief state and its transitions is the key fea... |

14 | Using particle filters to track dialogue state
- Williams
- 2007
Citation Context ...ithm as an extension of the HIS approach to systems with conditionally independent slots and changes in the user goal. An alternative approach to updating the belief state is to use particle filters (Williams, 2007b). Comparing this method with the approach here is difficult since the computation needed for a particle filter depends crucially on the number of particles used. For a given level of accuracy, this i... |
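A bootstrap particle filter of the kind referenced here maintains the belief as a set of samples, propagated through the transition model and resampled by observation likelihood. A generic sketch (the `transition` and `obs_weight` model functions are assumed user-supplied, not the paper's code):

```python
import random

def particle_filter_step(particles, transition, obs_weight, n=None):
    """One bootstrap particle-filter update of the belief.

    particles  : list of state samples representing the current belief
    transition : function s -> s' sampling the next state
    obs_weight : function s' -> likelihood of the current observation
    n          : number of particles to resample (defaults to len(particles))
    """
    n = n or len(particles)
    # Propagate each particle through the transition model.
    moved = [transition(s) for s in particles]
    # Weight by the observation likelihood, then resample with replacement.
    weights = [obs_weight(s) for s in moved]
    return random.choices(moved, weights=weights, k=n)
```

The per-step cost scales with the number of particles, which is why the comparison in the context above depends crucially on how many particles are needed for a given accuracy.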

11 | Training a real-world POMDP-based dialog system
- Thomson, Schatzmann, et al.
- 2007
Citation Context ...based spoken dialogue systems have relied on mapping the state space into a much more compact summary space in order to ensure tractable policy representation and optimisation (Williams and Young, 2007; Thomson et al., 2007). A further contribution of this paper is to show how a component-based policy can be defined over the full state space and optimised using the Natural Actor Critic (NAC) algorithm (Peters et al., 2005). ... |

4 | Where do we go from here
- Pieraccini, Huerta
- 2005
Citation Context ...nt in the effectiveness of these systems would have far-reaching implications. Commercial dialogue systems are typically implemented by flowcharting system prompts along with possible user responses (Pieraccini and Huerta, 2008). The system is represented as a graph, sometimes called the call flow, where nodes represent prompts or actions to be taken by the system and the arcs give the possible responses. Formally, the syst... |

4 | Statistical user modeling for dialogue systems
- Schatzmann
- 2008
Citation Context ...inforcement learning techniques (labeled MDP). These MDP systems were built by another researcher in another project and considerable effort had been expended to make them as competitive as possible (Schatzmann, 2008). The error model for training used a fixed confusion rate of 0.4 whereas the tests were conducted over a range of error rates. Thus, the test and training conditions were unmatched. ... |


1 | Computer Speech and Language 24 (2010) 562–588 - Thomson, Young |

1 | An experimental evaluation using KL divergence gave the same trends - Thomson, S - 2008 |
