
## Variational learning for switching state-space models (1998)


### Download Links

- [www.gatsby.ucl.ac.uk]
- [mlg.eng.cam.ac.uk]
- [www.cs.utoronto.ca]
- [www.cs.toronto.edu]
- DBLP

### Other Repositories/Bibliography

Venue: Neural Computation

Citations: 173 (5 self)

### Citations

12415 | Elements of information theory - Cover, Thomas - 1991

Citation Context: ... (10) \(\sum_{\{S_t\}} \int Q(\{S_t, X_t\}) \log \frac{P(\{S_t, X_t, Y_t\} \mid \theta)}{Q(\{S_t, X_t\})} \, d\{X_t\} = B(Q, \theta)\) (11), where \(\theta\) denotes the parameters of the model and we have made use of Jensen's inequality (Cover and Thomas, 1991) to establish (11). Both steps of EM increase the lower bound on the log probability of the observed data. The E-step holds the parameters fixed and sets Q to be the posterior distribution over the hid...
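
The excerpt above is the variational lower bound at the heart of the paper: by Jensen's inequality, any distribution Q over the hidden variables lower-bounds the log evidence, with equality when Q is the exact posterior. A minimal numeric sketch, using a made-up two-state toy model rather than the paper's model:

```python
import numpy as np

# Toy model: one hidden state S in {0, 1}, one fixed observation y.
# The joint probabilities P(S, y) are illustrative, not from the paper.
p_joint = np.array([0.3, 0.1])        # P(S=0, y), P(S=1, y)
log_evidence = np.log(p_joint.sum())  # log P(y)

def bound(q):
    """B(Q) = sum_S Q(S) [log P(S, y) - log Q(S)], the Jensen lower bound."""
    q = np.asarray(q, dtype=float)
    return float(np.sum(q * (np.log(p_joint) - np.log(q))))

uniform_bound = bound([0.5, 0.5])      # any Q gives a lower bound on log P(y)
posterior = p_joint / p_joint.sum()
tight_bound = bound(posterior)         # equality at the exact posterior
```

Maximizing the bound over Q (the E-step) recovers the posterior; maximizing over the parameters (the M-step) then pushes the bound, and hence the likelihood, upward.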

11964 | Maximum likelihood from incomplete data via the EM algorithm - Dempster, Laird, et al. - 1977

Citation Context: ...jung and Soderstrom, 1983). Similar gradient-based methods can be obtained for off-line methods. An alternative method for off-line learning makes use of the Expectation Maximization (EM) algorithm (Dempster et al., 1977). This procedure iterates between an E-step that fixes the current parameters and computes posterior probabilities over the hidden states given the observations, and an M-step that maximizes the expect...

8903 | Probabilistic reasoning in intelligent systems: networks of plausible inference - Pearl - 1988

Citation Context: ... Smyth, Heckerman and Jordan (1997), the forward-backward algorithm is a special case of exact inference algorithms for more general graphical probabilistic models (Lauritzen and Spiegelhalter, 1988; Pearl, 1988). The same observation holds true for the Kalman smoothing recursions. The other inference problem commonly posed for HMMs is to compute the single most likely sequence of hidden states. The solution...

3865 | Time Series Analysis - Hamilton - 1994

1934 | A New Approach to Economic Analysis of Nonstationary Time Series - Hamilton - 1989

Citation Context: ...from one linear operating regime to another. There is a large literature on models of this kind in econometrics, signal processing, and other fields (Harrison and Stevens, 1976; Chang and Athans, 1978; Hamilton, 1989; Shumway and Stoffer, 1991; Bar-Shalom and Li, 1993). Here we extend these models to allow for multiple real-valued state vectors, draw connections between these fields and the relevant literature on ...

1524 | Local computations with probabilities on graphical structures and their applications to expert systems (with discussion) - Lauritzen, Spiegelhalter - 1988

Citation Context: ... (personal communication, 1985) and Smyth, Heckerman and Jordan (1997), the forward-backward algorithm is a special case of exact inference algorithms for more general graphical probabilistic models (Lauritzen and Spiegelhalter, 1988; Pearl, 1988). The same observation holds true for the Kalman smoothing recursions. The other inference problem commonly posed for HMMs is to compute the single most likely sequence of hidden states....

1133 | A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains - Baum - 1970

Citation Context: ...ich also consists of a forward and backward pass through the model. To learn maximum likelihood parameters for an HMM given sequences of observations, one can use the well-known Baum-Welch algorithm (Baum et al., 1970). This algorithm is a special case of EM that uses the forward-backward algorithm to infer the posterior probabilities of the hidden states in the E-step. The M-step uses expected counts of transition...
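
The forward-backward recursions that Baum-Welch uses in its E-step fit in a few lines. The sketch below is a generic textbook implementation with hypothetical parameters, not code from the paper; the posterior marginals it computes are checked against brute-force enumeration over all state sequences:

```python
import numpy as np
from itertools import product

# Tiny HMM (hypothetical parameters): 2 states, 3 observations.
pi = np.array([0.6, 0.4])                  # initial state distribution
A = np.array([[0.7, 0.3], [0.2, 0.8]])     # transition matrix A[s, s']
B = np.array([[0.9, 0.1], [0.3, 0.7]])     # emission matrix B[s, y]
obs = [0, 1, 1]
T = len(obs)

# Forward pass: alpha[t, s] = P(y_1..y_t, S_t = s)
alpha = np.zeros((T, 2))
alpha[0] = pi * B[:, obs[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

# Backward pass: beta[t, s] = P(y_{t+1}..y_T | S_t = s)
beta = np.ones((T, 2))
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

# Posterior marginals gamma[t, s] = P(S_t = s | y_1..y_T)
gamma = alpha * beta
gamma /= gamma.sum(axis=1, keepdims=True)

# Brute-force check: sum the joint over every possible state sequence.
brute = np.zeros((T, 2))
for seq in product(range(2), repeat=T):
    p = pi[seq[0]] * B[seq[0], obs[0]]
    for t in range(1, T):
        p *= A[seq[t - 1], seq[t]] * B[seq[t], obs[t]]
    for t, s in enumerate(seq):
        brute[t, s] += p
brute /= brute.sum(axis=1, keepdims=True)
```

The M-step would then re-estimate pi, A, and B from these expected counts.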

1132 | An introduction to hidden Markov models - Rabiner, Juang - 1986

Citation Context: ...l., 1994), and fault detection (Smyth, 1994). Given an HMM with known parameters and a sequence of observations, two algorithms are commonly used to solve two different forms of the inference problem (Rabiner and Juang, 1986). The first computes the posterior probabilities of the hidden states using a recursive algorithm known as the forward-backward algorithm. The computations in the forward pass are exactly analogous to ...

1128 | An introduction to variational methods for graphical models - Jordan, Ghahramani, et al. - 1999

1083 | Optimal Filtering - Anderson, Moore - 1979

Citation Context: ...ian and the priors for the hidden states are Gaussian, the resulting posterior is also Gaussian. Three special cases of the inference problem are often considered: filtering, smoothing, and prediction (Anderson and Moore, 1979; Goodwin and Sin, 1984). The goal of filtering is to compute the probability of the current hidden state \(X_t\) given the sequence of inputs and outputs up to time \(t\), \(P(X_t \mid \{Y\}_1^t, \{U\}_1^t)\). The re...
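
The filtering recursion described in this excerpt can be illustrated with a scalar Kalman filter. Parameters are hypothetical, and inputs U are omitted for brevity:

```python
import numpy as np

# Scalar linear-Gaussian state-space model (hypothetical parameters):
#   X_t = a X_{t-1} + w,  w ~ N(0, q)
#   Y_t = c X_t + v,      v ~ N(0, r)
a, c, q, r = 0.9, 1.0, 0.1, 0.5

def kalman_filter(ys, mu0=0.0, p0=1.0):
    """Filtering: mean and variance of P(X_t | Y_1..Y_t) for each t."""
    mus, ps = [], []
    mu, p = mu0, p0
    for y in ys:
        # Predict one step ahead through the dynamics.
        mu_pred, p_pred = a * mu, a * a * p + q
        # Update with the new observation via the Kalman gain.
        k = p_pred * c / (c * c * p_pred + r)
        mu = mu_pred + k * (y - c * mu_pred)
        p = (1 - k * c) * p_pred
        mus.append(mu)
        ps.append(p)
    return np.array(mus), np.array(ps)

ys = np.array([1.2, 0.9, 1.1, 0.8])
mus, ps = kalman_filter(ys)
```

Each observation shrinks the posterior variance below the one-step prediction variance, which is what makes the recursion stable.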

993 | A view of the EM algorithm that justifies incremental, sparse, and other variants - Neal, Hinton - 1999

976 | Adaptive mixtures of local experts - Jacobs, Jordan, et al. - 1991

Citation Context: ...linear dynamical systems, and is closely related to models that are widely used in the control and econometrics literatures. It can also be derived by extending the mixture of experts neural network (Jacobs et al., 1991) to its fully dynamical version, in which both expert and gating networks are recurrent. Inferring the posterior probabilities of the hidden states of this model is computationally intractable, and t...

883 | Hierarchical mixtures of experts and the EM algorithm - Jordan, Jacobs - 1994

Citation Context: ...large family of models. With regard to the literature on neural computation, the model presented in this paper is a generalization both of the mixture of experts neural network (Jacobs et al., 1991; Jordan and Jacobs, 1994) and the related mixture of factor analyzers (Hinton et al., 1996; Ghahramani and Hinton, 1996b). Previous dynamical generalizations of the mixture of experts architecture consider the case in which ...

637 | Factorial hidden Markov models - Ghahramani, Jordan - 1997

Citation Context: ...ion probabilities for the HMM. Note that the state vectors could be concatenated into one large state vector with factorized (block-diagonal) transition matrices (cf. the factorial hidden Markov model; Ghahramani and Jordan, 1997). However, this obscures the decoupled structure of the model. Both classes of methods can be seen as minimizing Kullback-Leibler (KL) divergences. However, the KL divergence is asymmetrical, and w...

578 | Theory and practice of recursive identification - Ljung, Söderström - 1986

549 | A Model for Reasoning About Persistence and Causation - Dean, Kanazawa - 1989

546 | On Gibbs sampling for state space models - Carter, Kohn - 1994

Citation Context: ...eter estimation for this model, although without making reference to the EM algorithm. Other authors have used Markov chain Monte Carlo methods for state and parameter estimation in switching models (Carter and Kohn, 1994; Athaide, 1995) and in other related dynamic probabilistic networks (Dean and Kanazawa, 1989; Kanazawa et al., 1995). Hamilton (1989; 1994, section 22.4) describes a class of switching models in whic...

363 | An approach to time series smoothing and forecasting using the EM algorithm - Shumway, Stoffer - 1982

353 | Dynamic Linear Models with Markov-Switching - Kim - 1994

350 | A unifying review of linear Gaussian models - Roweis, Ghahramani - 1999

342 | Adaptive Filtering Prediction and Control - Goodwin, Sin - 1984

332 | Time Series Prediction: Forecasting the Future and Understanding the Past - Weigend, Gershenfeld (Eds.) - 1994

298 | Statistical Field Theory - Parisi - 1988

Citation Context: ...parameters. A completely factorized approximation is often used in statistical physics, where it provides the basis for simple yet powerful mean-field approximations to statistical mechanical systems (Parisi, 1988). Theoretical arguments motivating approximate E-steps are presented in Neal and Hinton (1998; originally in a technical report in 1993). Saul and Jordan (1996) showed that approximate E-steps could ...
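
The completely factorized (mean-field) approximation mentioned in this excerpt can be demonstrated on the smallest possible case: a correlated joint over two binary variables, approximated by a product of independent factors via coordinate-ascent updates. All numbers are illustrative:

```python
import numpy as np

# Correlated target joint P(x1, x2) over two binary variables (illustrative).
P = np.array([[0.4, 0.1],
              [0.1, 0.4]])
logP = np.log(P)

def kl(q1, q2):
    """KL( q1(x1) q2(x2) || P ), the divergence a mean-field fit minimizes."""
    Q = np.outer(q1, q2)
    return float(np.sum(Q * (np.log(Q) - logP)))

# Coordinate-ascent mean-field updates: each factor is set proportional to
# exp( E_{other factor}[ log P ] ), which monotonically decreases the KL.
q1 = np.array([0.5, 0.5])
q2 = np.array([0.9, 0.1])
kl_start = kl(q1, q2)
for _ in range(50):
    q1 = np.exp(logP @ q2)
    q1 /= q1.sum()
    q2 = np.exp(q1 @ logP)
    q2 /= q2.sum()
kl_end = kl(q1, q2)
```

For this symmetric bimodal P the updates settle on uniform factors, illustrating how a fully factorized Q can fail to capture the correlation even while improving the bound.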

278 | The EM algorithm for mixtures of factor analyzers - Ghahramani, Hinton - 1997

Citation Context: ...sented in this paper is a generalization both of the mixture of experts neural network (Jacobs et al., 1991; Jordan and Jacobs, 1994) and the related mixture of factor analyzers (Hinton et al., 1996; Ghahramani and Hinton, 1996b). Previous dynamical generalizations of the mixture of experts architecture consider the case in which the gating network has Markovian dynamics (Cacciatore and Nowlan, 1994; Kadirkamanathan and Kad...

254 | Multitarget-Multisensor Tracking - Bar-Shalom, Blair - 2001

Citation Context: ...There is a large literature on models of this kind in econometrics, signal processing, and other fields (Harrison and Stevens, 1976; Chang and Athans, 1978; Hamilton, 1989; Shumway and Stoffer, 1991; Bar-Shalom and Li, 1993). Here we extend these models to allow for multiple real-valued state vectors, draw connections between these fields and the relevant literature on neural computation and probabilistic graphical models...

194 | Parameter estimation for linear dynamical systems - Ghahramani, Hinton - 1996

Citation Context: ...sented in this paper is a generalization both of the mixture of experts neural network (Jacobs et al., 1991; Jordan and Jacobs, 1994) and the related mixture of factor analyzers (Hinton et al., 1996; Ghahramani and Hinton, 1996b). Previous dynamical generalizations of the mixture of experts architecture consider the case in which the gating network has Markovian dynamics (Cacciatore and Nowlan, 1994; Kadirkamanathan and Kad...

193 | Probabilistic independence networks for hidden Markov probability models - Smyth, Heckerman, et al. - 1996

190 | Hidden Markov models of biological primary sequence information - Baldi, Chauvin, et al. - 1994

Citation Context: ...y different forms, such as a Gaussian, mixture of Gaussians, or a neural network. HMMs have been applied extensively to problems in speech recognition (Juang and Rabiner, 1991), computational biology (Baldi et al., 1994), and fault detection (Smyth, 1994). Given an HMM with known parameters and a sequence of observations, two algorithms are commonly used to solve two different forms of the inference problem (Rabiner ...

176 | Stochastic simulation algorithms for dynamic probabilistic networks - Kanazawa, Koller, et al. - 1995

Citation Context: ...v chain Monte Carlo methods for state and parameter estimation in switching models (Carter and Kohn, 1994; Athaide, 1995) and in other related dynamic probabilistic networks (Dean and Kanazawa, 1989; Kanazawa et al., 1995). Hamilton (1989; 1994, section 22.4) describes a class of switching models in which the real-valued observation at time \(t\), \(Y_t\), depends both on the observations at times \(t-1\) to \(t-r\) and on the discr...

175 | Modelling the manifolds of images of handwritten digits - Hinton, Dayan, et al. - 1997

Citation Context: ...tation, the model presented in this paper is a generalization both of the mixture of experts neural network (Jacobs et al., 1991; Jordan and Jacobs, 1994) and the related mixture of factor analyzers (Hinton et al., 1996; Ghahramani and Hinton, 1996b). Previous dynamical generalizations of the mixture of experts architecture consider the case in which the gating network has Markovian dynamics (Cacciatore and Nowlan, ...

158 | Hidden Markov Models: Estimation and Control - Elliott - 1995

142 | Hidden Markov Models for Speech Recognition - Juang, Rabiner - 1991

Citation Context: ...ation vector, \(P(Y_t \mid S_t)\) can be modeled in many different forms, such as a Gaussian, mixture of Gaussians, or a neural network. HMMs have been applied extensively to problems in speech recognition (Juang and Rabiner, 1991), computational biology (Baldi et al., 1994), and fault detection (Smyth, 1994). Given an HMM with known parameters and a sequence of observations, two algorithms are commonly used to solve two differ...

126 | An input output HMM architecture - Bengio, Frasconi - 1995

Citation Context: ...dels, HMMs can be augmented to allow for input variables, such that they model the conditional distribution of sequences of output observations given sequences of inputs (Cacciatore and Nowlan, 1994; Bengio and Frasconi, 1995; Meila and Jordan, 1996). 2.3 Hybrids A burgeoning literature on models which combine the discrete transition structure of HMMs with the linear dynamics of SSMs has developed in fields ranging from eco...

117 | Exploiting tractable substructures in intractable networks. Neural Information Processing Systems - Saul, Jordan - 1996

96 | ML estimation of a stochastic linear system with the EM algorithm and its application to speech recognition - Digalakis, Rohlicek, et al. - 1993

Citation Context: ...E-step. For linear Gaussian state-space models, the E-step is exactly the Kalman smoothing problem as defined above, and the M-step simplifies to a linear regression problem (Shumway and Stoffer, 1982; Digalakis et al., 1993). Details on the EM algorithm for state-space models can be found in Ghahramani and Hinton (1996b), as well as in the original Shumway and Stoffer (1982) paper. 2.2 Hidden Markov models Hidden Markov ...

88 | Dynamic linear models with switching - Shumway, Stoffer - 1991

84 | Bayesian forecasting (with discussion) - Harrison, Stevens - 1976

Citation Context: ...ch the dynamics can transition in a discrete manner from one linear operating regime to another. There is a large literature on models of this kind in econometrics, signal processing, and other fields (Harrison and Stevens, 1976; Chang and Athans, 1978; Hamilton, 1989; Shumway and Stoffer, 1991; Bar-Shalom and Li, 1993). Here we extend these models to allow for multiple real-valued state vectors, draw connections between th...

79 | A new view of the EM algorithm that justifies incremental and other variants - Neal, Hinton - 1998

78 | On state estimation in switching environments - Ackerson, Fu - 1970

Citation Context: ...ed neural network models. Shortly after Kalman and Bucy solved the problem of state estimation for linear Gaussian state-space models, attention turned to the analogous problem for switching models (Ackerson and Fu, 1970). Chang and Athans (1978) derive the equations for computing the conditional mean and variance of the state when the parameters of a linear state-space model switch according to arbitrary and Markovi...

75 | Annealed competition of experts for a segmentation and classification of switching dynamics - Pawelzik, Kohlmorgen, et al. - 1996

63 | Mixtures of controllers for jump linear and non-linear plants - Cacciatore, Nowlan - 1994

Citation Context: ...tributed). Like state-space models, HMMs can be augmented to allow for input variables, such that they model the conditional distribution of sequences of output observations given sequences of inputs (Cacciatore and Nowlan, 1994; Bengio and Frasconi, 1995; Meila and Jordan, 1996). 2.3 Hybrids A burgeoning literature on models which combine the discrete transition structure of HMMs with the linear dynamics of SSMs has develop...

57 | Solutions to the linear smoothing problem - Rauch - 1963

Citation Context: ...ard direction to compute the probability of \(X_t\) given \(\{Y\}_1^t\) and \(\{U\}_1^t\). A similar set of backward recursions from \(T\) to \(t\) complete the computation by accounting for the observations after time \(t\) (Rauch, 1963). We will refer to the combined forward and backward recursions for smoothing as the Kalman smoothing recursions (also known as the RTS or Rauch-Tung-Striebel smoother). Finally, the goal of predicti...
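
The combined forward and backward (RTS) smoothing recursions described here can be sketched in the scalar case. Parameters are hypothetical, and the observation matrix is fixed at 1 for brevity:

```python
import numpy as np

# Scalar model: X_t = a X_{t-1} + w (var q),  Y_t = X_t + v (var r).
a, q, r = 0.9, 0.1, 0.5

def filter_and_smooth(ys, mu0=0.0, p0=1.0):
    n = len(ys)
    mu_f = np.zeros(n); p_f = np.zeros(n)    # filtered moments
    mu_pr = np.zeros(n); p_pr = np.zeros(n)  # one-step predictions
    mu, p = mu0, p0
    for t, y in enumerate(ys):
        # Forward (Kalman filter) pass.
        mu_pr[t], p_pr[t] = a * mu, a * a * p + q
        k = p_pr[t] / (p_pr[t] + r)
        mu = mu_pr[t] + k * (y - mu_pr[t])
        p = (1 - k) * p_pr[t]
        mu_f[t], p_f[t] = mu, p
    # Backward (RTS) pass: condition each state on the full sequence.
    mu_s = mu_f.copy(); p_s = p_f.copy()
    for t in range(n - 2, -1, -1):
        j = p_f[t] * a / p_pr[t + 1]          # smoother gain
        mu_s[t] = mu_f[t] + j * (mu_s[t + 1] - mu_pr[t + 1])
        p_s[t] = p_f[t] + j * j * (p_s[t + 1] - p_pr[t + 1])
    return mu_f, p_f, mu_s, p_s

ys = np.array([1.0, 1.2, 0.7, 0.9, 1.1])
mu_f, p_f, mu_s, p_s = filter_and_smooth(ys)
```

Because smoothing conditions on future observations as well, the smoothed variances never exceed the filtered ones, and the two coincide at the final time step.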

45 | Hidden Markov Models for Fault Detection in Dynamic Systems - Smyth - 1994

Citation Context: ...ure of Gaussians, or a neural network. HMMs have been applied extensively to problems in speech recognition (Juang and Rabiner, 1991), computational biology (Baldi et al., 1994), and fault detection (Smyth, 1994). Given an HMM with known parameters and a sequence of observations, two algorithms are commonly used to solve two different forms of the inference problem (Rabiner and Juang, 1986). The first computes ...

40 | Forecasting probability densities by using hidden Markov models - Fraser, Dimitriadis - 1994

Citation Context: ...dden Markov model driving an r-th order auto-regressive process, and are tractable for small r and number of discrete states in S. Hamilton's models are closely related to the Hidden Filter HMM (HFHMM; Fraser and Dimitriadis, 1993). HFHMMs have both discrete and real-valued states. However, the real-valued states are assumed to be either observed or a known, deterministic function of the past observations (i.e. an embedding)....

38 | New results in linear filtering and prediction - Kalman, Bucy - 1961

37 | State estimation for discrete systems with switching parameters - Chang, Athans - 1978

Citation Context: ...on in a discrete manner from one linear operating regime to another. There is a large literature on models of this kind in econometrics, signal processing, and other fields (Harrison and Stevens, 1976; Chang and Athans, 1978; Hamilton, 1989; Shumway and Stoffer, 1991; Bar-Shalom and Li, 1993). Here we extend these models to allow for multiple real-valued state vectors, draw connections between these fields and the relevan...

27 | Time series segmentation using predictive modular neural networks - Kehagias, Petrides - 1997

26 | A Mixture-of-Experts Framework for Adaptive Kalman Filtering - Chaer, Bishop, et al. - 1997

Citation Context: ...to control engineering (Harrison and Stevens, 1976; Chang and Athans, 1978; Hamilton, 1989; Shumway and Stoffer, 1991; Bar-Shalom and Li, 1993; Deng, 1993; Kadirkamanathan and Kadirkamanathan, 1996; Chaer et al., 1997). These models are known alternately as hybrid models, state-space models with switching, and jump-linear systems. We briefly review some of this literature, including some related neural network mode...

25 | Deterministic annealing variant of the EM algorithm - Ueda, Nakano - 1995

Citation Context: ...ature parameter, which is initialized to a large value and gradually reduced to 1. The above equations maximize a modified form of the bound B in (11), where the entropy of Q has been multiplied by T (Ueda and Nakano, 1995). 4.2 Merging Gaussians Almost all the approximate inference methods that are described in the literature for switching state-space models are based on the idea of merging, at each time step, a mixtu...
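
Multiplying the entropy of Q by a temperature T, as in the annealed bound described above, amounts to raising the posterior to the power 1/T before normalizing: high T smooths the responsibilities toward uniform, and T = 1 recovers the unannealed posterior. A small sketch with a made-up three-state posterior:

```python
import numpy as np

def annealed_responsibilities(log_post, T):
    """Q that maximizes <log P> + T * H(Q): proportional to exp(log_post / T)."""
    z = log_post / T
    z -= z.max()              # subtract the max for numerical stability
    q = np.exp(z)
    return q / q.sum()

# Hypothetical exact posterior over three discrete states.
log_post = np.log(np.array([0.7, 0.2, 0.1]))

hot = annealed_responsibilities(log_post, T=100.0)   # nearly uniform
cold = annealed_responsibilities(log_post, T=1.0)    # exact posterior
```

Starting hot and cooling toward T = 1 keeps early EM iterations from committing to one switch setting too soon.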

20 | Learning fine motion by Markov mixtures of experts - Meila, Jordan - 1996

16 | A stochastic model of speech incorporating hierarchical nonstationarity - Deng - 1993

Citation Context: ...of SSMs has developed in fields ranging from econometrics to control engineering (Harrison and Stevens, 1976; Chang and Athans, 1978; Hamilton, 1989; Shumway and Stoffer, 1991; Bar-Shalom and Li, 1993; Deng, 1993; Kadirkamanathan and Kadirkamanathan, 1996; Chaer et al., 1997). These models are known alternately as hybrid models, state-space models with switching, and jump-linear systems. We briefly review some...

16 | Theory and Practice of Recursive Identification - Ljung, Soderstrom

14 | On structured variational approximations - Ghahramani - 1997

Citation Context: ...quence, the zeros of the derivatives of KL with respect to the variational parameters can be obtained simply by equating derivatives of \(\langle H \rangle\) and \(\langle H_Q \rangle\) with respect to corresponding sufficient statistics (Ghahramani, 1997): \(\partial \langle H_Q - H \rangle / \partial \langle S_t^{(m)} \rangle = 0\) (43), \(\partial \langle H_Q - H \rangle / \partial \langle X_t^{(m)} \rangle = 0\) (44), \(\partial \langle H_Q - H \rangle / \partial \langle P_t^{(m)} \rangle = 0\) (45), where \(P_t^{(m)} = \langle X_t^{(m)} X_t^{(m)\prime} \rangle - \langle X_t^{(m)} \rangle \langle X_t^{(m)} \rangle^\prime\) is the covariance of \(X_t^{(m)}\) under Q. Many terms c...

14 | Multi-channel physiological data: Description and analysis - Rigney, Goldberger, et al. - 1993

10 | Recursive estimation of dynamic modular RBF networks - Kadirkamanathan, Kadirkamanathan - 1996

Citation Context: ...developed in fields ranging from econometrics to control engineering (Harrison and Stevens, 1976; Chang and Athans, 1978; Hamilton, 1989; Shumway and Stoffer, 1991; Bar-Shalom and Li, 1993; Deng, 1993; Kadirkamanathan and Kadirkamanathan, 1996; Chaer et al., 1997). These models are known alternately as hybrid models, state-space models with switching, and jump-linear systems. We briefly review some of this literature, including some related...

8 | A model for reasoning about persistence and causation - Dean, Kanazawa - 1989

Citation Context: ...r authors have used Markov chain Monte Carlo methods for state and parameter estimation in switching models (Carter and Kohn, 1994; Athaide, 1995) and in other related dynamic probabilistic networks (Dean and Kanazawa, 1989; Kanazawa et al., 1995). Hamilton (1989; 1994, section 22.4) describes a class of switching models in which the real-valued observation at time \(t\), \(Y_t\), depends both on the observations at times \(t-1\) ...

6 | Adaptive filtering prediction and control - Goodwin, Sin - 1984

Citation Context: ...hidden states are Gaussian, the resulting posterior is also Gaussian. Three special cases of the inference problem are often considered: filtering, smoothing, and prediction (Anderson and Moore, 1979; Goodwin and Sin, 1984). The goal of filtering is to compute the probability of the current hidden state \(X_t\) given the sequence of inputs and outputs up to time \(t\), \(P(X_t \mid \{Y\}_1^t, \{U\}_1^t)\). The recursive algorithm used ...

5 | Parameter estimation for linear dynamical systems (Tech - Ghahramani, Hinton - 1996

5 | The EM algorithm for mixtures of factor analyzers (Tech - Ghahramani, Hinton - 1996

4 | Likelihood evaluation and state estimation for nonlinear state space models. Unpublished doctoral dissertation - Athaide - 1995

Citation Context: ...s model, although without making reference to the EM algorithm. Other authors have used Markov chain Monte Carlo methods for state and parameter estimation in switching models (Carter and Kohn, 1994; Athaide, 1995) and in other related dynamic probabilistic networks (Dean and Kanazawa, 1989; Kanazawa et al., 1995). Hamilton (1989; 1994, section 22.4) describes a class of switching models in which the real-valu...

2 | New results in linear filtering and prediction - Kalman, Bucy - 1961

Citation Context: ...he current hidden state \(X_t\) given the sequence of inputs and outputs up to time \(t\), \(P(X_t \mid \{Y\}_1^t, \{U\}_1^t)\). The recursive algorithm used to perform this computation is known as the Kalman filter (Kalman and Bucy, 1961). The goal of smoothing is to compute the probability of \(X_t\) given the sequence of inputs and outputs up to time \(T\), where \(T > t\). The Kalman filter is used in the forward direction to compute the proba...

1 | Learning motion by Markov mixtures of experts - Meila, Jordan - 1996

Citation Context: ...to allow for input variables, such that they model the conditional distribution of sequences of output observations given sequences of inputs (Cacciatore and Nowlan, 1994; Bengio and Frasconi, 1995; Meila and Jordan, 1996). 2.3 Hybrids A burgeoning literature on models which combine the discrete transition structure of HMMs with the linear dynamics of SSMs has developed in fields ranging from econometrics to control eng...

1 | On structured variational approximations (Tech - Ghahramani - 1997

1 | Switching State-Space Models - Hamilton - 1989