## Probabilistic independence networks for hidden Markov probability models (1996)

Citations: | 193 - 13 self |

### Citations

11964 | Maximum likelihood from incomplete data via the EM algorithm - Dempster, Laird, et al. - 1977 |

8903 |
Probabilistic reasoning in intelligent systems: networks of plausible inference
- Pearl
- 1988
Citation Context ...contains the variables in the intersection of the two cliques which it links. Given a junction tree representation, one can factorize p(U) as the product of clique marginals over separator marginals (Pearl 1988): $p(u) = \frac{\prod_{C \in V_C} p(x_C)}{\prod_{S \in V_S} p(x_S)}$, where $p(x_C)$ and $p(x_S)$ are the marginal (joint) distributions for the variables in clique C and separator S respectively and $V_C$ and $V_S$ are the set of cliques and sepa... |

5884 | A tutorial on hidden Markov models and selected applications in speech recognition
- Rabiner
- 1989
Citation Context ...special cases of inference algorithms for UPINs and can be considerably less efficient (Shachter et al. 1994). 4 Modeling HMMs as PINs 4.1 PINs for HMMs In hidden Markov modeling problems (Poritz 1988; Rabiner 1989) we are interested in the set of random variables $U = \{H_1, O_1, H_2, O_2, \ldots, H_{N-1}, O_{N-1}, H_N, O_N\}$, where $H_i$ is a discrete-valued hidden variable at index i, and $O_i$ is the corresponding discrete-valued obs... |

5116 | Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images
- Geman, Geman
- 1984
Citation Context ...ical specifications of a particular probability model consistent with the UPIN structure. Terms used in the literature to describe UPINs of one form or another include Markov random fields (Isham 1981, Geman and Geman 1984), Markov networks (Pearl 1988), Boltzmann machines (Hinton and Sejnowski 1986), and log-linear models (Bishop, Fienberg, & Holland 1973). 3.1.1 Conditional Independence Semantics of UPIN Structures L... |

4314 | Estimating the dimension of a model - Schwarz - 1978 |

1524 | Local computations with probabilities on graphical structures and their applications to expert systems (with discussion), - Lauritzen, Spiegelhalter - 1988 |

1316 |
Quantum Field Theory
- Itzykson, Zuber
- 1980
Citation Context ...e statistical physics literature, where undirected graphical models in the form of chains, trees, lattices, and "decorated" variations on chains and trees have been studied for many years (see, e.g., Itzykson and Drouffe, 1991). The general methods developed there, notably the transfer matrix formalism (e.g., Morgenstern and Binder, 1983), support exact calculations on general undirected graphs. The transfer matrix recursi... |

1157 | Learning Bayesian networks: The combination of knowledge and statistical data. - Heckerman, Geiger, et al. - 1995 |

1156 |
Optimal Statistical Decisions
- DeGroot
- 1970
Citation Context ... Dirichlet distribution for the parameters of discrete variables and the mixing coefficients of Gaussian-mixture codebooks, and the normal-Wishart distribution for the parameters of Gaussian codebooks (DeGroot 1970; Buntine 1994; Heckerman and Geiger 1995). These priors have also been used in MAP estimates of standard HMMs (e.g., Gauvain and Lee, 1994). Heckerman and Geiger (1995) describe a simple method for a... |

1153 |
Introduction to Bayesian networks
- Jensen
- 1996
Citation Context ..., adding links if necessary. If no node can be eliminated without adding links, then we choose the node that can be eliminated by adding the links that yield the clique with the smallest state-space (Jensen 1995). After triangulation the JLO algorithm constructs a junction tree from $G'$, i.e., a clique tree satisfying the running intersection property. The junction tree construction is based on the ... |

835 | Discrete multivariate analysis: Theory and practice.
- Bishop, Fienberg, et al.
- 1975
Citation Context ...nt probability distribution, i.e., a marginal representation. An algorithm known as Iterative Proportional Fitting (IPF) is available to perform this conversion. Classically, IPF proceeds as follows (Bishop, Fienberg, & Holland, 1973). Suppose for simplicity that all of the random variables are discrete (a Gaussian version of IPF is also available (Whittaker 1990)) such that the joint distribution can be represented as a table. T... |

736 | Probabilistic inference using Markov Chain Monte Carlo methods
- Neal
- 1993
Citation Context ...aximum-a-posteriori (MAP), or full Bayesian methods, using traditional techniques such as gradient descent, expectation-maximization (EM) (e.g., Dempster et al., 1977), and Monte Carlo sampling (e.g., Neal, 1993). For the standard HMM(1,1) model discussed in this paper, where either discrete, Gaussian, or Gaussian-mixture codebooks are used, a ML or MAP estimate using EM is a well-known efficient approach (Por... |

726 | Bayesian interpolation
- MacKay
- 1992
Citation Context ...the observation that, under certain conditions, the quantity $p(\theta_s \mid S) \cdot p(D \mid \theta_s, S)$ converges to a multivariate Gaussian distribution as the sample size increases (see, e.g., Kass et al., 1988, and MacKay, 1992ab). Less accurate but more efficient approximations are based on the observation that the Gaussian distribution converges to a delta function centered at the maximum-a-posteriori (MAP) and eventually... |

704 | Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains
- Gauvain, Lee
- 1994
Citation Context ...ine 1994; Heckerman and Geiger 1995). Heckerman and Geiger (1995) describe a simple method for assessing these priors. These priors have also been used for learning parameters in standard HMMs (e.g., Gauvain and Lee, 1994). Parameter independence is usually not assumed in general for HMM structures. For example, in the HMM(1,1) model, a standard assumption is that $p(H_i \mid H_{i-1}) = p(H_j \mid H_{j-1})$ and $p(O_i \mid H_i)$ = p(O... |

637 | Factorial hidden Markov models. - Ghahramani, Jordan - 1997 |

611 |
Graphical Models in Applied Multivariate Statistics
- Whittaker
- 1990
Citation Context ...er we restrict our attention to discrete-valued random variables, however, many of the results stated generalize directly to continuous and mixed sets of random variables (Lauritzen and Wermuth 1989; Whittaker 1990). Let lower case $x_1$ denote one of the values of variable $X_1$: the notation $\sum_{x_1}$ is taken to mean the sum over all possible values of $X_1$. Let $p(x_i)$ be shorthand for the particular probability p(X_i = x... |

583 | Bayesian model selection in social research (with Discussion)
- Raftery
- 1995
Citation Context ...). The BIC score is the additive inverse of Rissanen's (1987) minimum description length (MDL). Other scores, which can be viewed as approximations to the marginal likelihood, are hypothesis testing (Raftery 1995) and cross validation (Fung and Crawford 1990). [Footnote 2: One caveat: The BIC score is derived under the assumption that the parameter prior is positive throughout its domain.] Buntine (in press) provides ... |

549 |
Statistical inference for probabilistic functions of finite state Markov chains. The Annals of Mathematical Statistics
- Baum, Petrie
- 1966
Citation Context ...re in fact special cases of inference algorithms for UPINs and can be considerably less efficient (Shachter et al. 1994). 4 Modeling HMMs as PINs 4.1 PINs for HMMs In hidden Markov modeling problems (Baum and Petrie 1966; Poritz 1988; Rabiner 1989; Huang, Ariki, and Jack 1990; Elliott, Aggoun, and Moore 1995) we are interested in the set of random variables $U = \{H_1, O_1, H_2, O_2, \ldots, H_{N-1}, O_{N-1}, H_N, O_N\}$... |

494 | A practical Bayesian framework for backpropagation networks.
- MacKay
- 1992
Citation Context ...the observation that, under certain conditions, the quantity $p(\theta_s \mid S) \cdot p(D \mid \theta_s, S)$ converges to a multivariate Gaussian distribution as the sample size increases (see, e.g., Kass et al., 1988, and MacKay, 1992ab). Less accurate but more efficient approximations are based on the observation that the Gaussian distribution converges to a delta function centered at the maximum-a-posteriori (MAP) and eventually... |

374 | Nonuniversal critical dynamics in Monte Carlo simulations - Swendsen, Wang - 1987 |

370 | Model selection and accounting for model uncertainty in graphical models using Occam's window. - Madigan, Raftery - 1994 |

344 | Hidden Markov Models for Speech Recognition - Huang, Ariki, Jack - 1990 |

340 |
Learning and relearning in Boltzmann machines
- Hinton, Sejnowski
- 1986
Citation Context ...IN structure. Terms used in the literature to describe UPINs of one form or another include Markov random fields (Isham 1981, Geman and Geman 1984), Markov networks (Pearl 1988), Boltzmann machines (Hinton and Sejnowski 1986), and log-linear models (Bishop, Fienberg, & Holland 1973). 3.1.1 Conditional Independence Semantics of UPIN Structures Let A, B, and S be any disjoint subsets of nodes in an undirected graph (UG) G.... |

276 | Operations for learning with graphical models
- Buntine
- 1994
Citation Context ...e distributions include the normal-Wishart distribution for the parameters of Gaussian codebooks and the Dirichlet distribution for the mixing coefficients of Gaussian-mixture codebooks (DeGroot 1970; Buntine 1994; Heckerman and Geiger 1995). Heckerman and Geiger (1995) describe a simple method for assessing these priors. These priors have also been used for learning parameters in standard HMMs (e.g., Gauvain ... |

244 |
Explaining phonetic variation: a sketch of H&H theory. In Speech production and speech modeling
- Lindblom
- 1990
Citation Context ...ple, equivalent shifts in formant frequencies can be caused by lip-rounding or tongue-raising; such phenomena are generically referred to as "trading relations" in the speech psychophysics literature (Lindblom 1990; Perkell et al. 1993). Once a particular acoustic pattern is observed, the causes become dependent; thus for example, evidence that the lips are rounded would act to discount inferences that the tong... |

224 |
Graphical models for associations between variables, some of which are qualitative and some quantitative.
- Lauritzen, Wermuth
- 1989
Citation Context ...For the purposes of this paper we restrict our attention to discrete-valued random variables, however, many of the results stated generalize directly to continuous and mixed sets of random variables (Lauritzen and Wermuth 1989; Whittaker 1990). Let lower case $x_1$ denote one of the values of variable $X_1$: the notation $\sum_{x_1}$ is taken to mean the sum over all possible values of $X_1$. Let $p(x_i)$ be shorthand for the particular prob... |

217 | Sequential updating of conditional probabilities on directed graphical structures. - Spiegelhalter, Lauritzen - 1990 |

205 | A guide to the literature on learning probabilistic networks from data - Buntine - 1996 |

177 |
Independence properties of directed Markov fields.
- Lauritzen, Dawid, et al.
- 1990
Citation Context ... complex interpretation in the directed context: S separates A from B in a directed graph if S separates A from B in the moral (undirected) graph of the smallest ancestral set containing A, B, and S (Lauritzen et al. 1990). It can be shown that this definition of a DPIN structure is equivalent to the more intuitive statement that, given the values of its parents, a variable $X_i$ is independent of all other nodes in the... |

158 | Hidden Markov Models: Estimation and Control - Elliott, Aggoun, Moore - 1995 |

153 |
Bayesian Updating in Recursive Graphical Models by Local Computations
- Jensen
- 1989
Citation Context ...ete one gets a new representation $K_f$ such that the local potential on each clique is $f(x_C) = p(x_C^h, e)$, i.e., the joint probability of the local unobserved clique variables and the observed evidence (Jensen et al. 1990) (similarly for the separator potential functions). If one marginalizes at the clique over the unobserved local clique variables, $\sum_{X_C^h} p(x_C^h, e) = p(e)$, (16) one gets the probability of the observe... |

117 | Exploiting tractable substructures in intractable networks. Neural Information Processing Systems. - Saul, Jordan - 1996 |

100 | Applications of a general propagation algorithm for probabilistic expert systems, - Dawid - 1992 |

84 |
Hidden Markov models: A guided tour
- Poritz
- 1990
Citation Context ... are in fact special cases of inference algorithms for UPINs and can be considerably less efficient (Shachter et al. 1994). 4 Modeling HMMs as PINs 4.1 PINs for HMMs In hidden Markov modeling problems (Poritz 1988; Rabiner 1989) we are interested in the set of random variables $U = \{H_1, O_1, H_2, O_2, \ldots, H_{N-1}, O_{N-1}, H_N, O_N\}$, where $H_i$ is a discrete-valued hidden variable at index i, and $O_i$ is the corresponding discr... |

75 |
Maximum a-posteriori estimation for multivariate Gaussian mixture observations of Markov chains
- Gauvain, Lee
- 1994
Citation Context ...normal-Wishart distribution for the parameters of Gaussian codebooks (DeGroot 1970; Buntine 1994; Heckerman and Geiger 1995). These priors have also been used in MAP estimates of standard HMMs (e.g., Gauvain and Lee, 1994). Heckerman and Geiger (1995) describe a simple method for assessing these priors. The use of the EM algorithm for UPINs is similar. Suppose that the undirected model M consists of cliques Cij such t... |

69 | Learning Bayesian networks with discrete variables from data. - Spirtes, Meek - 1995 |

60 | Boltzmann chains and hidden Markov models
- Saul, Jordan
- 1995
Citation Context ...istence of the JLO algorithm frees us from having to derive particular recursive algorithms on a case-by-case basis. The first model that we consider can be viewed as a coupling of two HMM(1,1) chains (Saul & Jordan, 1995). Such a model can be useful in general sensor fusion problems, for example in the fusion of an audio signal with a video signal in lipreading. Because different sensory signals generally have differen... |

55 | Coarticulation in recent speech production models. - Kent, Minifie, et al. - 1977 |

54 |
Prequential analysis, stochastic complexity and Bayesian inference
- Dawid
- 1992
Citation Context ...erse of Rissanen's (1987) minimum description length (MDL). Other scores, which can be viewed as approximations to the marginal likelihood, are hypothesis testing (Raftery 1995) and cross validation (Dawid 1992b). Buntine (in press) provides a comprehensive review of scores for model selection and model averaging in the context of PINs. Another complication with Bayesian model averaging is that there may be... |

50 | A Markov Random Field Model-Based Approach to Image Interpretation - Modestino, Zhang - 1992 |

46 | Global conditioning for probabilistic inference in belief networks. - Shachter, Andersen, et al. - 1994 |

45 |
Hidden Markov Models for Fault Detection in Dynamic Systems
- Smyth
- 1994
Citation Context ...ticular hidden state value given the observed evidence. Inferring the posterior state probabilities is useful when the states have direct physical interpretations (as in fault monitoring applications (Smyth 1994)) and is also implicitly required during the standard Baum-Welch learning algorithm for HMM(1,1). In general, both of these computations scale as $m^N$ where m is the number of states for each hidden v... |

43 | Constructor: A system for the induction of probabilistic models
- Fung, Crawford
- 1990
Citation Context ...erse of Rissanen's (1987) minimum description length (MDL). Other scores, which can be viewed as approximations to the marginal likelihood, are hypothesis testing (Raftery 1995) and cross validation (Fung and Crawford 1990). Buntine (in press) provides a comprehensive review of the literature on learning PINs. In the context of HMM(K, J) type structures, an obvious question is how one could learn such structure from da... |

41 | Stochastic Complexity (with discussion). - Rissanen - 1987 |

37 |
Trading relations between tongue-body raising and lip rounding in production of the vowel /u/: A pilot “motor equivalence” study.
- Perkell, Matthies, et al.
- 1993
Citation Context ... shifts in formant frequencies can be caused by lip-rounding or tongue-raising; such phenomena are generically referred to as "trading relations" in the speech psychophysics literature (Lindblom 1990; Perkell et al. 1993). Once a particular acoustic pattern is observed, the causes become dependent; thus for example, evidence that the lips are rounded would act to discount inferences that the tongue has been raised. T... |

31 | The logic of influence diagrams - Pearl, Geiger, et al. |

25 | On the effective implementation of the iterative proportional fitting procedure. - Jirousek, Preucil - 1995 |

25 |
Asymptotics in Bayesian computation
- Kass, Tierney, et al.
- 1988
Citation Context ...icient is one based on the observation that, under certain conditions, the quantity $p(\theta_s \mid S) \cdot p(D \mid \theta_s, S)$ converges to a multivariate Gaussian distribution as the sample size increases (see, e.g., Kass et al., 1988, and MacKay, 1992ab). Less accurate but more efficient approximations are based on the observation that the Gaussian distribution converges to a delta function centered at the maximum-a-posteriori (M... |

23 |
An introduction to spatial point processes and Markov random fields
- Isham
- 1981
Citation Context ... clique functions. The clique functions represent the particular parameters associated with the UPIN structure. This corresponds directly to the standard definition of a Markov random field (Isham 1981). The clique functions reflect the relative "compatibility" of the value assignments in the clique. A model p is said to be decomposable if it has a minimal UPIN structure G which is triangulated (Fig... |

20 |
Independence properties of directed Markov fields.
- Lauritzen, Dawid, et al.
- 1990
Citation Context ... different interpretation in the directed context: S separates A from B in a directed graph if S separates A from B in the moral (undirected) graph of the smallest ancestral set containing A, B, and S (Lauritzen et al. 1990). It can be shown that this is equivalent to the statement that a variable $X_i$ is independent of all other nodes in the graph except for its descendants, given the values of its parents. Thus, as with... |

17 | Mean field networks that learn to discriminate temporally distorted strings - Williams, Hinton - 1991 |

16 | Bayesian belief networks as a tool for stochastic parsing. - Lucke - 1995 |

14 | Score and information for recursive exponential models with incomplete data
- Thiesson
- 1995
Citation Context ...dard assumption is that $p(H_i \mid H_{i-1}) = p(H_j \mid H_{j-1})$ and $p(O_i \mid H_i) = p(O_j \mid H_j)$ for all i and j. Fortunately, parameter equalities such as these are easily handled in the framework above (see Thiesson, 1995, for a detailed discussion). In addition the assumption that patterns are complete is clearly inappropriate for HMM structures in general, where some of the variables are hidden from observation. Whe... |

14 | On the effective implementation of the iterative proportional fitting procedure. Computational Statistics and Data Analysis 19, 177-189. - Jiroušek, Přeučil - 1995 |

9 |
Likelihoods and priors for Bayesian networks.
- Heckerman, Geiger
- 1995
Citation Context ...the parameters of discrete variables and the mixing coefficients of Gaussian-mixture codebooks, and the normal-Wishart distribution for the parameters of Gaussian codebooks (DeGroot 1970; Buntine 1994; Heckerman and Geiger 1995). These priors have also been used in MAP estimates of standard HMMs (e.g., Gauvain and Lee, 1994). Heckerman and Geiger (1995) describe a simple method for assessing these priors. The use of the EM ... |

6 | An EM approach to grammatical inference: Input/output HMMs - Frasconi, Bengio - 1994 |

5 | Continuous voxel classification by stochastic relaxation: Theory and application to MR imaging and MR angiography - Vandermeulen, Verbeeck, et al. - 1994 |

4 |
The logic of influence diagrams
- Pearl, Geiger, et al.
- 1990
Citation Context ...ing all the directions on the edges in the DPIN structure? The answer is yes if and only if the DPIN structure contains no subgraphs where a node has two or more non-adjacent parents (Whittaker 1990; Pearl et al. 1990). In general, it can be shown that if a UPIN structure G for p is decomposable (triangulated) then it has the same Markov properties as some DPIN structure for p. On a more practical level, DPI... |

4 |
Global conditioning for probabilistic inference in belief networks
- Shachter, Andersen, et al.
- 1994
Citation Context ...ith the original DPIN. Furthermore, it has been shown that many of the inference algorithms for DPINs are in fact special cases of inference algorithms for UPINs and can be considerably less efficient (Shachter et al. 1994). 4 Modeling HMMs as PINs 4.1 PINs for HMMs In hidden Markov modeling problems (Poritz 1988; Rabiner 1989) we are interested in the set of random variables $U = \{H_1, O_1, H_2, O_2, \ldots, H_{N-1}, O_{N-1}, H_N, O_N$ ... |

4 |
A Generalization Of Discrete Hidden Markov Model And Of Viterbi Algorithm
- Tao
- 1992
Citation Context ...lgorithms are more general than the F-B and Viterbi algorithms: 1. While special purpose extensions to the standard Viterbi and F-B algorithms can be derived to handle various extensions to HMM(1,1) (Tao 1992), the JLO algorithms provide by definition a completely general exact inference method for any PIN. 2. The graphical algorithms can easily handle other inference tasks besides just calculating the lik... |

4 |
Magnetic correlations in two-dimensional spin glasses
- Morgenstern, Binder
- 1980
Citation Context ..."decorated" variations on chains and trees have been studied for many years (see, e.g., Itzykson and Drouffe, 1991). The general methods developed there, notably the transfer matrix formalism (e.g., Morgenstern and Binder, 1983), support exact calculations on general undirected graphs. The transfer matrix recursions and the calculations in the JLO algorithm are closely related and a reasonable hypothesis is that they are eq... |

4 | Mean field networks that learn to discriminate temporally distorted strings - Williams, Hinton - 1991 |

3 |
Recursive approaches to the statistical physics of lattice proteins
- Stolorz
- 1994
Citation Context ... different in the two cases. Saul and Jordan (1996) have proposed a second extension of the HMM(1,1) model which is motivated by the desire to provide a more effective model of coarticulation (see also Stolorz, 1994). In this model, shown in Figure 10, coarticulatory influences are modeled via additional links between output variables and states along an HMM(1,1) backbone. One approach to performing calculations ... |

2 | On the effective implementation of the iterative proportional fitting procedure - Jirousek, Preucil - 1995 |

2 | A Markov random field model-based approach to image segmentation - Modestino, Zhang - 1992 |

2 |
Probabilistic expert systems and graphical modelling: A case study in drug safety
- Spiegelhalter, Dawid, et al.
- 1991
Citation Context ...Their popularity in these fields stems from the fact that the joint probability model can be specified directly via Equation 3, i.e., via the specification of conditional probability tables or functions (Spiegelhalter et al. 1991). In contrast, UPINs must be specified in terms of clique functions (as in Equation 1) which may not be as easy to work with (cf. Geman and Geman (1984), Modestino and Zhang (1992) and Vandermeulen et... |

2 | Continuous voxel classification by stochastic relaxation: theory and application to MR imaging and MR angiography - Vandermeulen, Verbeeck, et al. - 1994 |

2 |
The Logic of Influence Diagrams, in: Influence Diagrams, Belief Nets and Decision
- Pearl, Geiger, et al.
- 1990
Citation Context ...ping all the directions on the edges in the DPIN structure? The answer is yes if and only if the DPIN structure contains no subgraphs where a node has two or more nonadjacent parents (Whittaker 1990; Pearl et al. 1990). In general, it can be shown that if a UPIN structure G for p is decomposable (triangulated) then it has the same Markov properties as some DPIN structure for p. On a more practical level, DPIN stru... |

1 |
Operations for learning with graphical models. Journal of Artificial Intelligence Research 2, 159-225.
- Buntine
- 1994
Citation Context ...tribution for the parameters of discrete variables and the mixing coefficients of Gaussian-mixture codebooks, and the normal-Wishart distribution for the parameters of Gaussian codebooks (DeGroot 1970; Buntine 1994; Heckerman and Geiger 1995). These priors have also been used in MAP estimates of standard HMMs (e.g., Gauvain and Lee, 1994). Heckerman and Geiger (1995) describe a simple method for assessing these... |

1 | Coarticulation in recent speech production models - Kent, Minifie - 1977 |

1 | Independence properties of directed Markov fields. Networks 20, 491-505. - Lauritzen, Dawid, et al. - 1990 |