Results 1 - 10 of 405
An introduction to variational methods for graphical models
- To appear in: M. I. Jordan (Ed.), Learning in Graphical Models
"... ..."
Dynamic Bayesian Networks: Representation, Inference and Learning
, 2002
"... Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and bio-sequence analysis, and KFMs have bee ..."
Abstract - Cited by 770 (3 self)
Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and bio-sequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs and KFMs are limited in their "expressive power". Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linear-Gaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data.

In particular, the main novel technical contributions of this thesis are as follows: a way of representing hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T^3), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of applying Rao-Blackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.
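As a toy illustration of the factored-state idea mentioned above (this example is not from the thesis), the sketch below defines a two-chain "factorial" HMM, flattens its factored state into one joint state, and runs ordinary HMM forward filtering on the result; the transition matrices, emission table, and observation sequence are invented. Exact inference is always possible by such flattening, but the flattened state space grows exponentially in the number of chains, which is precisely the blow-up that DBN inference algorithms try to avoid.

```python
import numpy as np

# Toy example (not from the thesis): a DBN whose hidden state factors into two
# independent binary chains, filtered exactly by flattening to one 4-state HMM.
A1 = np.array([[0.9, 0.1], [0.2, 0.8]])   # transition of chain 1
A2 = np.array([[0.7, 0.3], [0.4, 0.6]])   # transition of chain 2
A = np.kron(A1, A2)                        # joint transition over the flattened state (s1, s2)
# Emission p(y | s1, s2) for a binary observation y, one row per flattened state.
B = np.array([[0.9, 0.1],
              [0.6, 0.4],
              [0.5, 0.5],
              [0.05, 0.95]])

def forward_filter(y_seq, A, B, prior):
    """Standard HMM forward filtering, returning p(state | observations so far)."""
    alpha = prior.copy()
    for y in y_seq:
        alpha = B[:, y] * (A.T @ alpha)   # predict one step, then weight by the likelihood
        alpha /= alpha.sum()
    return alpha

belief = forward_filter([0, 1, 1, 0, 1], A, B, prior=np.full(4, 0.25))
print(belief)   # posterior over the four joint states (s1, s2) after the last observation
```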
Good Error-Correcting Codes based on Very Sparse Matrices
, 1999
"... We study two families of error-correcting codes defined in terms of very sparse matrices. "MN" (MacKay--Neal) codes are recently invented, and "Gallager codes" were first investigated in 1962, but appear to have been largely forgotten, in spite of their excellent properties. The ..."
Abstract - Cited by 750 (23 self)
We study two families of error-correcting codes defined in terms of very sparse matrices. "MN" (MacKay–Neal) codes were invented recently, while "Gallager codes" were first investigated in 1962 but appear to have been largely forgotten, in spite of their excellent properties. The decoding of both codes can be tackled with a practical sum-product algorithm. We prove that these codes are "very good," in that sequences of codes exist which, when optimally decoded, achieve information rates up to the Shannon limit. This result holds not only for the binary-symmetric channel but also for any channel with symmetric stationary ergodic noise. We give experimental results for binary-symmetric channels and Gaussian channels demonstrating that practical performance substantially better than that of standard convolutional and concatenated codes can be achieved; indeed, the performance of Gallager codes is almost as close to the Shannon limit as that of turbo codes.
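The abstract notes that both code families can be decoded with a practical sum-product algorithm. The sketch below is a hedged illustration of the closely related min-sum variant for a generic low-density parity-check code over a binary-symmetric channel; the parity-check matrix H, crossover probability p, and iteration budget are placeholder inputs, and nothing here is taken from the paper itself.

```python
import numpy as np

def minsum_decode(H, y, p=0.05, n_iters=50):
    """Min-sum decoding of a binary LDPC code over a binary-symmetric channel.

    H: (m, n) 0/1 parity-check matrix as a NumPy array (every check is assumed
       to involve at least two bits); y: received 0/1 word of length n.
    """
    m, n = H.shape
    # Channel log-likelihood ratios: positive values favour bit 0.
    llr = np.where(y == 0, 1.0, -1.0) * np.log((1 - p) / p)
    msg = H * llr                       # variable-to-check messages on the edges of H
    for _ in range(n_iters):
        chk = np.zeros_like(msg)        # check-to-variable messages
        for i in range(m):
            idx = np.flatnonzero(H[i])
            for j in idx:
                others = msg[i, idx[idx != j]]
                # Sign product and minimum magnitude of the other incoming messages.
                chk[i, j] = np.prod(np.sign(others)) * np.min(np.abs(others))
        total = llr + chk.sum(axis=0)   # posterior LLR estimate per bit
        msg = H * (total - chk)         # exclude each edge's own incoming message
        x_hat = (total < 0).astype(int)
        if not np.any(H @ x_hat % 2):   # all parity checks satisfied: stop early
            break
    return x_hat
```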
Loopy belief propagation for approximate inference: An empirical study
- In Proceedings of Uncertainty in AI
, 1999
"... Abstract Recently, researchers have demonstrated that "loopy belief propagation" -the use of Pearl's polytree algorithm in a Bayesian network with loops -can perform well in the context of error-correcting codes. The most dramatic instance of this is the near Shannon-limit performanc ..."
Abstract - Cited by 676 (15 self)
Recently, researchers have demonstrated that "loopy belief propagation" - the use of Pearl's polytree algorithm in a Bayesian network with loops - can perform well in the context of error-correcting codes. The most dramatic instance of this is the near Shannon-limit performance of "turbo codes" - codes whose decoding algorithm is equivalent to loopy belief propagation in a chain-structured Bayesian network. In this paper we ask: is there something special about the error-correcting code context, or does loopy propagation work as an approximate inference scheme in a more general setting? We compare the marginals computed using loopy propagation to the exact ones in four Bayesian network architectures, including two real-world networks: ALARM and QMR. We find that the loopy beliefs often converge and, when they do, they give a good approximation to the correct marginals. However, on the QMR network, the loopy beliefs oscillated and had no obvious relationship to the correct posteriors. We present some initial investigations into the cause of these oscillations, and show that some simple methods of preventing them lead to the wrong results.

Introduction

The task of calculating posterior marginals on nodes in an arbitrary Bayesian network is known to be NP-hard. In this paper we investigate the approximation performance of "loopy belief propagation". This refers to using the well-known Pearl polytree algorithm [12] on a Bayesian network with loops (undirected cycles). The algorithm is an exact inference algorithm for singly connected networks: the beliefs converge to the correct marginals in a number of iterations equal to the diameter of the graph. However, as Pearl noted, the same algorithm will not give the correct beliefs for multiply connected networks: when loops are present, the network is no longer singly connected and local propagation schemes will invariably run into trouble.

We believe there are general undiscovered theorems about the performance of belief propagation on loopy DAGs. These theorems, which may have nothing directly to do with coding or decoding, will show that in some sense belief propagation "converges with high probability to a near-optimum value" of the desired belief on a class of loopy DAGs.

Progress in the analysis of loopy belief propagation has been made for the case of networks with a single loop:
• Unless all the conditional probabilities are deterministic, belief propagation will converge.
• There is an analytic expression relating the correct marginals to the loopy marginals. The approximation error is related to the convergence rate of the messages: the faster the convergence, the more exact the approximation.
• If the hidden nodes are binary, then thresholding the loopy beliefs is guaranteed to give the most probable assignment, even though the numerical values of the beliefs may be incorrect. This result only holds for nodes in the loop.

In the max-product (or "belief revision") version, Weiss ... For the case of networks with multiple loops, Richardson ... To summarize, what is currently known about loopy propagation is that (1) it works very well in an error-correcting code setting and (2) there are conditions for a single-loop network under which it can be guaranteed to work well. In this paper we investigate loopy propagation empirically under a wider range of conditions. Is there something special about the error-correcting code setting, or does loopy propagation work as an approximation scheme for a wider range of networks?

The belief at a node X is given by

BEL(x) = α λ(x) π(x),   (1)
where

λ(x) = Π_j λ_{Y_j}(x)   and   π(x) = Σ_u P(x | u) Π_i π_X(u_i).

The message X passes to its parent U_i is given by

λ_X(u_i) = α Σ_x λ(x) Σ_{u_k: k≠i} P(x | u) Π_{k≠i} π_X(u_k),   (2)

and the message X sends to its child Y_j is given by

π_{Y_j}(x) = α π(x) Π_{k≠j} λ_{Y_k}(x).   (3)

For noisy-OR links between parents and children, there exists an analytic expression for π(x) and λ_X(u_i) that avoids the exhaustive enumeration over parent configurations. We made a slight modification to the update rules in that we normalized both λ and π messages at each iteration; as Pearl notes, such normalization does not affect the final beliefs. Nodes were updated in parallel: at each iteration all nodes calculated their outgoing messages based on the incoming messages of their neighbors from the previous iteration. The messages were said to converge if none of the beliefs in successive iterations changed by more than a small threshold (10^-4). All messages were initialized to a vector of ones; random initialization yielded similar results, since the initial conditions rapidly get "washed out". For comparison, we also implemented likelihood weighting.

3.1 The PYRAMID network

All nodes were binary and the conditional probabilities were represented by tables; entries in the conditional probability tables (CPTs) were chosen uniformly in the range (0, 1].

3.2 The toyQMR network

All nodes were binary and the conditional probabilities of the leaves were represented by a noisy-OR:

P(Child = 0 | Parents) = exp(-θ_0 - Σ_i θ_i Parent_i),

where θ_0 represents the "leak" term.

Figure 2: The structure of a toyQMR network. This is a bipartite structure where the conditional distributions of the leaves are noisy-ORs. The network shown represents one sample from randomly generated structures where the parents of each symptom were a random subset of the diseases.

The QMR-DT network

The QMR-DT is a bipartite network whose structure is the same as that shown in Figure 2, but the size is much larger. There are approximately 600 diseases and approximately 4000 finding nodes, with a number of observed findings that varies per case. Due to the form of the noisy-OR CPTs, the complexity of inference is exponential in the number of positive findings.

Results

Initial experiments

The experimental protocol for the PYRAMID network was as follows. For each experimental run, we first generated random CPTs. We then sampled from the joint distribution defined by the network and clamped the observed nodes (all nodes in the bottom layer) to their sampled values. Given a structure and observations, we then ran three inference algorithms: junction tree, loopy belief propagation and sampling. We found that loopy belief propagation always converged in this case, with the average number of iterations equal to 10.2.

The experimental protocol for the toyQMR network was similar to that of the PYRAMID network, except that we randomized over structure as well. Again we found that loopy belief propagation always converged, with the average number of iterations equal to 8.65.

The protocol for the ALARM network experiments differed from the previous two in that the structure and parameters were fixed; only the observed evidence differed between experimental runs. We assumed that all leaf nodes were observed and calculated the posterior marginals of all other nodes. Again we found that loopy belief propagation always converged, with the average number of iterations equal to 14.55.

The results presented up until now show that loopy propagation performs well for a variety of architectures involving multiple loops. We now present results for the QMR-DT network, which are not as favorable. In the QMR-DT network there was no randomization.
We used the fixed structure and calculated posteriors for the four cases for which posteriors have been calculated exactly by Heckerman.

What causes convergence versus oscillation?

What our initial experiments show is that loopy propagation does a good job of approximating the correct posteriors if it converges. Unfortunately, on the most challenging case, the QMR-DT network, the algorithm did not converge. We wanted to see if this oscillatory behavior in the QMR-DT case was related to the size of the network: does loopy propagation tend to converge less for large networks than for small networks?

To investigate this question, we tried to cause oscillation in the toyQMR network. We first asked what, besides the size, is different between toyQMR and real QMR? An obvious difference is in the parameter values: while the CPTs for toyQMR are random, the real QMR parameters are not. In particular, the prior probability of a disease node being on is extremely low in the real QMR (typically of the order of 10^-3). Would low priors cause oscillations in the toyQMR case? To answer this question we repeated the experiments reported in the previous section, but rather than having the prior probability of each node be randomly selected in the range [0, 1], we selected the prior uniformly in the range [0, U] and varied U. Unlike the previous simulations we did not set the observed nodes by sampling from the joint, since for low priors all the findings would be negative and inference would be trivial. Rather, each finding was independently set to positive or negative.

If indeed small priors are responsible for the oscillation, then we would expect the real QMR network to converge if the priors were sampled randomly in the range [0, 1]. Small priors are not the only thing that causes oscillation; small weights can, too.

Figure: exact marginals are shown as circles; the ends of the "error bars" are the loopy marginals at the last two iterations; only the diseases with non-negligible posterior probability are plotted. (A companion plot shows the loopy belief propagation results against the range of the prior.)

Another possible explanation for the oscillations is that the evidence in the real QMR cases is untypical of the model. To test this hypothesis, we reparameterized the PYRAMID network as follows: we set the prior probability of the "1" state of the root nodes to 0.9, and we utilized the noisy-OR model for the other nodes with a small (0.1) inhibition probability (apart from the leak term, which we inhibited with probability 0.9). This parameterization has the effect of propagating 1's from the top layer to the bottom; thus the true marginal at each leaf is approximately (0.1, 0.9), i.e., the leaf is 1 with high probability. We then generated untypical evidence at the leaves by sampling from the uniform distribution, (0.5, 0.5), or from the skewed distribution (0.9, 0.1). We found that loopy propagation still converged², and, as before, the marginals to which it converged were highly correlated with the correct marginals. Thus there must be some other explanation, besides untypicality of the evidence, for the oscillations observed in QMR.

Can we fix oscillations easily?

When loopy propagation oscillates between two steady states it seems reasonable to try to find a way to combine the two values. The simplest thing to do is to average them. Unfortunately, this gave very poor results, since the correct posteriors do not usually lie at the midpoint of the interval.
² More precisely, we found that with a convergence threshold of 10^-4, 98 out of 100 cases converged; when we relaxed the threshold to 10^-3, all 100 cases converged.

We also tried to avoid oscillations by using "momentum": replacing the messages that were sent at time t with a weighted average of the messages at times t and t-1. That is, we replaced each λ^(t) in Equation 2 with the damped message μ λ^(t-1) + (1 - μ) λ^(t), and similarly for π^(t) in Equation 3, where 0 ≤ μ ≤ 1 is the momentum term. It is easy to show that if the modified system of equations converges to a fixed point F, then F is also a fixed point of the original system (since if λ^(t) = λ^(t-1), the damped update simply returns λ^(t)). In the experiments for which loopy propagation converged (PYRAMID, toyQMR and ALARM), we found that adding the momentum term did not change the results: the beliefs that resulted were the same beliefs found without momentum. In the experiments which did not converge (toyQMR with small priors and real QMR), we found that momentum significantly reduced the chance of oscillation. However, in several cases the beliefs to which the algorithm converged were quite inaccurate (a minimal sketch of this damped update scheme is given below).

Discussion

The experimental results presented here suggest that loopy propagation can yield accurate posterior marginals in a more general setting than that of error-correcting coding: the PYRAMID, toyQMR and ALARM networks are quite different from the error-correcting coding graphs, yet the loopy beliefs show high correlation with the correct marginals.

In error-correcting codes the posterior is typically highly peaked, and one might think that this feature is necessary for the good performance of loopy propagation. Our results suggest that this is not the case: in none of our simulations were the posteriors highly peaked around a single joint configuration. If the probability mass were concentrated at a single point, the marginal probabilities should all be near zero or one; this is clearly not the case, as can be seen in the figures.

It might be expected that loopy propagation would only work well for graphs with large loops. However, our results, and previous results on turbo codes, show that loopy propagation can also work well for graphs with many small loops.

At the same time, our experimental results suggest a cautionary note about loopy propagation, showing that the marginals may exhibit oscillations that have very little correlation with the correct marginals. We presented some preliminary results investigating the cause of the oscillations and showed that it is not simply a matter of the size of the network or the number of parents. Rather, the same structure with different parameter values may oscillate or exhibit stable behavior.

For all our simulations, we found that when loopy propagation converges, it gives a surprisingly good approximation to the correct marginals. Since the distinction between convergence and oscillation is easy to make after a small number of iterations, this may suggest a way of checking whether loopy propagation is appropriate for a given problem.

Acknowledgements

We thank Tommi Jaakkola, David Heckerman and David MacKay for useful discussions. We also thank Randy Miller and the University of Pittsburgh for the use of the QMR-DT database. Supported by MURI ARO DAAH04-96-1-0341.
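The sketch below is a rough, self-contained illustration (not the authors' code) of the ingredients described in this entry: parallel ("flooding") updates, per-iteration message normalization, a small convergence threshold, and a momentum term 0 ≤ μ ≤ 1 that averages each new message with its predecessor. For simplicity it runs on a small pairwise Markov random field with binary nodes rather than on Pearl's directed λ/π messages, and the potentials, graph, and default settings are invented for the example.

```python
import numpy as np

def loopy_bp(psi_node, psi_edge, edges, n_iters=200, tol=1e-4, momentum=0.0):
    """Damped, parallel loopy BP on a pairwise MRF with binary variables.

    psi_node: list of length-2 node potentials.
    psi_edge: dict mapping an undirected edge (i, j) to a 2x2 potential psi[x_i, x_j].
    edges: list of undirected edges (i, j).
    momentum: weight mu placed on the previous message (0 means undamped).
    """
    n = len(psi_node)
    msgs, nbrs = {}, {i: [] for i in range(n)}
    for i, j in edges:
        msgs[(i, j)] = np.ones(2)
        msgs[(j, i)] = np.ones(2)
        nbrs[i].append(j)
        nbrs[j].append(i)

    for _ in range(n_iters):
        new = {}
        for (i, j) in msgs:
            # Product of node potential and all incoming messages except the one from j.
            prod = psi_node[i].copy()
            for k in nbrs[i]:
                if k != j:
                    prod = prod * msgs[(k, i)]
            pe = psi_edge[(i, j)] if (i, j) in psi_edge else psi_edge[(j, i)].T
            m = pe.T @ prod          # sum over x_i of psi(x_i, x_j) * prod(x_i)
            m = m / m.sum()          # normalize every iteration
            new[(i, j)] = momentum * msgs[(i, j)] + (1 - momentum) * m
        converged = max(np.abs(new[e] - msgs[e]).max() for e in msgs) < tol
        msgs = new
        if converged:
            break

    beliefs = []
    for i in range(n):
        b = psi_node[i].copy()
        for k in nbrs[i]:
            b = b * msgs[(k, i)]
        beliefs.append(b / b.sum())
    return beliefs

# Example: a three-node loop with attractive pairwise potentials and mild damping.
same = np.array([[2.0, 1.0], [1.0, 2.0]])
print(loopy_bp(psi_node=[np.array([0.6, 0.4]), np.array([0.5, 0.5]), np.array([0.3, 0.7])],
               psi_edge={(0, 1): same, (1, 2): same, (0, 2): same},
               edges=[(0, 1), (1, 2), (0, 2)],
               momentum=0.1))
```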
Constructing Free Energy Approximations and Generalized Belief Propagation Algorithms
- IEEE Transactions on Information Theory
, 2005
"... Important inference problems in statistical physics, computer vision, error-correcting coding theory, and artificial intelligence can all be reformulated as the computation of marginal probabilities on factor graphs. The belief propagation (BP) algorithm is an efficient way to solve these problems t ..."
Abstract - Cited by 585 (13 self)
Important inference problems in statistical physics, computer vision, error-correcting coding theory, and artificial intelligence can all be reformulated as the computation of marginal probabilities on factor graphs. The belief propagation (BP) algorithm is an efficient way to solve these problems that is exact when the factor graph is a tree, but only approximate when the factor graph has cycles. We show that BP fixed points correspond to the stationary points of the Bethe approximation of the free energy for a factor graph. We explain how to obtain region-based free energy approximations that improve the Bethe approximation, and corresponding generalized belief propagation (GBP) algorithms. We emphasize the conditions a free energy approximation must satisfy in order to be a “valid” or “maxent-normal” approximation. We describe the relationship between four different methods that can be used to generate valid approximations: the “Bethe method,” the “junction graph method,” the “cluster variation method,” and the “region graph method.” Finally, we explain how to tell whether a region-based approximation, and its corresponding GBP algorithm, is likely to be accurate, and describe empirical results showing that GBP can significantly outperform BP.
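For reference, the Bethe approximation mentioned above has a standard textbook form, reproduced here in generic notation (not copied from the paper): with factor beliefs b_a, single-variable beliefs b_i, factors f_a, and d_i the number of factors that touch variable i,

```latex
F_{\mathrm{Bethe}}\bigl(\{b_a\},\{b_i\}\bigr)
  = \sum_{a}\sum_{\mathbf{x}_a} b_a(\mathbf{x}_a)\,
      \ln\frac{b_a(\mathbf{x}_a)}{f_a(\mathbf{x}_a)}
  \;-\; \sum_{i}(d_i - 1)\sum_{x_i} b_i(x_i)\,\ln b_i(x_i)
```

BP fixed points are stationary points of this functional under the constraints that the beliefs are normalized and locally consistent (each b_a marginalizes to the corresponding b_i), which is the correspondence the abstract refers to; region-based approximations replace these terms with sums over larger regions.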
Learning low-level vision
- International Journal of Computer Vision
, 2000
"... We show a learning-based method for low-level vision problems. We set-up a Markov network of patches of the image and the underlying scene. A factorization approximation allows us to easily learn the parameters of the Markov network from synthetic examples of image/scene pairs, and to e ciently prop ..."
Abstract - Cited by 579 (30 self)
We present a learning-based method for low-level vision problems. We set up a Markov network of patches of the image and the underlying scene. A factorization approximation allows us to easily learn the parameters of the Markov network from synthetic examples of image/scene pairs, and to efficiently propagate image information. Monte Carlo simulations justify this approximation. We apply this to the “super-resolution” problem (estimating high-frequency details from a low-resolution image), showing good results. For the motion estimation problem, we show resolution of the aperture problem and filling-in arising from application of the same probabilistic machinery.
Belief Propagation
, 2010
"... When a pair of nuclear-powered Russian submarines was reported patrolling off the eastern seaboard of the U.S. last summer, Pentagon officials expressed wariness over the Kremlin’s motivations. At the same time, these officials emphasized their confidence in the U.S. Navy’s tracking capabilities: “W ..."
Abstract - Cited by 474 (11 self)
When a pair of nuclear-powered Russian submarines was reported patrolling off the eastern seaboard of the U.S. last summer, Pentagon officials expressed wariness over the Kremlin’s motivations. At the same time, these officials emphasized their confidence in the U.S. Navy’s tracking capabilities: “We’ve known where they were,” a senior Defense Department official told the New York Times, “and we’re not concerned about our ability to track the subs.” While the official did not divulge the methods used by the Navy to track submarines, the Times added that such
The generalized distributive law
- IEEE Transactions on Information Theory
"... Abstract—In this semitutorial paper we discuss a general message passing algorithm, which we call the generalized dis-tributive law (GDL). The GDL is a synthesis of the work of many authors in the information theory, digital communications, signal processing, statistics, and artificial intelligence ..."
Abstract - Cited by 359 (2 self)
Abstract—In this semitutorial paper we discuss a general message passing algorithm, which we call the generalized distributive law (GDL). The GDL is a synthesis of the work of many authors in the information theory, digital communications, signal processing, statistics, and artificial intelligence communities. It includes as special cases the Baum–Welch algorithm, the fast Fourier transform (FFT) on any finite Abelian group, the Gallager–Tanner–Wiberg decoding algorithm, Viterbi’s algorithm, the BCJR algorithm, Pearl’s “belief propagation” algorithm, the Shafer–Shenoy probability propagation algorithm, and the turbo decoding algorithm. Although this algorithm is guaranteed to give exact answers only in certain cases (the “junction tree” condition), unfortunately not including the cases of GTW with cycles or turbo decoding, there is much experimental evidence, and a few theorems, suggesting that it often works approximately even when it is not supposed to. Index Terms—Belief propagation, distributive law, graphical models, junction trees, turbo codes.
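As a minimal, self-contained illustration of the distributive-law reorganization that the GDL generalizes (the arrays and sizes below are invented for the example), marginalizing a product of two local kernels can be done without ever forming the full joint:

```python
import numpy as np

# Illustration only: compute the sum over (x, y, z) of f(x, y) * g(y, z) two ways.
rng = np.random.default_rng(0)
f = rng.random((30, 40))            # f[x, y]
g = rng.random((40, 50))            # g[y, z]

# Naive: materialize the full joint over (x, y, z), then sum: O(|X||Y||Z|) work.
joint = f[:, :, None] * g[None, :, :]
naive = joint.sum()

# Distributive law: push each sum inside its factor: O(|X||Y| + |Y||Z|) work.
fast = f.sum(axis=0) @ g.sum(axis=1)

assert np.allclose(naive, fast)
```

Running the same reorganization over a junction tree, with "sum" and "product" replaced by other semiring operations, is what yields the special cases listed in the abstract.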
Correctness of belief propagation in Gaussian graphical models of arbitrary topology
- Neural Computation
, 1999
"... Local "belief propagation" rules of the sort proposed byPearl [12] are guaranteed to converge to the correct posterior probabilities in singly connected graphical models. Recently, a number of researchers have empirically demonstrated good performance of "loopy belief propagation&q ..."
Abstract - Cited by 296 (7 self)
Local "belief propagation" rules of the sort proposed byPearl [12] are guaranteed to converge to the correct posterior probabilities in singly connected graphical models. Recently, a number of researchers have empirically demonstrated good performance of "loopy belief propagation" -- using these same rules on graphs with loops. Perhaps the most dramatic instance is the near Shannonlimit performance of "Turbo codes", whose decoding algorithm is equivalentto loopy belief propagation. Except for the
The Bayes Net Toolbox for MATLAB
- Computing Science and Statistics
, 2001
"... The Bayes Net Toolbox (BNT) is an open-source Matlab package for directed graphical models. BNT supports many kinds of nodes (probability distributions), exact and approximate inference, parameter and structure learning, and static and dynamic models. BNT is widely used in teaching and research: the ..."
Abstract - Cited by 250 (1 self)
The Bayes Net Toolbox (BNT) is an open-source Matlab package for directed graphical models. BNT supports many kinds of nodes (probability distributions), exact and approximate inference, parameter and structure learning, and static and dynamic models. BNT is widely used in teaching and research: the web page has received over 28,000 hits since May 2000. In this paper, we discuss a broad spectrum of issues related to graphical models (directed and undirected), and describe, at a high-level, how BNT was designed to cope with them all. We also compare BNT to other software packages for graphical models, and to the nascent OpenBayes effort.