Results 1  10
of
712
Loopy belief propagation for approximate inference: An empirical study. In:
 Proceedings of Uncertainty in AI,
, 1999
"... Abstract Recently, researchers have demonstrated that "loopy belief propagation" the use of Pearl's polytree algorithm in a Bayesian network with loops can perform well in the context of errorcorrecting codes. The most dramatic instance of this is the near Shannonlimit performanc ..."
Abstract

Cited by 674 (15 self)
 Add to MetaCart
(Show Context)
Abstract Recently, researchers have demonstrated that "loopy belief propagation" the use of Pearl's polytree algorithm in a Bayesian network with loops can perform well in the context of errorcorrecting codes. The most dramatic instance of this is the near Shannonlimit performance of "Turbo Codes" codes whose decoding algorithm is equivalent to loopy belief propagation in a chainstructured Bayesian network. In this paper we ask: is there something spe cial about the errorcorrecting code context, or does loopy propagation work as an ap proximate inference scheme in a more gen eral setting? We compare the marginals com puted using loopy propagation to the exact ones in four Bayesian network architectures, including two realworld networks: ALARM and QMR. We find that the loopy beliefs of ten converge and when they do, they give a good approximation to the correct marginals. However, on the QMR network, the loopy be liefs oscillated and had no obvious relation ship to the correct posteriors. We present some initial investigations into the cause of these oscillations, and show that some sim ple methods of preventing them lead to the wrong results. Introduction The task of calculating posterior marginals on nodes in an arbitrary Bayesian network is known to be NP hard In this paper we investigate the approximation performance of "loopy belief propagation". This refers to using the wellknown Pearl polytree algorithm [12] on a Bayesian network with loops (undirected cycles). The algorithm is an exact inference algorithm for singly connected networks the beliefs converge to the cor rect marginals in a number of iterations equal to the diameter of the graph.1 However, as Pearl noted, the same algorithm will not give the correct beliefs for mul tiply connected networks: When loops are present, the network is no longer singly connected and local propaga tion schemes will invariably run into trouble . We believe there are general undiscovered theorems about the performance of belief propagation on loopy DAGs. These theo rems, which may have nothing directly to do with coding or decoding will show that in some sense belief propagation "converges with high probability to a nearoptimum value" of the desired belief on a class of loopy DAGs Progress in the analysis of loopy belief propagation has been made for the case of networks with a single loop • Unless all the conditional probabilities are deter ministic, belief propagation will converge. • There is an analytic expression relating the cor rect marginals to the loopy marginals. The ap proximation error is related to the convergence rate of the messages the faster the convergence the more exact the approximation. • If the hidden nodes are binary, then thresholding the loopy beliefs is guaranteed to give the most probable assignment, even though the numerical value of the beliefs may be incorrect. This result only holds for nodes in the loop. In the maxproduct (or "belief revision") version, Weiss For the case of networks with multiple loops, Richard son To summarize, what is currently known about loopy propagation is that ( 1) it works very well in an error correcting code setting and (2) there are conditions for a singleloop network for which it can be guaranteed to work well. In this paper we investigate loopy prop agation empirically under a wider range of conditions. Is there something special about the errorcorrecting code setting, or does loopy propagation work as an approximation scheme for a wider range of networks? ..\ x(:x).) (1) where: and: The message X passes to its parent U; is given by: and the message X sends to its child Y j is given by: k;Cj For noisyor links between parents and children, there exists an analytic expression for 1r( x) and Ax ( u;) that avoids the exhaustive enumeration over parent config urations We made a slight modification to the update rules in that we normalized both ..\ and 1r messages at each iteration. As Pearl Nodes were updated in parallel: at each iteration all nodes calculated their outgoing messages based on the incoming messages of their neighbors from the pre vious iteration. The messages were said to converge if none of the beliefs in successive iterations changed by more than a small threshold (104). All messages were initialized to a vector of ones; random initializa tion yielded similar results, since the initial conditions rapidly get "washed out" . For comparison, we also implemented likelihood weighting 3.1 The PYRAMID network All nodes were binary and the conditional probabilities were represented by tablesentries in the conditional probability tables (CPTs) were chosen uniformly in the range (0, 1]. 3.2 The toyQMR network All nodes were binary and the conditional probabilities of the leaves were represented by a noisyor: ? (Child= OIParents) = eBoL; B,Paren t; where 110 represents the "leak" term. The QMRDT network The QMRDT is a bipartite network whose structure is the same as that shown in figure 2 but the size is much larger. There are approximately 600 diseases and ap proximately 4000 findin nodes, with a number of ob served findings that varies per case. Due to the form of the noisyor CPTs the complexity of inference is ex ponential in the number of positive findings Results Initial experiments The experimental protocol for the PYRAMID network was as follows. For each experimental run, we first gen erated random CPTs. We then sampled from the joint distribution defined by the network and clamped the observed nodes (all nodes in the bottom layer) to their sampled value. Given a structure and observations, we then ran three inference algorithms junction tree, loopy belief propagation and sampling. We found that loopy belief propagation always con verged in this case with the average number of iter ations equal to 10.2. The experimental protocol for the toyQMR network was similar to that of the PYRAMID network except that we randomized over structure as well. Again we found that loopy belief propagation always converged, with the average number of iterations equal to 8.65. The protocol for the ALARM network experiments dif fered from the previous two in that the structure and parameters were fixed only the observed evidence differed between experimental runs. We assumed that all leaf nodes were observed and calculated the pos Figure 2: The structure of a toyQMR network. This is a bipartite structure where the conditional distributions of the leaves are noisyor's. The network shown represents one sample from randomly generated structures where the parents of each symptom were a random subset of the diseases. terior marginals of all other nodes. Again we found that loopy belief propagation always converged with the average number of iterations equal to 14.55. The results presented up until now show that loopy propagation performs well for a variety of architectures involving multiple loops. We now present results for the QMRDT network which are not as favorable. In the QMRDT network there was no randomization. We used the fixed structure and calculated posteriors for the four cases for which posteriors have been cal culated exactly by Heckerman What causes convergence versus oscill ation? What our initial experiments show is that loopy prop agation does a good job of approximating the correct posteriors if it converges. Unfortunately, on the most challenging casethe QMRDT networkthe al gorithm did not converge. We wanted to see if this oscillatory behavior in the QMRDT case was related to the size of the network does loopy propagation tend to converge less for large networks than small networks? To investigate this question, we tried to cause oscil lation in the toyQMR network. We first asked what, besides the size, is different between toyQMR and real QMR? An obvious difference is in the parameter val ues while the CPTs for toyQMR are random, the real QMR parameters are not. In particular, the prior probability of a disease node being on is extremely low in the real QMR (typically of the order of 103 ). Would low priors cause oscillations in the toyQMR case? To answer this question we repeated the ex periments reported in the previous section but rather than having the prior probability of each node be ran domly selected in the range [0, 1] we selected the prior uniformly in the range [0, U] and varied U. Unlike the previous simulations we did not set the observed nodes by sampling from the joint for low priors all the findings would be negative and inference would be trivial. Rather each finding was independently set to positive or negative. If indeed small priors are responsible for the oscilla tion, then we would expect the real QMR network to converge if the priors were sampled randomly in the range [0, Small priors are not the only thing that causes oscil lation. Small weights can, too. The effect of both The exact marginals are represented by the circles; the ends of the "error bars" represent the loopy marginals at the last two iterations. We only plot the diseases which had nonnegligible posterior probability. Loopy Belief Propagation . s=o� . a' range of prior To test this hypothesis, we reparameterized the pyra mid network as follows: we set the prior probability of the "1" state of the root nodes to 0.9, and we utilized the noisyOR model for the other nodes with a small (0.1) inhibition probability (apart from the leak term, which we inhibited with probability 0.9). This param eterization has the effect of propagating 1 's from the top layer to the bottom. Thus the true marginal at each leaf is approximately (0.1, 0.9), i.e., the leaf is 1 with high probability. We then generated untypical evidence at the leaves by sampling from the uniform distribution, (0.5, 0.5), or from the skewed distribu tion (0.9, 0. 1). We found that loopy propagation still converged2, and that, as before, the marginals to which it converged were highly correlated with the correct marginals. Thus there must be some other explana tion, besides untypicality of the evidence, for the os cillations observed in QMR. Can we fix oscillations easily? When loopy propagation oscillates between two steady states it seems reasonable to try to find a way to com bine the two values. The simplest thing to do is to average them. Unfortunately, this gave very poor re sults, since the correct posteriors do not usually lie in the midpoint of the interval ( cf. 2More precisely, we found that with a convergence threshold of 104 , 98 out of 100 cases converged; when we lowered the threshold to 103 , all 100 cases converged. We also tried to avoid oscillations by using "momen tum"; replacing the messages that were sent at time t with a weighted average of the messages at times t and t1. That is, we replaced the reference to >.� ) in and similarly for 11"�) in Equation 3, where 0 :::; J.l :::; 1 is the momentum term. It is easy to show that if the modified system of equations converges to a fixed point F, then F is also a fixed point of the original system (since if>.� ) = >.�1) , then Equation 7 yields>.� ) ). In the experiments for which loopy propagation con verged (PYRAMID, toyQMR and ALARM), we found that adding the momentum term did not change the results the beliefs that resulted were the same be liefs found without momentum. In the experiments which did not converge (toyQMR with small priors and real QMR), we found that momentum significantly reduced the chance of oscillation. However, in several cases the beliefs to which the algorithm converged were quite inaccuratesee Discussion The experimental results presented here suggest that loopy propagation can yield accurate posterior marginals in a more general setting than that of error correcting coding the PYRAMID, toyQMR and ALARM networks are quite different from the error correcting coding graphs yet the loopy beliefs show high correlation with the correct marginals. In errorcorrecting codes the posterior is typically highly peaked and one might think that this feature is necessary for the good performance of loopy prop agation. Our results suggest that is not the case  in none of our simulations were the posteriors highly peaked around a single joint configuration. If the prob ability mass was concentrated at a single point the marginal probabilities should all be near zero or one; this is clearly not the case as can be seen in the figures. It might be expected that loopy propagation would only work well for graphs with large loops. However, our results, and previous results on turbo codes, show that loopy propagation can also work well for graphs with many small loops. At the same time, our experimental results suggest a cautionary note about loopy propagation, showing that the marginals may exhibit oscillations that have very little correlation with the correct marginals. We presented some preliminary results investigating the cause of the oscillations and showed that it is not sim ply a matter of the size of the network or the number of parents. Rather the same structure with different parameter values may oscillate or exhibit stable be havior. For all our simulations, we found that when loopy propagation converges, it gives a surprisingly good ap proximation to the correct marginals. Since the dis tinction between convergence and oscillation is easy to make after a small number of iterations, this may sug gest a way of checking whether loopy propagation is appropriate for a given problem. Acknowl edgements We thank Tommi Jaakkola, David Heckerman and David MacKay for useful discussions. We also thank Randy Miller and the University of Pittsburgh for the use of the QMRDT database. Supported by MURI ARO DAAH049610341. algorithm. These approaches are guaranteed to find local maxima, but do not explore the landscape for other modes. Our approach evolves structure and the missing data. We compare our stochastic algorithms and show they all produce accurate results.
Turbo decoding as an instance of Pearl’s belief propagation algorithm
 IEEE Journal on Selected Areas in Communications
, 1998
"... Abstract—In this paper, we will describe the close connection between the now celebrated iterative turbo decoding algorithm of Berrou et al. and an algorithm that has been well known in the artificial intelligence community for a decade, but which is relatively unknown to information theorists: Pear ..."
Abstract

Cited by 404 (16 self)
 Add to MetaCart
(Show Context)
Abstract—In this paper, we will describe the close connection between the now celebrated iterative turbo decoding algorithm of Berrou et al. and an algorithm that has been well known in the artificial intelligence community for a decade, but which is relatively unknown to information theorists: Pearl’s belief propagation algorithm. We shall see that if Pearl’s algorithm is applied to the “belief network ” of a parallel concatenation of two or more codes, the turbo decoding algorithm immediately results. Unfortunately, however, this belief diagram has loops, and Pearl only proved that his algorithm works when there are no loops, so an explanation of the excellent experimental performance of turbo decoding is still lacking. However, we shall also show that Pearl’s algorithm can be used to routinely derive previously known iterative, but suboptimal, decoding algorithms for a number of other errorcontrol systems, including Gallager’s
Approximating probabilistic inference in Bayesian belief networks is NPhard
, 1991
"... Abstract A belief network comprises a graphical representation of dependencies between variables of a domain and a set of conditional probabilities associated with each dependency. Unless P=NP, an efficient, exact algorithm does not exist to compute probabilistic inference in belief networks. Stoch ..."
Abstract

Cited by 291 (4 self)
 Add to MetaCart
Abstract A belief network comprises a graphical representation of dependencies between variables of a domain and a set of conditional probabilities associated with each dependency. Unless P=NP, an efficient, exact algorithm does not exist to compute probabilistic inference in belief networks. Stochastic simulation methods, which often improve run times, provide an alternative to exact inference algorithms. We present such a stochastic simulation algorithm 2)BNRAS that is a randomized approximation scheme. To analyze the run time, we parameterize belief networks by the dependence value PE, which is a measure of the cumulative strengths of the belief network dependencies given background evidence E. This parameterization defines the class of fdependence networks. The run time of 2)BNRAS is polynomial when f is a polynomial function. Thus, the results of this paper prove the existence of a class of belief networks for which inference approximation is polynomial and, hence, provably faster than any exact algorithm. I.
On the Hardness of Approximate Reasoning
, 1996
"... Many AI problems, when formalized, reduce to evaluating the probability that a propositional expression is true. In this paper we show that this problem is computationally intractable even in surprisingly restricted cases and even if we settle for an approximation to this probability. We consider va ..."
Abstract

Cited by 288 (13 self)
 Add to MetaCart
Many AI problems, when formalized, reduce to evaluating the probability that a propositional expression is true. In this paper we show that this problem is computationally intractable even in surprisingly restricted cases and even if we settle for an approximation to this probability. We consider various methods used in approximate reasoning such as computing degree of belief and Bayesian belief networks, as well as reasoning techniques such as constraint satisfaction and knowledge compilation, that use approximation to avoid computational difficulties, and reduce them to modelcounting problems over a propositional domain. We prove that counting satisfying assignments of propositional languages is intractable even for Horn and monotone formulae, and even when the size of clauses and number of occurrences of the variables are extremely limited. This should be contrasted with the case of deductive reasoning, where Horn theories and theories with binary clauses are distinguished by the e...
An Algorithm for Probabilistic Planning
, 1995
"... We define the probabilistic planning problem in terms of a probability distribution over initial world states, a boolean combination of propositions representing the goal, a probability threshold, and actions whose effects depend on the executiontime state of the world and on random chance. Adoptin ..."
Abstract

Cited by 286 (20 self)
 Add to MetaCart
We define the probabilistic planning problem in terms of a probability distribution over initial world states, a boolean combination of propositions representing the goal, a probability threshold, and actions whose effects depend on the executiontime state of the world and on random chance. Adopting a probabilistic model complicates the definition of plan success: instead of demanding a plan that provably achieves the goal, we seek plans whose probability of success exceeds the threshold. In this paper, we present buridan, an implemented leastcommitment planner that solves problems of this form. We prove that the algorithm is both sound and complete. We then explore buridan's efficiency by contrasting four algorithms for plan evaluation, using a combination of analytic methods and empirical experiments. We also describe the interplay between generating plans and evaluating them, and discuss the role of search control in probabilistic planning. 3 We gratefully acknowledge the comment...
Nonparametric Belief Propagation
 IN CVPR
, 2002
"... In applications of graphical models arising in fields such as computer vision, the hidden variables of interest are most naturally specified by continuous, nonGaussian distributions. However, due to the limitations of existing inf#6F6F3 algorithms, it is of#]k necessary tof#3# coarse, ..."
Abstract

Cited by 279 (25 self)
 Add to MetaCart
In applications of graphical models arising in fields such as computer vision, the hidden variables of interest are most naturally specified by continuous, nonGaussian distributions. However, due to the limitations of existing inf#6F6F3 algorithms, it is of#]k necessary tof#3# coarse, discrete approximations to such models. In this paper, we develop a nonparametric belief propagation (NBP) algorithm, which uses stochastic methods to propagate kernelbased approximations to the true continuous messages. Each NBP message update is based on an efficient sampling procedure which can accomodate an extremely broad class of potentialf#l3]k[[z3 allowing easy adaptation to new application areas. We validate our method using comparisons to continuous BP for Gaussian networks, and an application to the stereo vision problem.
Learning Bayesian belief networks: An approach based on the MDL principle
 Computational Intelligence
, 1994
"... A new approach for learning Bayesian belief networks from raw data is presented. The approach is based on Rissanen's Minimal Description Length (MDL) principle, which is particularly well suited for this task. Our approach does not require any prior assumptions about the distribution being lear ..."
Abstract

Cited by 254 (7 self)
 Add to MetaCart
(Show Context)
A new approach for learning Bayesian belief networks from raw data is presented. The approach is based on Rissanen's Minimal Description Length (MDL) principle, which is particularly well suited for this task. Our approach does not require any prior assumptions about the distribution being learned. In particular, our method can learn unrestricted multiplyconnected belief networks. Furthermore, unlike other approaches our method allows us to tradeo accuracy and complexity in the learned model. This is important since if the learned model is very complex (highly connected) it can be conceptually and computationally intractable. In such a case it would be preferable to use a simpler model even if it is less accurate. The MDL principle o ers a reasoned method for making this tradeo. We also show that our method generalizes previous approaches based on Kullback crossentropy. Experiments have been conducted to demonstrate the feasibility of the approach. Keywords: Knowledge Acquisition � Bayes Nets � Uncertainty Reasoning. 1
The Complexity of LogicBased Abduction
, 1993
"... Abduction is an important form of nonmonotonic reasoning allowing one to find explanations for certain symptoms or manifestations. When the application domain is described by a logical theory, we speak about logicbased abduction. Candidates for abductive explanations are usually subjected to minima ..."
Abstract

Cited by 193 (28 self)
 Add to MetaCart
Abduction is an important form of nonmonotonic reasoning allowing one to find explanations for certain symptoms or manifestations. When the application domain is described by a logical theory, we speak about logicbased abduction. Candidates for abductive explanations are usually subjected to minimality criteria such as subsetminimality, minimal cardinality, minimal weight, or minimality under prioritization of individual hypotheses. This paper presents a comprehensive complexity analysis of relevant decision and search problems related to abduction on propositional theories. Our results indicate that abduction is harder than deduction. In particular, we show that with the most basic forms of abduction the relevant decision problems are complete for complexity classes at the second level of the polynomial hierarchy, while the use of prioritization raises the complexity to the third level in certain cases.
Exploiting Causal Independence in Bayesian Network Inference
 Journal of Artificial Intelligence Research
, 1996
"... A new method is proposed for exploiting causal independencies in exact Bayesian network inference. ..."
Abstract

Cited by 181 (10 self)
 Add to MetaCart
(Show Context)
A new method is proposed for exploiting causal independencies in exact Bayesian network inference.