## Learning Bayesian belief networks: An approach based on the MDL principle (1994)

Venue: | Computational Intelligence |

Citations: | 246 - 7 self |

### Citations

8725 |
Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference.
- Pearl
- 1988
Citation Context ...ch (1990) that can discover a minimal-edge I-map. A network structure is an I-map of a probability distribution if every independence relation exhibited in the network holds also in the distribution (Pearl, 1988; Geiger and Pearl, 1990). However, their approach is again limited to polytrees; it is only guaranteed to work in the case where the underlying distribution has a polytree structure. All of the above... |

2142 |
On information and sufficiency
- Kullback, Leibler
- 1951
Citation Context ...ned tree network was the closest of all tree networks to the underlying distribution of the raw data. The criterion of "closeness" they used was the well-known Kullback-Leibler cross-entropy measure (Kullback and Leibler, 1951). The main restriction of this work was that it could only learn tree structures. Hence, if the raw data was the result of a non-tree structured distribution, the learned structure could be very inac... |
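The "closeness" criterion mentioned in this context can be illustrated with a minimal sketch of the Kullback-Leibler cross-entropy between two discrete distributions. The function name `kl_divergence` and the example distributions are assumptions made for illustration; they do not come from the paper.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler cross-entropy D(p || q), in bits, for two discrete
    distributions given as equal-length sequences of probabilities."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of events")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # terms with p_i = 0 contribute nothing
        if qi == 0.0:
            return math.inf  # q rules out an event that p allows
        total += pi * math.log2(pi / qi)
    return total

# Hypothetical example distributions (not taken from the paper).
underlying = [0.5, 0.25, 0.25]
uniform = [1 / 3, 1 / 3, 1 / 3]

print(kl_divergence(underlying, underlying))  # identical distributions give 0.0
print(kl_divergence(underlying, uniform))     # strictly positive when they differ
```

The measure is zero only when the two distributions agree, which is why a learned structure with a smaller value is considered closer to the underlying distribution.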

1523 |
Modeling by shortest data description
- Rissanen
- 1978
Citation Context ...ill capable of learning a complex network if no simpler network is sufficiently accurate. To make this tradeoff we use a well-studied formalism: Rissanen's Minimum Description Length (MDL) Principle (Rissanen, 1978). Besides the reasons given above, making a tradeoff between accuracy and usefulness seems to be particularly important when learning from raw data. The raw data is itself only an approximate picture... |

856 | Approximating discrete probability distributions with dependence trees
- Chow, Liu
- 1968
Citation Context ..., we can develop an approach to evaluating cross-entropy that uses local computation over low-order marginals. This approach is an extension of previous work due to Chow and Liu (1968). Chow and Liu (Chow and Liu, 1968) developed a method for finding a tree structure that minimized the cross-entropy, and their method was extended by Rebane and Pearl (1987) to finding polytrees with minimal cross-entropy. Theorem 3.... |

708 |
The computational complexity of probabilistic inference using Bayesian belief networks.
- Cooper
- 1990
Citation Context ...ifficult to deal with. It is well known that in the worst case it is intractable to compute posterior probabilities in multiply-connected Bayesian networks; to be precise this computation is NP-Hard (Cooper, 1990). Furthermore, the time complexity of the known algorithms increases with the degree of connectivity of the network. For large multiply-connected networks approximation algorithms are often used, eit... |

594 |
Stochastic Complexity
- Rissanen
- 1989
Citation Context ...as "X is independent of Y, given Z", Geiger et al. developed an approach [6] that can discover a minimal-edge I-map [10]. (Footnote 1: Rissanen provides a lucid and convincing argument that discovering useful models is the real concern of science [14].) However, their approach is again limited to polytrees; it is only guaranteed to work in the case where the underlying distribution has an exact polytree structure.... |

475 | Fusion, propagation and structuring in belief networks
- Pearl
- 1986
Citation Context ...ralizes previous approaches based on Kullback cross-entropy. Experiments have been conducted to demonstrate the feasibility of the approach. 1 Introduction. Bayesian belief networks, advanced by Pearl [9], have become an important paradigm for representing and reasoning with uncertainty. Systems based on Bayesian networks have been constructed in a number of different application areas, ranging from me... |

315 | Bayesian updating in causal probabilistic networks by local computations - Jensen, Lauritzen, et al. - 1990 |

286 | Approximating probabilistic inference in Bayesian belief networks is NP-hard
- Dagum, Luby
- 1993
Citation Context ...ree of connectivity of the network. For large multiply-connected networks approximation algorithms are often used, either based on stochastic simulation, e.g., (Chavez and Cooper, 1990; Chavez, 1990; Dagum and Chavez, 1991; Fung and Chang, 1990; Henrion, 1987; Pearl, 1987; Shachter and Peot, 1990), or search through the space of alternative instantiations, e.g., (Cooper, 1984; Henrion, 1990; Henrion, 1991; Peng and Reg... |

279 |
The alarm monitoring system: a case study with two probabilistic inference techniques for belief networks
- Beinlich, Suermondt, et al.
- 1989
Citation Context ...t paradigm for representing and reasoning with uncertainty. Systems based on Bayesian networks have been constructed in a number of different application areas, ranging from medical diagnosis, e.g., (Beinlich et al., 1989), to reasoning about the oil market, e.g., (Abramson, 1991). Despite these successes, a major obstacle to using Bayesian networks lies in the difficulty of constructing them in complex domains. It ca... |

263 |
Equivalence and synthesis of causal models
- Verma, Pearl
- 1990
Citation Context ...nnected networks, which topologically are directed acyclic graphs (dags). Recently, Spirtes et al. [16] have developed an algorithm that can construct multiply-connected networks. And Verma and Pearl [17, 11] have developed what they call an IC-Algorithm that can also recover these kinds of structures. However, both approaches require that the underlying distribution being learned be dag-isomorphic. 2 But,... |

247 | A theory of inferred causation
- Pearl, Verma
- 1991
Citation Context ...nnected networks, which topologically are directed acyclic graphs (dags). Recently, Spirtes et al. [16] have developed an algorithm that can construct multiply-connected networks. And Verma and Pearl [17, 11] have developed what they call an IC-Algorithm that can also recover these kinds of structures. However, both approaches require that the underlying distribution being learned be dag-isomorphic. 2 But... |

206 |
Propagation of uncertainty in Bayesian networks by probabilistic logic sampling
- Henrion
- 1988
Citation Context ...multiply-connected networks approximation algorithms are often used, either based on stochastic simulation, e.g., (Chavez and Cooper, 1990; Chavez, 1990; Dagum and Chavez, 1991; Fung and Chang, 1990; Henrion, 1987; Pearl, 1987; Shachter and Peot, 1990), or search through the space of alternative instantiations, e.g., (Cooper, 1984; Henrion, 1990; Henrion, 1991; Peng and Reggia, 1987a; Peng and Reggia, 1987b). ... |

177 |
Simulation approaches to general probabilistic inference on belief networks.
- Shachter, Peot
- 1990
Citation Context ...approximation algorithms are often used, either based on stochastic simulation, e.g., (Chavez and Cooper, 1990; Chavez, 1990; Dagum and Chavez, 1991; Fung and Chang, 1990; Henrion, 1987; Pearl, 1987; Shachter and Peot, 1990), or search through the space of alternative instantiations, e.g., (Cooper, 1984; Henrion, 1990; Henrion, 1991; Peng and Reggia, 1987a; Peng and Reggia, 1987b). In practice these algorithms allow one... |

157 | On the logic of causal models
- Geiger, Pearl
- 1988
Citation Context ...t can discover a minimal-edge I-map. A network structure is an I-map of a probability distribution if every independence relation exhibited in the network holds also in the distribution (Pearl, 1988; Geiger and Pearl, 1990). However, their approach is again limited to polytrees; it is only guaranteed to work in the case where the underlying distribution has a polytree structure. All of the above approaches fail to reco... |

115 |
Weighting and integrating evidence for stochastic simulation in Bayesian networks
- Fung, Chang
- 1990
Citation Context ...he network. For large multiply-connected networks approximation algorithms are often used, either based on stochastic simulation, e.g., (Chavez and Cooper, 1990; Chavez, 1990; Dagum and Chavez, 1991; Fung and Chang, 1990; Henrion, 1987; Pearl, 1987; Shachter and Peot, 1990), or search through the space of alternative instantiations, e.g., (Cooper, 1984; Henrion, 1990; Henrion, 1991; Peng and Reggia, 1987a; Peng and R... |

109 |
Evidential reasoning using stochastic simulation of causal models
- Pearl
- 1987
Citation Context ...ted networks approximation algorithms are often used, either based on stochastic simulation, e.g., (Chavez and Cooper, 1990; Chavez, 1990; Dagum and Chavez, 1991; Fung and Chang, 1990; Henrion, 1987; Pearl, 1987; Shachter and Peot, 1990), or search through the space of alternative instantiations, e.g., (Cooper, 1984; Henrion, 1990; Henrion, 1991; Peng and Reggia, 1987a; Peng and Reggia, 1987b). In practice t... |

101 | Data Compression
- Lelewer, Hirschberg
Citation Context ...ing distribution each atomic event $e_i$ has probability $p_i$. Then Huffman's algorithm, when run using these probabilities, will assign event $e_i$ a codeword of length approximately $-\log_2(p_i)$ (Lelewer and Hirschberg, 1987). When we have $N$ data points, where $N$ is large, we would expect that there will be $N p_i$ occurrences of event $e_i$. Hence, the length of the string encoding the database will be approximately $-N$... |
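The relationship between Huffman codeword lengths and $-\log_2(p_i)$ described in this context can be checked with a small sketch. The helper `huffman_code_lengths` and the example probabilities are assumptions chosen for illustration (dyadic probabilities, so the approximation happens to be exact). The total compared against at the end is the standard entropy expression $-N \sum_i p_i \log_2 p_i$, which is assumed here to be what the truncated sentence continues with.

```python
import heapq
import math

def huffman_code_lengths(probs):
    """Return the Huffman codeword length (in bits) for each event."""
    if len(probs) == 1:
        return [1]
    # Heap items: (subtree probability, unique tie-breaker, event indices).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, events1 = heapq.heappop(heap)
        p2, _, events2 = heapq.heappop(heap)
        for i in events1 + events2:
            lengths[i] += 1  # each merge adds one bit to every merged event
        heapq.heappush(heap, (p1 + p2, counter, events1 + events2))
        counter += 1
    return lengths

# Hypothetical distribution over four atomic events (not from the paper).
probs = [0.5, 0.25, 0.125, 0.125]
lengths = huffman_code_lengths(probs)
ideal = [-math.log2(p) for p in probs]

# Expected size of a database of n points: n * sum_i p_i * length_i.
n = 1000
expected_bits = n * sum(p * l for p, l in zip(probs, lengths))
entropy_bits = -n * sum(p * math.log2(p) for p in probs)

print(lengths, ideal)
print(expected_bits, entropy_bits)
```

For probabilities that are not powers of two the Huffman lengths are integers near $-\log_2(p_i)$ rather than equal to it, so the two totals agree only approximately.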

83 |
A Bayesian method for constructing Bayesian belief networks from databases, in
- Cooper, Herskovits
- 1990
Citation Context ... the function D respectively. One additional feature of our approach, in particular a feature of our heuristic search algorithm, is that we did not require a user supplied ordering of variables, cf. (Cooper and Herskovits, 1991). We feel that this experiment demonstrates that our approach is feasible for recovering Bayesian networks of practical size. In the third set of experiments, the original Bayesian network G6 consist... |

72 |
Counting unlabeled acyclic digraphs
- Robinson
- 1977
Citation Context ...o the MDL principle. To perform the first part of the search, i.e., to find a network with low cross-entropy, we develop some additional results that are based on the work of Chow and Liu [3]. (Footnote 6: Robinson [15] gives a recurrence that can be used to calculate this number.) 4.1 Evaluating Cross-Entropy. The underlying distribution $P$ is a joint distribution over the variables $X = \{X_1, \ldots, X_n\}$, and any... |
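The footnote's recurrence is not reproduced in this context. As a hedged illustration, the sketch below uses the well-known recurrence for the number of directed acyclic graphs on $n$ labeled nodes, $a_n = \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} 2^{k(n-k)} a_{n-k}$ with $a_0 = 1$. That "this number" refers to this count, and that this is the form the paper has in mind, are both assumptions; the function name `count_labeled_dags` is hypothetical.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_labeled_dags(n):
    """Number of directed acyclic graphs on n labeled nodes."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes that have no incoming edges.
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_labeled_dags(n - k)
        for k in range(1, n + 1)
    )

for n in range(6):
    print(n, count_labeled_dags(n))
```

The count grows super-exponentially (25 structures for three nodes, 29281 for five), which is why exhaustive search over network structures is impractical and a heuristic search is used instead.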

58 |
A probabilistic causal model for diagnostic problem solving, Part II: diagnostic search
- Peng, Reggia
- 1987
Citation Context ...Chavez, 1991; Fung and Chang, 1990; Henrion, 1987; Pearl, 1987; Shachter and Peot, 1990), or search through the space of alternative instantiations, e.g., (Cooper, 1984; Henrion, 1990; Henrion, 1991; Peng and Reggia, 1987a; Peng and Reggia, 1987b). In practice these algorithms allow one to reason with more complex networks than can be handled by the exact algorithms. However, it has recently been shown that in general... |

55 |
On information and sufficiency
- Kullback, Leibler
- 1951
Citation Context ...e network was the closest of all tree networks to the underlying distribution of the raw data. The criterion of "closeness" they used was based on the well-known Kullback-Leibler cross-entropy measure [7]. The main restriction of this work was that it could only learn tree structures. Hence, if the raw data was the result of a non-tree structured distribution, the learned structure could be very inacc... |

51 |
The recovery of causal polytrees from statistical data. Uncertainty in Artificial Intelligence 3, edited by
- Rebane, Pearl
- 1989
Citation Context ...f this work was that it could only learn tree structures. Hence, if the raw data was the result of a non-tree structured distribution, the learned structure could be very inaccurate. Rebane and Pearl [12] extended Chow and Liu's methods to the recovery of networks of singly connected trees (polytrees). If the underlying distribution had a polytree structure, its topological structure could be exactly ... |

40 |
Search-based methods to bound diagnostic probabilities in very large belief networks
- Henrion
- 1991
Citation Context ...990; Dagum and Chavez, 1991; Fung and Chang, 1990; Henrion, 1987; Pearl, 1987; Shachter and Peot, 1990), or search through the space of alternative instantiations, e.g., (Cooper, 1984; Henrion, 1990; Henrion, 1991; Peng and Reggia, 1987a; Peng and Reggia, 1987b). In practice these algorithms allow one to reason with more complex networks than can be handled by the exact algorithms. However, it has recently bee... |

30 | Learning causal trees from dependence information
- Geiger, Paz, et al.
- 1990
Citation Context ...ion, the learned structure could be very inaccurate. Given a set of independence assertions of the form I(X; Z; Y) interpreted as "X is independent of Y, given Z", Geiger et al. developed an approach [6] that can discover a minimal-edge I-map [10]. However, their approach is again limited to polytrees; it is only guaranteed to work in the case where the underlying distribution has an exact polytree st... |

28 | Causality from probability
- Spirtes, Glymour, et al.
- 1990
Citation Context ...ucture. All of the above approaches fail to recover the richer and more realistic class of multiply-connected networks, which topologically are directed acyclic graphs (dags). Recently, Spirtes et al. [16] have developed an algorithm that can construct multiply-connected networks. And Verma and Pearl [17, 11] have developed what they call an IC-Algorithm that can also recover these kinds of structures.... |

26 |
A randomized approximation algorithm for probabilistic inference on Bayesian belief networks
- Chavez, Cooper
- 1990
Citation Context ...known algorithms increases with the degree of connectivity of the network. For large multiply-connected networks approximation algorithms are often used, either based on stochastic simulation, e.g., (Chavez and Cooper, 1990; Chavez, 1990; Dagum and Chavez, 1991; Fung and Chang, 1990; Henrion, 1987; Pearl, 1987; Shachter and Peot, 1990), or search through the space of alternative instantiations, e.g., (Cooper, 1984; Henr... |

23 |
On information and sufficiency
- Kullback, Leibler
Citation Context ...ned tree network was the closest of all tree networks to the underlying distribution of the raw data. The criterion of "closeness" they used was the well-known Kullback-Leibler cross-entropy measure (Kullback and Leibler, 1951). The main restriction of this work was that it could only learn tree structures. Hence, if the raw data was the result of a non-tree structured distribution, the learned structure could be very inac... |

17 | Introduction to Algorithms (MIT Press) - Leiserson, Rivest, et al. - 2001 |

13 |
ARCO1: An application of belief networks to the oil market
- Abramson
- 1991
Citation Context ...s based on Bayesian networks have been constructed in a number of different application areas, ranging from medical diagnosis, e.g., (Beinlich et al., 1989), to reasoning about the oil market, e.g., (Abramson, 1991). Despite these successes, a major obstacle to using Bayesian networks lies in the difficulty of constructing them in complex domains. It can be a very time-consuming and error-prone task to specify ... |

11 |
NESTOR: A Computer-Based Medical Diagnosis that Integrates Causal and Probabilistic Knowledge
- Cooper
- 1984
Citation Context ...z and Cooper, 1990; Chavez, 1990; Dagum and Chavez, 1991; Fung and Chang, 1990; Henrion, 1987; Pearl, 1987; Shachter and Peot, 1990), or search through the space of alternative instantiations, e.g., (Cooper, 1984; Henrion, 1990; Henrion, 1991; Peng and Reggia, 1987a; Peng and Reggia, 1987b). In practice these algorithms allow one to reason with more complex networks than can be handled by the exact algorithms... |

8 |
The minimum description length principle and its application to online learning of handprinted characters
- Gao, Li
- 1989
Citation Context ... a total ordering. 3 The MDL Principle. In this section we will discuss in greater detail Rissanen's Minimal Description Length (MDL) principle, a well studied formalism in learning theory, see e.g., (Gao and Li, 1989; Rissanen, 1978). The MDL principle is based on the idea that the best model of a collection of data items is the model that minimizes the sum of 1. the length of the encoding of the model, and 2. th... |

5 |
Towards efficient inference in multiply connected belief networks
- Henrion
- 1990
Citation Context ...1990; Chavez, 1990; Dagum and Chavez, 1991; Fung and Chang, 1990; Henrion, 1987; Pearl, 1987; Shachter and Peot, 1990), or search through the space of alternative instantiations, e.g., (Cooper, 1984; Henrion, 1990; Henrion, 1991; Peng and Reggia, 1987a; Peng and Reggia, 1987b). In practice these algorithms allow one to reason with more complex networks than can be handled by the exact algorithms. However, it h... |

4 |
Propagating uncertainty in Bayesian networks by probabilistic logic sampling
- Henrion
- 1988
Citation Context ...multiply-connected networks approximation algorithms are often used, either based on stochastic simulation, e.g., (Chavez and Cooper, 1990; Chavez, 1990; Dagum and Chavez, 1991; Fung and Chang, 1990; Henrion, 1987; Pearl, 1987; Shachter and Peot, 1990), or search through the space of alternative instantiations, e.g., (Cooper, 1984; Henrion, 1990; Henrion, 1991; Peng and Reggia, 1987a; Peng and Reggia, 1987b). ... |

3 |
Architectures and approximation algorithms for probabilistic expert systems
- Chavez
- 1990
Citation Context ...s with the degree of connectivity of the network. For large multiply-connected networks approximation algorithms are often used, either based on stochastic simulation, e.g., (Chavez and Cooper, 1990; Chavez, 1990; Dagum and Chavez, 1991; Fung and Chang, 1990; Henrion, 1987; Pearl, 1987; Shachter and Peot, 1990), or search through the space of alternative instantiations, e.g., (Cooper, 1984; Henrion, 1990; Hen... |

2 |
Towards efficient inference in multiply connected belief networks
- Henrion
- 1990
Citation Context ...1990; Chavez, 1990; Dagum and Chavez, 1991; Fung and Chang, 1990; Henrion, 1987; Pearl, 1987; Shachter and Peot, 1990), or search through the space of alternative instantiations, e.g., (Cooper, 1984; Henrion, 1990; Henrion, 1991; Peng and Reggia, 1987a; Peng and Reggia, 1987b). In practice these algorithms allow one to reason with more complex networks than can be handled by the exact algorithms. However, it h... |