We examine Bayesian methods for learning Bayesian networks from a combination of prior knowledge and statistical data. In particular, we develop simple methods for generating priors for Bayesian-network parameters. Our work generalizes previous work that has concentrated on Bayesian networks containing only discrete variables and (to a lesser extent) on Gaussian networks. We introduce three assumptions that are abstractions of previously made assumptions: likelihood equivalence, which says that data should not help to discriminate network structures that represent the same assertions of conditional independence; parameter independence, which says that the parameters associated with each node in a Bayesian network are independent; and parameter modularity, which says that if a node has the same parents in two distinct networks, then the probability density functions of the parameters associated with this node are identical in both networks. We show how these assumptions greatly simplify the construction of priors. In addition, we use these assumptions to derive a general metric for complete databases. Combining this general metric with well-known statistical facts about the Dirichlet and normal-Wishart distributions, we provide simple derivations of metrics for discrete and Gaussian networks, respectively. Finally, we show how our assumptions lead to a general framework for characterizing prior distributions for the parameters of multivariate sampling.
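To make likelihood equivalence concrete for the discrete case, the following is a minimal sketch rather than the derivation given in the paper. It assumes a complete database of discrete records and uses the Dirichlet-based score often called BDeu, chosen here only as one well-known likelihood-equivalent score. The names `family_score`, `log_score`, `card`, and `ess` (equivalent sample size) are hypothetical, introduced for illustration.

```python
from collections import Counter
from math import lgamma, prod


def family_score(data, child, parents, card, ess=1.0):
    """Log marginal-likelihood term for one node given its parents.

    Assumption: Dirichlet priors whose hyperparameters spread the
    equivalent sample size `ess` uniformly over all cells (BDeu).
    """
    r = card[child]                      # number of states of the child
    q = prod(card[p] for p in parents)   # number of parent configurations
    a_ij = ess / q                       # prior weight per parent configuration
    a_ijk = ess / (q * r)                # prior weight per (configuration, state)

    # Sufficient statistics from the complete database.
    n_ij = Counter(tuple(row[p] for p in parents) for row in data)
    n_ijk = Counter(
        (tuple(row[p] for p in parents), row[child]) for row in data
    )

    # Unobserved cells contribute zero, so only observed counts are summed.
    score = 0.0
    for n in n_ij.values():
        score += lgamma(a_ij) - lgamma(a_ij + n)
    for n in n_ijk.values():
        score += lgamma(a_ijk + n) - lgamma(a_ijk)
    return score


def log_score(data, structure, card, ess=1.0):
    """Sum the per-node terms; `structure` maps each node to its parent list."""
    return sum(
        family_score(data, child, parents, card, ess)
        for child, parents in structure.items()
    )


# Toy complete database over a binary X and a three-state Y (made-up values).
data = [
    {"X": 0, "Y": 0}, {"X": 0, "Y": 1}, {"X": 1, "Y": 2},
    {"X": 1, "Y": 2}, {"X": 0, "Y": 0}, {"X": 1, "Y": 1},
]
card = {"X": 2, "Y": 3}

# X -> Y and Y -> X assert the same (empty) set of conditional independencies.
s_xy = log_score(data, {"X": [], "Y": ["X"]}, card)
s_yx = log_score(data, {"Y": [], "X": ["Y"]}, card)
print(s_xy, s_yx)
```

The two structures X -> Y and Y -> X are equivalent, so under this score they should receive the same value up to floating-point error. The per-node decomposition in `log_score` is the kind of factorization that parameter independence and parameter modularity make possible.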