## Bayesian Interpolation (1991)

### Download Links

- [www.funet.fi]
- [www.cs.utoronto.ca]
- [authors.library.caltech.edu]

Venue: Neural Computation

Citations: 728 (17 self)

### Citations

4324 |
Estimating the Dimension of a Model
- Schwarz
- 1978
Citation Context ... this Bayesian framework. The log evidence $\log_2 P(D \mid H_i)$ is the number of bits in the ideal shortest message that encodes the data $D$ using model $H_i$. Akaike's criteria are an approximation to MDL [16, 24, 25]. Any implementation of MDL necessitates approximations in evaluating the length of the ideal shortest message. I can see no advantage in MDL, and recommend that the evidence should be approximated di... |

1887 |
Statistical Decision Theory and Bayesian Analysis
- Berger
- 1985
Citation Context ... review of Bayesian philosophy see the excellent papers by Loredo and Jaynes [10, 12]. Since Jeffreys the emphasis of most Bayesian probability theory has been 'to formally utilize prior information' [1], i.e. to perform inference in a way that makes explicit the prior knowledge and ignorance that we have, which orthodox methods omit. However, Jeffreys' work also laid the foundation for Bayesian mode... |

1565 |
Modeling by shortest data description
- Rissanen
- 1978
Citation Context ...equation (6) should be multiplied by the degeneracy of $w_{\mathrm{MP}}$ to give the correct estimate of the evidence. • 'Minimum description length' (MDL) methods are closely related to this Bayesian framework (Rissanen, 1978, Wallace and Boulton, 1968, Wallace and Freeman, 1987). The log evidence $\log_2 P(D \mid H_i)$ is the number of bits in the ideal shortest message that encodes the data $D$ using model $H_i$. Akaike's (1970)... |

881 |
Theory of Probability.
- Jeffreys
- 1984
Citation Context ... Complex hypotheses are automatically self-penalising under Bayes' rule. Figure 2 gives the basic intuition for why this should be expected. Bayesian methods were first laid out in depth by Jeffreys [11]. For a general review of Bayesian philosophy see the excellent papers by Loredo and Jaynes [10, 12]. Since Jeffreys the emphasis of most Bayesian probability theory has been 'to formally utilize prio... |

746 | Bayesian Inference in Statistical Analysis. - Box, Tiao - 1992 |

494 | A practical Bayesian framework for backpropagation networks.
- MacKay
- 1992
Citation Context ...ods will prove an ever more important tool for refining our modelling abilities. I hope that this review will help to introduce these techniques to the 'neural' modelling community. A companion paper [13] will demonstrate how these techniques can be applied to backpropagation neural networks. 2 The evidence and the Occam factor Let us write down Bayes' rule for the two levels of inference described ab... |

428 | Information-based objective functions for active data selection.
- MacKay
- 1992
Citation Context ...ds will prove an ever more important tool for refining our modelling abilities. I hope that this review will help to introduce these techniques to the 'neural' modelling community. A companion paper (MacKay, 1991b) will demonstrate how these techniques can be fruitfully applied to backpropagation neural networks. Another paper will show how this framework relates to the task of selecting where next to gather ... |

380 |
An information measure for classification.
- Wallace, Boulton
- 1968
Citation Context ... this Bayesian framework. The log evidence $\log_2 P(D \mid H_i)$ is the number of bits in the ideal shortest message that encodes the data $D$ using model $H_i$. Akaike's criteria are an approximation to MDL [16, 24, 25]. Any implementation of MDL necessitates approximations in evaluating the length of the ideal shortest message. I can see no advantage in MDL, and recommend that the evidence should be approximated di... |

339 |
Spline Smoothing and Nonparametric Regression.
- Eubank
- 1988
Citation Context ...s incorrect and how Bayes sets these parameters. The use of test data may be an unreliable technique unless large quantities of data are available. Cross-validation, the orthodox 'method of choice' (Eubank, 1988), will be discussed more in section 6 and (MacKay, 1991b). I will explain the Bayesian method of inferring $\alpha$ and $\beta$ after first reviewing some statistics of misfit. Misfit, $\chi^2$, and the effect of par... |

291 | Computational vision and regularization theory. - Poggio, Torre, et al. - 1985 |

228 | Probability, frequency and reasonable expectation. - Cox - 1946 |

220 |
Estimation and inference by compact coding.
- Wallace, Freeman
- 1987
Citation Context ... this Bayesian framework. The log evidence $\log_2 P(D \mid H_i)$ is the number of bits in the ideal shortest message that encodes the data $D$ using model $H_i$. Akaike's criteria are an approximation to MDL [16, 24, 25]. Any implementation of MDL necessitates approximations in evaluating the length of the ideal shortest message. I can see no advantage in MDL, and recommend that the evidence should be approximated di... |

204 | Bayesian Modeling of Uncertainty in Low Level Vision.
- Szeliski
- 1989
Citation Context ...and Skilling [5, 6, 8, 17, 18], who have used Bayesian methods to achieve the state of the art in image reconstruction. The same approach to regularisation has also been developed in part by Szeliski [22]. Bayesian model comparison is also discussed by Bretthorst [2], who has used Bayesian methods to push back the limits of NMR signal detection. As the quantities of data collected throughout science a... |

200 | Statistical predictor identification. - Akaike - 1969 |

128 |
Computational Vision and Regularization Theory, Nature (London) 317
- Poggio, Torre, et al.
- 1985
Citation Context ...the most probable interpolant, $w_{\mathrm{MP}}$. Error bars on the best fit interpolant can be obtained from the Hessian of $M$, $A = \nabla\nabla M$, evaluated at $w_{\mathrm{MP}}$. This is the well known Bayesian view of regularisation [15, 23]. Bayes can do a lot more than just provide an interpretation for regularisation. What we have described so far is just the first of three levels of inference. (The second level of model comparison de... |

122 | Bounds on the sample complexity of Bayesian learning using information theory and the VC dimension.
- Haussler, Kearns, et al.
- 1994
Citation Context ...theses. If a poor regulariser is used, for example, one that is ill-matched to the statistics of the world, then the Bayesian choice of $\alpha$ will often not be the best in terms of generalisation error [3, 6, 9]. Such a failure occurs in the companion paper on neural networks. What is our attitude to such a failure of Bayesian prediction? The failure of the evidence does not mean that we should discard the e... |

103 |
The axioms of maximum entropy.
- Skilling
- 1988
Citation Context ...sian model comparison, 'regularisation,' and noise estimation, by studying the problem of interpolating noisy data. The Bayesian framework I will describe for these tasks is due to Gull and Skilling [5, 6, 8, 17, 18], who have used Bayesian methods to achieve the state of the art in image reconstruction. The same approach to regularisation has also been developed in part by Szeliski [22]. Bayesian model compariso... |

92 |
Bayesian inductive inference and maximum entropy. In
- Gull
- 1988
Citation Context ...arameterised models. 'Occam's razor' is the principle that unnecessarily complex models should not be preferred to simpler ones. Bayesian methods automatically and quantitatively embody Occam's razor [5], without the introduction of ad hoc penalty terms. Complex hypotheses are automatically self-penalising under Bayes' rule. Figure 2 gives the basic intuition for why this should be expected. Bayesia... |

89 |
Developments in maximum entropy data analysis
- Gull
- 1989
Citation Context ...sian model comparison, 'regularisation,' and noise estimation, by studying the problem of interpolating noisy data. The Bayesian framework I will describe for these tasks is due to Gull and Skilling [5, 6, 8, 17, 18], who have used Bayesian methods to achieve the state of the art in image reconstruction. The same approach to regularisation has also been developed in part by Szeliski [22]. Bayesian model compariso... |

67 | From Laplace to supernova SN 1987A: Bayesian inference in astrophysics, in Maximum Entropy and Bayesian Methods, edited by P.
- Loredo
- 1990
Citation Context ...ic intuition for why this should be expected. Bayesian methods were first laid out in depth by Jeffreys [11]. For a general review of Bayesian philosophy see the excellent papers by Loredo and Jaynes [10, 12]. Since Jeffreys the emphasis of most Bayesian probability theory has been 'to formally utilize prior information' [1], i.e. to perform inference in a way that makes explicit the prior knowledge and i... |

53 | Bayesian classification theory - Hanson, Stutz, et al. - 1991 |

52 |
Common structure of smoothing techniques in statistics.
- Titterington
- 1985
Citation Context ...the most probable interpolant, $w_{\mathrm{MP}}$. Error bars on the best fit interpolant can be obtained from the Hessian of $M$, $A = \nabla\nabla M$, evaluated at $w_{\mathrm{MP}}$. This is the well known Bayesian view of regularisation [15, 23]. Bayes can do a lot more than just provide an interpretation for regularisation. What we have described so far is just the first of three levels of inference. (The second level of model comparison de... |

49 |
On the asymptotic behaviour of posterior distributions.
- Walker
- 1969
Citation Context ...ribution, and the gaussian approximation is exact. For more general statistical models we still expect the posterior to be dominated by locally gaussian peaks on account of the central limit theorem (Walker, 1967). Multiple maxima which arise in more complex models complicate the analysis, but Bayesian methods can still successfully be applied (Hanson et al., 1991, MacKay, 1991b, Neal, 1991). 5 Thing is negl... |

44 | Bayesian methods: General background. In
- Jaynes
- 1986
Citation Context ...ic intuition for why this should be expected. Bayesian methods were first laid out in depth by Jeffreys [11]. For a general review of Bayesian philosophy see the excellent papers by Loredo and Jaynes [10, 12]. Since Jeffreys the emphasis of most Bayesian probability theory has been 'to formally utilize prior information' [1], i.e. to perform inference in a way that makes explicit the prior knowledge and i... |

35 | Bayesian mixture modeling by monte carlo simulation.
- Neal
- 1991
Citation Context ...ion, and the gaussian approximation is exact; but this is of course not the case for a general problem. Multiple maxima complicate the analysis, but Bayesian methods can still successfully be applied [13, 14]. 2. Model comparison. At the second level of inference, we wish to infer which model is most plausible given the data. The posterior probability of each model is: $P(H_i \mid D) \propto P(D \mid H_i) P(H_i)$ (3) ... |

21 | Bayesian Analysis. I. Parameter Estimation Using Quadrature
- Bretthorst
- 1990
Citation Context ...o achieve the state of the art in image reconstruction. The same approach to regularisation has also been developed in part by Szeliski [22]. Bayesian model comparison is also discussed by Bretthorst [2], who has used Bayesian methods to push back the limits of NMR signal detection. As the quantities of data collected throughout science and engineering continue to increase, and the computational powe... |

19 | Probabilistic displays - Skilling, Robinson, et al. - 1991 |

16 | A Bayesian Comparison of Different Classes of Dynamic Models Using Empirical Data. - Kashyap - 1977 |

15 |
Information, weight of evidence: The singularity between probability measures and signal detection
- Good, Osteyee
- 1974
Citation Context ...ated cannot be systematic. To be precise, the expectation over possible data sets of the log evidence for the true model is greater than the expectation of the log evidence for any other fixed model (Osteyee and Good, 1974). 14 Proof. Suppose that the truth is actually $H_1$. A single data set arrives and we compare the evidences for $H_1$ and $H_2$, a different fixed model. Both models may have free parameters, but this w... |

15 |
Laplace’s 1774 memoir on inverse probability
- Stigler
- 1986
Citation Context ...of maximum likelihood. 10 Since $\alpha$ and $\beta$ are scale parameters, this prior should be understood as a flat prior over $\log \alpha$ and $\log \beta$. 11 It is remarkable that Laplace almost got this right in 1774 (Stigler, 1986); when inferring the mean of a Laplacian distribution, he both inferred the posterior probability of a nuisance parameter like $\beta$ in (15), and then attempted to integrate out the nuisance parameter a... |

11 | Bayesian data analysis: Straight-line fitting - Gull - 1989 |

11 |
The eigenvalues of mega-dimensional matrices
- Skilling
- 1989
Citation Context ...and replace the evaluation of $\det A$ by the evaluation of $\mathrm{Trace}\,A^{-1}$. For large dimensional problems where this task is demanding, Skilling has developed methods for estimating $\mathrm{Trace}\,A^{-1}$ statistically [21]. 5 Bayesian model comparison To rank alternative basis sets $\mathcal{A}$ and regularisers (priors) $\mathcal{R}$ in the light of the data, we examine the posterior probabilities: $P(\mathcal{A}, \mathcal{R} \mid D) \propto P(D \mid \mathcal{A}, \mathcal{R}) P(\mathcal{A}, \mathcal{R})$ (25) The da... |

9 |
Stone circle geometries: an information theory approach
- Patrick, Wallace
- 1982
Citation Context ...ny implementation of MDL necessitates approximations in evaluating the length of the ideal shortest message. Although some of the earliest work on complex model comparison involved the MDL framework (Patrick and Wallace, 1982), I can see no advantage in MDL, and recommend that the evidence should be approximated directly. • It should be emphasised that the Occam factor has nothing to do with how computationally complex ... |

8 |
Applications of maximum entropy techniques to HST data
- Weir
- 1991
Citation Context ...was being used; this motivated an immediate search for alternative priors; the new, more probable priors discovered by this search are now at the heart of the state of the art in image deconvolution (Weir, 1991). The similarity between regularisation and 'early stopping' While an over-parameterised model is fitted to a data set using gradient descent on the data error, it is sometimes noted that the model'... |

6 | Quantified maximum entropy: MemSys 5 Users - Gull, Skilling - 1999 |

6 |
Bayesian methods: General background, in Maximum Entropy and Bayesian Methods in
- Jaynes
- 1986
Citation Context ...in which alternative regularisers are compared, for example. If we try one model and obtain awful predictions, we have learnt something. 'A failure of Bayesian prediction is an opportunity to learn' (Jaynes, 1986), and we are able to come back to the same data set with new models, using new priors for example. Evaluating the evidence Let us now explicitly study the evidence to gain insight into how the Bayesi... |

5 |
Optimization in the regularization of ill-posed problems
- Davies, Anderssen
- 1986
Citation Context ...theses. If a poor regulariser is used, for example, one that is ill-matched to the statistics of the world, then the Bayesian choice of $\alpha$ will often not be the best in terms of generalisation error [3, 6, 9]. Such a failure occurs in the companion paper on neural networks. What is our attitude to such a failure of Bayesian prediction? The failure of the evidence does not mean that we should discard the e... |

2 |
Bayesian interpolation
- Sibisi
- 1991
Citation Context ...sian model comparison, 'regularisation,' and noise estimation, by studying the problem of interpolating noisy data. The Bayesian framework I will describe for these tasks is due to Gull and Skilling [5, 6, 8, 17, 18], who have used Bayesian methods to achieve the state of the art in image reconstruction. The same approach to regularisation has also been developed in part by Szeliski [22]. Bayesian model compariso... |

2 |
On parameter estimation and quantified
- Skilling
- 1991
Citation Context ...nclusion of priors into inference, as is widely held. There is not one significant 'subjective prior' in this entire paper. (If you are interested to see problems where subjective priors do arise see [7, 20].) The emphasis is that degrees of preference for alternative hypotheses are represented by probabilities, and relative preferences for hypotheses are assigned by evaluating those probabilities. Histo... |

2 | Probability, frequency, and reasonable expectation, Am - Cox - 1964 |

2 | On parameter estimation and quantified MaxEnt - Skilling - 1991 |

1 | Quantifying drug absorption - Charter - 1991 |

1 | and P.Cheeseman - Hanson, Stutz - 1991 |

1 | Bayesian Modeling of Uncertainty - Interpolation - 1989 |