| MacKay, D. J. C. Bayesian methods for backpropagation networks. Models of Neural Networks III, pages 211--254, 1994. |
....(9.83), (9.84) and (9.87). Waterhouse et al. [71] studied a Bayesian approach to the prediction of conditional probability densities with the HME model (see Section 2.9.5). The parameters of the model are inferred by ensemble learning and variational free-energy minimisation, as outlined in [45], [47]. The approach is more general than the one discussed in this thesis in that the priors $a_k$ are chosen as $x$-dependent functions. However, the resulting parameter update equations are essentially the same as those obtained in Section 9.4.2, equations (9.83), (9.84) and (9.87). The additional terms ....
MacKay D.J.C. (1996): Bayesian Methods for Backpropagation Networks. In Domany E., van Hemmen J.L., Schulten K. (eds.): Models of Neural Networks III: Association, Generalization, and Representation, Springer-Verlag, New York, 211--254.
....given in Section 3.4.4.2. 3.5.2 Comparison with the MML Method. Note that if the negative logarithm of Equation (3.25) is taken, an expression similar to Equation (2.17) results: $-\log \Pr(x) \approx -\frac{d}{2}\log(2\pi) - \log \Pr(\hat{\theta}) + \frac{1}{2}\log\det M(\hat{\theta}) - \log L(\hat{\theta})$ (3.26). MacKay therefore concludes [88]: "With care, therefore, one can replicate Bayesian results in MDL terms. Although some of the earliest work on complex model comparison involved the MDL framework [111], MDL has no apparent advantages over the direct probabilistic approach." Gull also states [57] A further note of interest ....
D. J. C. MacKay. Bayesian methods for backpropagation networks. In E. Domany, J. L. van Hemmen, and K. Schulten, editors, Models of Neural Networks III, chapter 6. Springer-Verlag, New York, 1994.
....points can become misclassified support vectors, thus misleading the heuristic. This may be the cause of the poorer performance of LAIKA exhibited on the Diabetes data. A further possibility is to allow a different scale parameter for each dimension, in order to perform automated feature selection [28]. Acknowledgements: This research was undertaken within the Postgraduate Training Partnership established between Sira Ltd and University College London. Postgraduate Training Partnerships are a joint initiative of the Department of Trade and Industry (DTI) and the Engineering and Physical ....
MacKay, D.: Bayesian methods for backpropagation networks. In van Hemmen, J., Domany, E., Schulten, K., eds.: Models of Neural Networks II. Springer (1993)
....hidden units [27]. The $w_l$ parameters in equation 4 allow a different length scale on each input dimension. For irrelevant inputs, the corresponding $w_l$ will become small, and the model will ignore that input. This is closely related to the Automatic Relevance Determination (ARD) idea of MacKay [10] and Neal [15]. The $v_0$ variable specifies the overall scale of the prior. $v_1$ specifies the variance of a zero-mean offset which has a Gaussian distribution. The Gaussian process framework allows quite a wide variety of priors over functions. For example, the Ornstein-Uhlenbeck process (with ....
D. J. C. MacKay. Bayesian Methods for Backpropagation Networks. In J. L. van Hemmen, E. Domany, and K. Schulten, editors, Models of Neural Networks II. Springer, 1993.
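The covariance described in this excerpt can be sketched in NumPy under stated assumptions. Equation 4 itself is not reproduced above, so two things here are assumptions: the squared-exponential form $C(x, x') = v_0 \exp(-\frac{1}{2}\sum_l w_l (x_l - x'_l)^2) + v_1$, and the function name `ard_covariance`. The offset term $v_1$ enters as a constant because adding a zero-mean Gaussian offset of variance $v_1$ to every function value adds $v_1$ to every covariance entry. Setting some $w_l = 0$ makes the covariance ignore input $l$ entirely, which is the ARD behaviour the excerpt describes.

```python
import numpy as np

def ard_covariance(X1, X2, v0, w, v1):
    """Hypothetical ARD covariance:
    C[i, j] = v0 * exp(-0.5 * sum_l w[l] * (X1[i, l] - X2[j, l])**2) + v1.

    w holds one (inverse squared length scale) weight per input dimension;
    v1 is the variance of a zero-mean Gaussian offset shared by all inputs.
    """
    X1 = np.atleast_2d(np.asarray(X1, dtype=float))
    X2 = np.atleast_2d(np.asarray(X2, dtype=float))
    w = np.asarray(w, dtype=float)
    diff = X1[:, None, :] - X2[None, :, :]        # shape (n1, n2, d)
    sq = np.einsum("ijl,l->ij", diff**2, w)       # per-dimension weighted squared distance
    return v0 * np.exp(-0.5 * sq) + v1

# With w = [1, 0] the second input is irrelevant: changing it leaves C unchanged.
c_a = ard_covariance([[0.0, 0.0]], [[1.0, 5.0]], v0=2.0, w=[1.0, 0.0], v1=0.5)
c_b = ard_covariance([[0.0, 0.0]], [[1.0, -3.0]], v0=2.0, w=[1.0, 0.0], v1=0.5)
```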
....D. Fitting a model to an image. To classify an image $I$, each of the models $m$ is fitted to the data and the model that best explains the image is chosen. If we assume a uniform prior over all digits, then the posterior probability $P(m \mid I)$ for each model is proportional to the evidence [35], $P(I \mid m) = \int P(I \mid \alpha, m)\, P(\alpha \mid m)\, d\alpha$ (3), where $\alpha$ is the vector of instantiation parameters for the model. To avoid confusion, we clarify that $\alpha$ is the instantiation vector of the model in the image frame. In other words, $\alpha$ is the concatenation $\alpha = [X \; Y]^{T}$ (4), where $Y$ is the ....
D. J. C. MacKay, "Bayesian methods for backpropagation networks", in Models of Neural Networks II, J. L. van Hemmen, E. Domany, and K. Schulten, Eds. 1993, Springer.
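The evidence integral (3) and the uniform-prior model choice can be illustrated on a deliberately tiny problem. This is a sketch only: the digit models in the excerpt are not specified here, so the toy model is hypothetical. In it, each model $m$ predicts `image = alpha * template + noise`, with one scalar instantiation parameter $\alpha \sim N(0, \text{prior\_var})$ and Gaussian noise. The functions `log_evidence` and `model_posteriors` and all parameter values are illustrative assumptions. The integral over $\alpha$ is done by the trapezoid rule on a grid.

```python
import numpy as np

def log_evidence(image, template, noise_var=0.1, prior_var=1.0,
                 n_grid=20001, half_width=10.0):
    """Approximate log P(I | m) = log of the integral of P(I | alpha, m) P(alpha | m) d(alpha).

    Hypothetical toy model m: image = alpha * template + N(0, noise_var) noise,
    scalar alpha ~ N(0, prior_var). Integration uses the trapezoid rule.
    """
    image = np.asarray(image, dtype=float)
    template = np.asarray(template, dtype=float)
    d = image.size
    alphas = np.linspace(-half_width, half_width, n_grid)
    resid = image[None, :] - alphas[:, None] * template[None, :]
    log_lik = -0.5 * (d * np.log(2 * np.pi * noise_var)
                      + (resid**2).sum(axis=1) / noise_var)
    log_prior = -0.5 * (np.log(2 * np.pi * prior_var) + alphas**2 / prior_var)
    log_joint = log_lik + log_prior
    peak = log_joint.max()                     # subtract the peak to avoid underflow
    f = np.exp(log_joint - peak)
    step = alphas[1] - alphas[0]
    integral = step * (f.sum() - 0.5 * (f[0] + f[-1]))   # trapezoid rule
    return peak + np.log(integral)

def model_posteriors(image, templates, **kwargs):
    """Uniform prior over models, so P(m | I) is proportional to P(I | m)."""
    logs = np.array([log_evidence(image, t, **kwargs) for t in templates])
    p = np.exp(logs - logs.max())
    return p / p.sum()

templates = [np.array([1.0, 1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0, 1.0])]
image = np.array([1.0, 1.0, 0.0, 0.0])
post = model_posteriors(image, templates)
```

For this linear-Gaussian toy the integral also has a closed form: the image is marginally $N(0, \sigma^2 I + s^2 t t^T)$. That makes the toy convenient for checking the quadrature. Real models with nonlinear instantiation parameters would need sampling or a Laplace-style approximation instead.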
....that some input variables might be irrelevant to the prediction task at hand, and we would expect that the $w$s corresponding to the irrelevant variables would tend to zero as the model is fitted to data. This is closely related to the Automatic Relevance Determination (ARD) idea of MacKay and Neal [5], [7]. 3 Experiments with Gaussian Process Prediction. Prediction with Gaussian processes and maximum-likelihood training of the covariance function has been tested on two problems: (i) a modified version of MacKay's robot arm problem and (ii) the Boston housing data set. For both datasets I ....
D. J. C. MacKay. Bayesian Methods for Backpropagation Networks. In J. L. van Hemmen, E. Domany, and K. Schulten, editors, Models of Neural Networks II. Springer, 1993.
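The objective behind "maximum-likelihood training of the covariance function" is the log marginal likelihood of the targets under a zero-mean Gaussian process. The following is a minimal sketch of that objective, computed through a Cholesky factorisation. The ARD covariance form, the noise term and the names `ard_kernel` and `log_marginal_likelihood` are assumptions made for illustration; they are not taken from the excerpt.

```python
import numpy as np

def ard_kernel(X1, X2, v0, w, v1):
    """Hypothetical ARD squared-exponential covariance plus a constant offset v1."""
    diff = X1[:, None, :] - X2[None, :, :]
    return v0 * np.exp(-0.5 * np.einsum("ijl,l->ij", diff**2, w)) + v1

def log_marginal_likelihood(X, y, v0, w, v1, noise_var):
    """log p(y | X, hyperparameters) for a zero-mean GP with Gaussian noise."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    K = ard_kernel(X, X, v0, np.asarray(w, dtype=float), v1) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)                              # K = L @ L.T, L lower triangular
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))    # alpha = K^{-1} y
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()                     # equals -0.5 * log det K
            - 0.5 * n * np.log(2 * np.pi))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=20)            # only input 0 matters here
lml = log_marginal_likelihood(X, y, v0=1.0, w=[1.0, 0.1], v1=0.1, noise_var=0.01)
```

To train, one would hand the negative of this function to a numerical optimiser such as `scipy.optimize.minimize`. Optimising over log-transformed hyperparameters keeps them positive. Under ARD one would then expect the weight for an irrelevant input to shrink, although this sketch does not demonstrate that.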
....correlated outputs; the $\alpha_l$ parameters allow a different distance measure for each input dimension. For irrelevant inputs, the corresponding $\alpha_l$ will become small, and the model will ignore that input. This is closely related to the Automatic Relevance Determination (ARD) idea of MacKay and Neal (MacKay, 1993; Neal, 1996). The $v_0$ variable gives the overall scale of the local correlations; $a_0$ and $a_1$ are variables controlling the scale of the bias and linear contributions to the covariance. A simple extension of the linear regression part of the covariance function would allow a different ....
MacKay, D. J. C. (1993). Bayesian Methods for Backpropagation Networks. In J. L. van Hemmen, E. Domany, and K. Schulten (Eds.), Models of Neural Networks II. Springer.
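The bias and linear contributions controlled by $a_0$ and $a_1$ can be read as the covariance of a random linear function. The sketch below assumes, as is common but not shown in the excerpt, that this part of the covariance is $a_0 + a_1 \sum_l x_l x'_l$. That is exactly the covariance of $f(x) = b + \sum_l u_l x_l$ when $b \sim N(0, a_0)$ and each $u_l \sim N(0, a_1)$ are independent. The two hypothetical functions compute it from both views so they can be compared. The $v_0$ local-correlation term would be added on top of this in the full covariance.

```python
import numpy as np

def bias_linear_covariance(X1, X2, a0, a1):
    """Function-space view: C[i, j] = a0 + a1 * sum_l X1[i, l] * X2[j, l]."""
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    return a0 + a1 * (X1 @ X2.T)

def weight_space_covariance(X1, X2, a0, a1):
    """Weight-space view: f(x) = b + u . x, b ~ N(0, a0), u_l ~ N(0, a1),
    so C = Phi1 @ S @ Phi2.T with features Phi = [1, x] and S = diag(a0, a1, ..., a1)."""
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    Phi1 = np.hstack([np.ones((len(X1), 1)), X1])
    Phi2 = np.hstack([np.ones((len(X2), 1)), X2])
    S = np.diag([a0] + [a1] * X1.shape[1])
    return Phi1 @ S @ Phi2.T

rng = np.random.default_rng(2)
X1, X2 = rng.normal(size=(4, 3)), rng.normal(size=(5, 3))
C_fun = bias_linear_covariance(X1, X2, a0=0.7, a1=1.3)
C_wts = weight_space_covariance(X1, X2, a0=0.7, a1=1.3)
```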
....inference proceed, and how do we handle uncertainty? None of these questions is trivial. However, recent developments in Bayesian methods for neural networks have allowed disciplined treatments of many of these issues. In this paper, we adopt the Bayesian framework with Gaussian approximations (MacKay 1992a,b,c, 1994), which is discussed in some detail in the appendix. Here we highlight some of its basic ideas in connection with the questions raised above. The central feature of the Bayesian approach is to treat everything as probabilistic. Hence, instead of estimating the true values of some fixed ....
....posterior distribution in Equation 5 (or their asymptotic normal approximation), insert them into the functional form in the second line of Equation 4 to compute $\pi_i$, and take a random draw from a Bernoulli distribution with this parameter (the first line of Equation 4). In practice, we used MacKay's (1994) analytical approximations to accomplish the same task. ....
MacKay, D.J.C. 1994. "Bayesian Methods for Backpropagation Networks." In E. Domany, J.L. van Hemmen and K. Schulten (Eds.), Models of Neural Networks, III, Chapter 6. New York: Springer-Verlag.
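Suppose the functional form is logistic, which is an assumption because Equation 4 is not shown. With a Gaussian posterior $\beta \sim N(\hat\beta, \Sigma)$, the linear predictor $a = x^T\beta$ is Gaussian with mean $x^T\hat\beta$ and variance $x^T \Sigma x$. Averaging the Bernoulli parameter over posterior draws therefore amounts to computing $E[\sigma(a)]$. One well-known analytical approximation from MacKay's evidence framework is $\sigma(\kappa\mu)$ with $\kappa = (1 + \pi s^2/8)^{-1/2}$. Whether this is precisely the approximation the quoted authors used is an assumption. The sketch compares it with a deterministic quadrature reference, used in place of random draws so the check is reproducible. All names and numbers are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def moderated_probability(mean, var):
    """Analytical approximation to E[sigmoid(a)] for a ~ N(mean, var)."""
    kappa = 1.0 / np.sqrt(1.0 + np.pi * var / 8.0)
    return sigmoid(kappa * mean)

def quadrature_probability(mean, var, n_grid=20001, width=12.0):
    """Reference value: trapezoid-rule integral of sigmoid(a) * N(a | mean, var)."""
    sd = np.sqrt(var)
    a = np.linspace(mean - width * sd, mean + width * sd, n_grid)
    f = sigmoid(a) * np.exp(-0.5 * ((a - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    step = a[1] - a[0]
    return step * (f.sum() - 0.5 * (f[0] + f[-1]))

# Predictive probability for one hypothetical case under a Gaussian posterior.
x = np.array([1.0, 0.5])                      # covariates (first entry = intercept)
beta_hat = np.array([0.2, 1.5])               # posterior mode
Sigma = np.array([[0.5, 0.1], [0.1, 2.0]])    # posterior covariance
mu, s2 = x @ beta_hat, x @ Sigma @ x          # mean and variance of a = x . beta
print(sigmoid(mu), moderated_probability(mu, s2), quadrature_probability(mu, s2))
```

The approximation shrinks the plug-in probability $\sigma(\mu)$ towards 0.5 as posterior uncertainty grows. This is the main practical difference from simply inserting the posterior mode.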
....Norman, OK 73019. 1. INTRODUCTION. The rather lofty title of this article may suggest that the research reported herein is more fundamental than it really is. The foundations of Bayesian techniques in neural networks have already been laid down in the works of Buntine and Weigend (1991), MacKay (1996), Neal (1996) and Wolpert (1993). Bishop (1996) devotes an entire chapter to the application of Bayesian techniques in neural networks. The present article aims only to illustrate some of the ideas developed in these works. All nonlinear regression and classification models can overfit data; ....
....through Bayesian reasoning. There are two distinct approaches to implementing Bayesian ideas in neural networks. Neal (1996) uses exact simulations and advocates the view that a large number of hidden nodes should be selected and then controlled through hyperparameters. MacKay (1996), on the other hand, approximates the posterior distributions, thereby allowing for analytic results. In MacKay's approach, it appears that the optimal number of hidden nodes can be addressed by the so-called evidence framework; it has been conjectured that the maximum of the evidence ....
MacKay, D. J. C., 1996: Bayesian methods for back-propagation networks, in Models of Neural Networks III, E. Domany, J. L. van Hemmen, K. Schulten (Eds.), Springer-Verlag, New York, physics of neural network series, pp. 309.
....we should choose the simplest explanation that is consistent with the observed data [5]. An efficient way to regularize the complexity of a network is to add regularization terms, e.g. weight decay. This type of regularization can also be motivated statistically within the Bayesian learning approach [2, 9]. Regularization techniques constrain the degrees of freedom of our model, but do not solve the problem of determining the optimal input size for the network, i.e. the problem of redundant and irrelevant information that decreases generalization performance. Our information theoretic ....
....beforehand manually in a series of training runs, by increasing its value until overfitting no longer occurs on a validation set split off from the training data, or it is adapted automatically during training within the Bayesian learning approaches. Making use of the MAP approach [9, 2], we can adapt it from time to time during the learning process. Under the assumption that the weights have a Gaussian distribution with variance $1/\alpha$ and that the error also has a Gaussian distribution with variance $1/\beta$, one can adjust these two hyperparameters by maximizing the evidence, which is ....
D. MacKay. Bayesian methods for backpropagation networks. Models of Neural Networks III, chapter 6, 1994.
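The evidence-based adjustment of $\alpha$ (weight prior precision) and $\beta$ (noise precision) can be sketched with MacKay's re-estimation formulas. They are $\gamma = \sum_i \lambda_i/(\lambda_i + \alpha)$, $\alpha = \gamma / |w_{MP}|^2$ and $\beta = (N - \gamma) / \sum_n (t_n - y_n)^2$, where the $\lambda_i$ are eigenvalues of $\beta \Phi^T \Phi$. The sketch makes one simplifying assumption: it uses a model that is linear in its parameters, where these formulas are exact, rather than a neural network, where they are applied to a local Gaussian approximation. The function name and data are hypothetical. The equivalent weight-decay coefficient for an error $E_D + \lambda E_W$ is $\lambda = \alpha/\beta$.

```python
import numpy as np

def evidence_reestimate(Phi, t, alpha=1.0, beta=1.0, n_iter=100):
    """Iterate evidence re-estimation of alpha (weight variance 1/alpha) and
    beta (noise variance 1/beta) for the linear model t = Phi @ w + noise.

    Returns the final alpha and beta, plus gamma and w_mp from the last
    iteration (computed just before that iteration's alpha/beta update).
    """
    Phi = np.asarray(Phi, dtype=float)
    t = np.asarray(t, dtype=float)
    N, M = Phi.shape
    eig = np.linalg.eigvalsh(Phi.T @ Phi)              # eigenvalues of Phi^T Phi
    for _ in range(n_iter):
        A = alpha * np.eye(M) + beta * Phi.T @ Phi     # Hessian of beta*E_D + alpha*E_W
        w_mp = beta * np.linalg.solve(A, Phi.T @ t)    # most probable weights
        lam = beta * eig
        gamma = np.sum(lam / (lam + alpha))            # effective number of parameters
        alpha = gamma / (w_mp @ w_mp)
        beta = (N - gamma) / np.sum((t - Phi @ w_mp) ** 2)
    return alpha, beta, gamma, w_mp

rng = np.random.default_rng(1)
xs = rng.uniform(-1.0, 1.0, size=50)
Phi = np.column_stack([np.ones(50), xs, xs**2])
t = Phi @ np.array([0.5, -1.0, 2.0]) + 0.1 * rng.normal(size=50)
alpha, beta, gamma, w_mp = evidence_reestimate(Phi, t)
weight_decay = alpha / beta                            # equivalent weight-decay coefficient
```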
....saliencies. 3.8.1 Pre-analysis methods: subset selection (Mardia, Kent, and Bibby 1979, section 6.7); principal component regression (Mardia, Kent, and Bibby 1979, section 8.8; Massy 1965). 3.8.2 Post-analysis methods: automatic relevance determination (Neal 1996, sections 1.2.3 and 4.3; MacKay 1994; MacKay 1995); the delta test (Pi and Peterson 1994; Ohlsson, Peterson, Pi, Rögnvaldsson, and Söderberg 1994); ffi test iVaughnj rule. 3.8.3 Saliency map I: OBD-based saliency map; Lars Kai Hansen (Mørch, Kjems, Hansen, Svarer, Law, Lautrup, Strother, and Rehm 1995; Mørch 1998; Mørch, Hansen, ....
MacKay, D. J. C. (1994). Bayesian methods for backpropagation networks. In E. Domany, J. L. van Hemmen, and K. Schulten (Eds.), Models of Neural Networks III. New York: Springer-Verlag.
....6.6 Comparison with the MML Method. Note that if we take the negative logarithm of Equation (34), we get an expression similar to Equation (23): $-\log \mathrm{Prob}(D \mid MC) \approx -\frac{d}{2}\log(2\pi) - \log \mathrm{Prob}(\hat{\theta} \mid MC) + \frac{1}{2}\log\det M(\hat{\theta}) - \log L(\hat{\theta})$ (35). MacKay therefore concludes [15]: "With care, therefore, one can replicate Bayesian results in MDL terms. Although some of the earliest work on complex model comparison involved the MDL framework [19], MDL has no apparent advantages over the direct probabilistic approach." While Equations (34) and (23) are superficially similar, ....
D. J. C. MacKay. Bayesian methods for backpropagation networks. In E. Domany, J. L. van Hemmen, and K. Schulten, editors, Models of Neural Networks III, chapter 6. Springer-Verlag, New York, 1994.
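Expressions of this shape are the negative logarithm of the Laplace (Gaussian) approximation to the evidence: $-\log \mathrm{Prob}(D \mid MC) \approx -\log L(\hat\theta) - \log \mathrm{Prob}(\hat\theta \mid MC) - \frac{d}{2}\log(2\pi) + \frac{1}{2}\log\det M(\hat\theta)$. The excerpt does not define its symbols, so two readings are assumptions: $L$ as the likelihood and $M$ as the Hessian of the negative log posterior at the mode $\hat\theta$. The sketch checks this form on a hypothetical conjugate example, a Gaussian mean with a Gaussian prior and $d = 1$. In that case the log posterior is exactly quadratic, so the Laplace value must match the exact evidence.

```python
import numpy as np

# Hypothetical conjugate example: x_k ~ N(theta, s2), prior theta ~ N(0, tau2), d = 1.
s2, tau2 = 0.5, 2.0
x = np.array([0.3, 1.1, 0.8, 1.4])
n, d = len(x), 1

def log_norm(v, mean, var):
    """Elementwise log density of N(mean, var) at v."""
    return -0.5 * (np.log(2 * np.pi * var) + (v - mean) ** 2 / var)

# Hessian M of the negative log posterior and the posterior mode theta_hat.
M = n / s2 + 1 / tau2
theta_hat = (x.sum() / s2) / M

# Laplace form: -log L(theta_hat) - log prior(theta_hat) - (d/2) log(2 pi) + (1/2) log det M
neg_log_evidence_laplace = (-log_norm(x, theta_hat, s2).sum()
                            - log_norm(theta_hat, 0.0, tau2)
                            - 0.5 * d * np.log(2 * np.pi)
                            + 0.5 * np.log(M))

# Exact value: marginally x ~ N(0, s2 * I + tau2 * 1 1^T).
C = s2 * np.eye(n) + tau2 * np.ones((n, n))
_, logdet = np.linalg.slogdet(C)
neg_log_evidence_exact = 0.5 * (n * np.log(2 * np.pi) + logdet + x @ np.linalg.solve(C, x))
print(neg_log_evidence_laplace, neg_log_evidence_exact)
```

For non-Gaussian posteriors the two values differ. The Laplace form is then only an approximation whose quality depends on how close the posterior is to Gaussian near $\hat\theta$.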
....posterior distribution of $\mathbf{w}$ (2) by a Gaussian fitted at the optimum $\mathbf{w}$ [1, 2] and by methods that represent the posterior distribution by a set of Monte Carlo samples from it [3]. The former approach has been successfully applied to practical problems, as described elsewhere [4, 5]. See ref. [6] for a review. In the general case of a classification problem with multiple classes $i = 1, \ldots, I$, a softmax classifier is a natural form of model. This assigns probabilities to the alternative classes $i$ thus: $P(t = i \mid \mathbf{x}, \mathbf{w}, \mathcal{H}) = y_i(\mathbf{x}; \mathbf{w}) = \frac{e^{a_i(\mathbf{x}; \mathbf{w})}}{\sum_{i'} e^{a_{i'}(\mathbf{x}; \mathbf{w})}}$ (6), where {a ....
....the use of the recognition mapping to compute an approximate distribution over the latent variables that is used to train an autoencoder by an elegant free-energy minimization method [18]. The connection between the MDL approach that they use and the Bayesian viewpoint is explained in ref. [6]. The main differences between this work and Hinton and Zemel's are the types of network studied, and the inclusion in this work of an additional level in the hierarchical model (the hyperparameters $\{\alpha_c\}$) so that it can discover for itself the appropriate dimensionality of the latent space. ....
D. J. C. MacKay. Bayesian methods for backpropagation networks. In E. Domany, J. L. van Hemmen, and K. Schulten, editors, Models of Neural Networks III, chapter 6. Springer-Verlag, New York, 1994.
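The softmax classifier in equation (6) maps the activations $a_i(\mathbf{x}; \mathbf{w})$ to class probabilities. The minimal NumPy sketch below uses a hypothetical function name. It subtracts the largest activation before exponentiating. Shifting every $a_i$ by the same amount leaves the $y_i$ unchanged, and the shift keeps large activations from overflowing.

```python
import numpy as np

def softmax_probabilities(a):
    """y_i = exp(a_i) / sum_i' exp(a_i'), computed along the last axis.

    `a` holds the activations a_i(x; w) for each class i (one row per input x).
    """
    a = np.asarray(a, dtype=float)
    z = a - a.max(axis=-1, keepdims=True)   # common shift: y_i unchanged, no overflow
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

p = softmax_probabilities([[1.0, 2.0, 3.0],
                           [1000.0, 1000.0, 1000.0]])
```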
....direction over which $y$ is expected to vary significantly. A very large length scale means that $y$ is expected to be essentially a constant function of that input. Such an input could be said to be irrelevant, as in the automatic relevance determination (ARD) method for neural networks (MacKay 1994, Neal 1996). The $\theta_1$ hyperparameter defines the vertical scale of variations of a typical function. The $\theta_2$ hyperparameter allows the whole function to be offset away from zero by some unknown constant; to understand this term, examine equation (24) and consider the basis function $\phi(x) = 1$. ....
MacKay, D. J. C.: 1994, Bayesian methods for backpropagation networks, in E. Domany, J. L. van Hemmen and K. Schulten (eds), Models of Neural Networks III, Springer-Verlag, New York, chapter 6, pp. 211--254.
No context found.
MacKay, D. J. C. Bayesian methods for backpropagation networks. Models of Neural Networks III, pages 211--254, 1994.
No context found.
D. J. C. MacKay. Bayesian methods for backpropagation networks. In J. L. van Hemmen, E. Domany, and K. Schulten, editors, Models of Neural Networks III, pages 211--254, New York, 1994.
No context found.
MacKay, D. J. C. (1994). Bayesian methods for backpropagation networks. Models of Neural Networks III, 211--254.
No context found.
D. J. C. MacKay. Bayesian methods for backpropagation networks. In E. Domany, J. L. van Hemmen, and K. Schulten, editors, Models of Neural Networks III, chapter 6. Springer-Verlag, New York, 1994.
No context found.
D. J. C. MacKay. Bayesian methods for backpropagation networks. In E. Domany, J. L. van Hemmen, and K. Schulten, editors, Models of Neural Networks III, chapter 6. Springer-Verlag, 1994.
No context found.
MacKay, D. J. C., 1996: Bayesian methods for back-propagation networks, in Models of Neural Networks III, E. Domany, J. L. van Hemmen, K. Schulten (Eds.), Springer-Verlag, New York, physics of neural network series, pp. 309.
CiteSeer.IST - Copyright Penn State and NEC