| D. MacKay. A practical Bayesian framework for backpropagation networks. Neural Computation, 4(3):448--472, 1992. |
....the networks are returned to the state they had before the random move; rather the temperature is used in the selection between many such updated models. III. PROPERTIES OF THE METHOD The properties and performance of the algorithm were evaluated using three test sets. The Robot Arm data set [22], [18] is an artificial regression problem, in which the outputs are simple trigonometric functions of the inputs, with a small Gaussian noise term added. The Pima Indians Diabetes Database [23] and Myocardial Scintigram data set [24] are medical classification problems with binary targets. The ....
D. J. C. MacKay, "A practical Bayesian framework for backpropagation networks," Neural Computation, vol. 4, no. 3, pp. 448--472, 1992.
....expression on the right is called the evidence for model $M_i$, and can be used as a criterion for model selection. Section 9.2 will briefly review the Bayesian evidence scheme. This concept was originally developed by Gull [23] and was introduced to the neural network community by MacKay [42], [43], who applied it to the prediction of Gaussian conditional densities $P(y \mid x) \propto \exp(-\,\cdots)$ with constant precisions, $\beta = \text{constant}$. It will be outlined, without going into mathematical detail, how this scheme can be generalised to arbitrary probability densities $P(y \mid x)$. The mathematical ....
....in the constant Omega Omega s , to replace det A by fdet Ag :r . This leads to the following corrected expression for the evidence (9.31) ln P (DjM) GammaE(q) Gamma R(qjr) Gamma 2 2 lnfdet Ag :r 2 (9.53) The whole concept is illustrated in a simple example. MacKay [42] [43] considers two hyperparameters r = ff; fi) and shows that A ffff = Gamma (9.54) A fifi = Gamma (9.55) A fffi = Gamma ff fi 0 (9.56) where N denotes the size of the training set and fl the number of free parameters (to be discussed shortly) The resulting ....
MacKay D.J.C. (1992): A practical Bayesian framework for backpropagation networks. Neural Computation 4, 448-472.
....by the lengths of the two limbs, together with the angle of rotation in the first joint of the first limb, and the angle of rotation in the second joint of the second limb with respect to the first limb. The mobile end of the arm thus has two degrees of freedom given by the two angles. In [12] this standard problem is modeled using a Bayesian framework to obtain a feed forward network model. We use MDL to obtain the best number of nodes in the hidden layer of a three layer feed forward network model. The method is essentially the same as in the character recognition experiment. Just as ....
....limb (the hand, so to speak) and the variables $\theta_1, \theta_2$ is given by $y_1 = r_1\cos(\theta_1) + r_2\cos(\theta_1 + \theta_2)$, $y_2 = r_1\sin(\theta_1) + r_2\sin(\theta_1 + \theta_2)$. The goal is to construct a feedforward neural network that correctly associates the $(y_1, y_2)$ coordinates to the $(\theta_1, \theta_2)$ coordinates. As in [12] we set $r_1 = 2$ and $r_2 = 1.3$. The setup is similar to the character recognition experiment except that the data are not real world but computer generated. We generated random examples of the relation between $y_1, y_2$ and $\theta_1, \theta_2$ as in the above formula and a little Gaussian noise was added to ....
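The data generation described in this excerpt can be sketched in Python as follows. The limb lengths 2 and 1.3 come from the excerpt. The uniform sampling range for the angles, the noise standard deviation of 0.05, and the function names are illustrative assumptions and are not taken from the cited papers.

```python
import math
import random


def arm_position(theta1, theta2, r1=2.0, r2=1.3):
    """Noise-free end-point of a two-limb planar arm.

    theta2 is measured relative to the first limb, so the second limb
    points in direction theta1 + theta2.
    """
    y1 = r1 * math.cos(theta1) + r2 * math.cos(theta1 + theta2)
    y2 = r1 * math.sin(theta1) + r2 * math.sin(theta1 + theta2)
    return y1, y2


def robot_arm_examples(n, noise_sd=0.05, seed=0):
    """Generate n ((theta1, theta2), (y1, y2)) pairs with Gaussian noise.

    The angle range and noise_sd are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        theta1 = rng.uniform(-math.pi, math.pi)
        theta2 = rng.uniform(-math.pi, math.pi)
        y1, y2 = arm_position(theta1, theta2)
        noisy = (y1 + rng.gauss(0.0, noise_sd), y2 + rng.gauss(0.0, noise_sd))
        examples.append(((theta1, theta2), noisy))
    return examples
```

With both angles at zero the arm is fully extended along the first axis, so `arm_position(0.0, 0.0)` returns a point at distance 2 + 1.3 from the origin.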
D.J.C. MacKay, A practical Bayesian framework for backpropagation networks, Neural Computation 4 (3) (1992) 448--472.
....Unfortunately, all useful quantities (e.g. predictions) take the form of a multidimensional integral, approximation of which is extremely difficult. In the recent past, various authors have attempted to approximate these integrals by using various techniques, among which Gaussian approximations [4] and hybrid Markov Chain Monte Carlo (MCMC) [5]. See Section 3 for a few more words on the latter technique. In this paper, which is partly based on a chapter of [7], we will attempt to approximate the integrals by using so-called cubature formulae combined with a suitable transformation. A ....
....importance of the problem domain. 2 The problem: expectations in Bayesian neural networks. In this section we briefly consider Bayesian inference applied to neural networks. The reader requiring more information on the subject is referred to [1]. We restrict ourselves to regression problems. See [1, 4] for a description of Bayesian neural networks for classification. The neural network type concerned is the well-known multilayer perceptron (MLP). The MLP-related notation that we will use in this paper is summarized in Table 1. For notational convenience we will only consider regression problems ....
D. J. C. MacKay. A practical Bayesian framework for backpropagation networks. Neural Computation, 4(3):448-472, 1992.
....distribution for all observable and unobservable quantities in the problem. 2. Conditioning on observed data: calculating and interpreting the appropriate posterior distribution given the data. 3. Evaluating the model fit: assessing the implications of the posterior distribution. MacKay [14] and Neal [16] describe a Bayesian neural network (BNN) where a probabilistic interpretation is applied to the NN technique. This interpretation involves assigning a meaning to the functions and parameters already in use. In the Bayesian approach to NN prediction, the objective is to use the ....
....the objective evaluation of a number of issues involved in complex modelling including the choice between alternative network architectures (e.g. the number of hidden units and the activation function) the stopping rules for network training and the effective number of parameters used. MacKay [14] postulates that the overall effect of the Bayesian framework should be realised in the reduction in the high cost of the learning process in terms of the time needed for the learning to take place. The framework allows for the full use of the limited and often expensive data set for training the ....
D.J.C. MacKay. A practical bayesian framework for backpropagation networks. Neural Computation, 4:448--472, 1992.
....determine the values of these parameters. Moreover, Bayesian methods can also provide probabilistic class prediction that is more desirable than just deterministic classification. There is some literature on Bayesian interpretations of classical SVC. Kwok [3] built up MacKay's evidence framework [4] using a weight space interpretation. The unnormalized evidence may cause inaccuracy in Bayesian inference. Sollich [6] pointed out that the normalization issue in Wei Chu gratefully acknowledges the financial support provided by the National University of Singapore through Research Scholarship. ....
....idea about the suitable values of $\theta$ before training data are available, we assume a flat distribution for $P(\theta)$, i.e. $P(\theta)$ is greatly insensitive to the values of $\theta$. Therefore, $P(D \mid \theta)$, known as the evidence of $\theta$, can be used to assign a preference to alternative values of the hyperparameters $\theta$ [4]. The evidence could be calculated by an explicit formula after using a Laplacian approximation at $f_{MP}$, and then hyperparameter inference may be done by gradient-based optimization methods. We can get the evidence by an integral over all $f$: $P(D \mid \theta) = \int P(D \mid \theta, f)\,P(f \mid \theta)\,df$. Using the definitions ....
D. J. C. MacKay, A practical Bayesian framework for back propagation networks. Neural Computation, 4(3), 448-472, 1992.
....MAP gives biased results (e.g. noise variance being systematically underestimated) and requires large number of samples in order to get good results. These limitations can be overcome in a Bayesian treatment. Bayesian neural networks with independent output noises have been discussed by MacKay [8] and Neal [9, 10]. Purpose of this paper is to show how this problem can be solved using full covariance matrix with Bayesian treatment and Markov Chain Monte Carlo (MCMC) methods. We begin by briefly reviewing the Bayesian neural networks and MCMC implementation in Sections 2 and 3. In Section 4 ....
MacKay, D. J. C. (1992). A practical Bayesian framework for backpropagation networks. Neural Computation, 4(3):448--472.
....is to account for effect of gradient on the total energy. The new algorithm is amenable to the same weight perturbation and decay framework, and the resulting system has better empirical performance than that of pure backpropagation and Gaussian approximation of learning by backpropagation [10]. In principle, annealing with Hybrid Monte Carlo simulation facilitates arbitrarily close approximation (as opposed to requiring approximator components to be drawn from a family of Gaussian conditional distributions, cf. MacKay [10]). The main benefit of the general Bayesian framework is its ....
....and Gaussian approximation of learning by backpropagation [10]. In principle, annealing with Hybrid Monte Carlo simulation facilitates arbitrarily close approximation (as opposed to requiring approximator components to be drawn from a family of Gaussian conditional distributions, cf. MacKay [10]). The main benefit of the general Bayesian framework is its avoidance of overfitting through regularized weight decay. Finally, the further improvement given by Gibbs sampling demonstrates the benefits of a normalized information theoretic measure of free energy (versus the traditional entropy ....
D. J. C. MacKay. A practical bayesian framework for backpropagation networks. Neural Comp., 4:448--472, 1992.
....algorithm, that is based on maximum likelihood algorithm, since the neural net, with a huge number of node, tends to overfit the data. Neal investigated the net behavior when the number of hidden nodes goes to infinity, and showed that it can get good performances using the Bayesian learning [31], instead of maximum likelihood strategy. In the Bayesian approach to neural networks a prior distribution over the weights induces a prior distribution over functions. This prior is combined with a noise model, which specifies the probability of observing the targets t given function values y, to ....
D.J.C.MacKay, "A Practical Bayesian Framework for backpropagation Networks ", Neural Computation, 4(3), pp. 448-472, 1992
....(concrete quality estimation and forest scene classification). 3.1 MLP and GP models. Both MLP and GP are flexible nonlinear models, where the available number of parameters $p$ in model may be near or even greater than the number of data samples $n$ and also the effective number of parameters $p_{\mathrm{eff}}$ (MacKay, 1992; Spiegelhalter et al. 1998) is usually large compared to $n$. We used one hidden layer MLP with tanh hidden units, which in matrix format can be written as $f(x, \theta_w) = b_2 + w_2 \tanh(b_1 + w_1 x)$ (30). The $\theta_w$ denotes all the parameters $w_1, b_1, w_2, b_2$, which are the hidden layer ....
....Assessment and Comparison Using Cross Validation Predictive Densities 13 3.2 Toy problem: MacKay's robot arm. In this section we illustrate some basic issues of the expected utilities computed by using the cross validation predictive densities. Very simple robot arm toy problem (first used in MacKay, 1992) was selected, so that the complexity of the problem would not hide the main points we wanted to illustrate. Additionally we wanted to demonstrate uncertainties in this problem as it has been used in many papers without reporting uncertainty in error estimates and also it seems probable that ....
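A minimal sketch of a one-hidden-layer MLP with tanh hidden units, of the kind this excerpt describes, written in plain Python. It assumes the usual form in which each layer adds its bias to a weighted sum of its inputs. The function name `mlp_forward` and the list-of-rows weight layout are hypothetical choices made for illustration.

```python
import math


def mlp_forward(x, w1, b1, w2, b2):
    """Evaluate b2 + w2 * tanh(b1 + w1 * x) for one input vector x.

    w1 holds one row of input weights per hidden unit and w2 holds one
    row of hidden-unit weights per output; b1 and b2 are the biases.
    """
    # Hidden layer: tanh of bias plus weighted sum of the inputs.
    hidden = [
        math.tanh(b + sum(w * xi for w, xi in zip(row, x)))
        for row, b in zip(w1, b1)
    ]
    # Output layer: bias plus weighted sum of the hidden activations.
    return [
        b + sum(w * h for w, h in zip(row, hidden))
        for row, b in zip(w2, b2)
    ]
```

For example, `mlp_forward([1.0], [[1.0]], [0.0], [[2.0]], [0.0])` evaluates a network with one input, one hidden unit, and one output, giving twice tanh(1).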
MacKay, D. J. C. (1992). A practical Bayesian framework for backpropagation networks. Neural Computation, 4(3):448--472.
....FOCUS useful in a practical setting. Even with such assumptions, Davies and Russell, with reference to John et al.[52] note that the use of wrapper algorithms may be more appropriate than relevance algorithms such as FOCUS. 3.4. 2 Neural networks Automatic relevance determination (ARD) see Mackay[68, 69] and a recent empirical review by Neal[84] is a wrapper algorithm, applicable to multi layer perceptron[68, 69] classification systems. Relevance weightings are produced for each of the inputs to the network by examining the parameters of the network during training. No features are ever ....
....note that the use of wrapper algorithms may be more appropriate than relevance algorithms such as FOCUS. 3.4. 2 Neural networks Automatic relevance determination (ARD) see Mackay[68, 69] and a recent empirical review by Neal[84] is a wrapper algorithm, applicable to multi layer perceptron[68, 69] classification systems. Relevance weightings are produced for each of the inputs to the network by examining the parameters of the network during training. No features are ever completely excluded, so ARD is not strictly a feature selection algorithm; it always uses all the features available. ....
D.J.C. MacKay. A practical bayesian framework for backpropagation networks. Neural Computation, 4:448--472, 1992.
....(concrete quality estimation and forest scene classification). 3.1 MLP and GP Models. Both MLP and GP are flexible nonlinear models, where the available number of parameters $p$ in model may be near or even greater than the number of data samples $n$ and also the effective number of parameters $p_{\mathrm{eff}}$ (MacKay, 1992; Spiegelhalter et al. 1998) is usually large compared to $n$. We used one hidden layer MLP with tanh hidden units, which in matrix format can be written as $f(x, \theta_w) = b_2 + w_2 \tanh(b_1 + w_1 x)$ (29). The $\theta_w$ denotes all the parameters $w_1, b_1, w_2, b_2$, which ....
....cases, chains were run probably much longer than necessary to be on safe side. 3.2 Toy problem: MacKay's robot arm. To illustrate some basic issues of the expected utilities computed from the cross validation predictive densities, we demonstrate them in simple robot arm toy problem used, e.g. in (MacKay, 1992; Neal, 1996). The task is to learn the mapping from joint angles to position for imaginary robot arm. Two real input variables, $x_1$ and $x_2$, represent the joint angles and two real target values, $y_1$ and $y_2$, represent the resulting arm position in rectangular coordinates. The relationship ....
MacKay, D. J. C. (1992). A practical Bayesian framework for backpropagation networks. Neural Computation, 4(3):448--472.
....priors for building such sequential models are presented next. We end by some remarks regarding the relationship between Bayesian learning and the use of network ensembles. 1. Introduction The Bayesian modelling framework, first applied to neural networks in [Buntine and Weigend, 1991] [MacKay, 1992] and [Neal, 1992] is appealing for several reasons. First, instead of obtaining an estimate for the mean prediction of the model, one gets an estimate for the entire distribution of model predictions. This estimate takes into account both the noise in the data and the variance of the models. ....
....approaches. Section 3 gives the detailed results of the extension of the framework to sequential processes. A general discussion of priors, together with a few results concerning recurrent neural networks, are presented in section 4. We remind in section 5 the approximations proposed in [MacKay, 1992], since they may apply to sequential data as well. However, the experimental results presented in [Crucianu et al. 1998] show that such approximations are sometimes inappropriate. Section 6 discusses the relationship between Bayesian learning and the use of network ensembles, which is regarded by ....
MacKay, D. J. C. (1992) A practical Bayesian framework for backpropagation networks, Neural Computation 4: 448-472.
....MAP gives biased results (e.g. noise variance being systematically underestimated) and requires large number of samples in order to get good results. These limitations can be overcome in a Bayesian treatment. Bayesian neural networks with independent output noises have been discussed by MacKay [8] and Neal [9, 10]. Purpose of this paper is to show how this problem can be solved using full covariance matrix with Bayesian treatment and Markov Chain Monte Carlo (MCMC) methods. We begin by briefly reviewing the Bayesian neural networks and MCMC implementation in Sections 2 and 3. In Section 4 ....
MacKay, D. J. C. (1992). A practical Bayesian framework for backpropagation networks. Neural Computation, 4(3):448--472.
....model, which specifies the probability of observing the targets t given function values y, to yield a posterior over functions which can then be used for predictions. For neural networks the prior over functions has a complex form which means that implementations must either make approximations [4] or use Monte Carlo approaches to evaluating integrals [6]. As Neal [7] has argued, there is no reason to believe that, for real world problems, neural network models should be limited to nets containing only a small number of hidden units. He has shown that it is sensible to consider a limit ....
....which have to be approximated for neural nets, can be carried out exactly (using matrix operations) in this case. I also show that the parameters specifying the Gaussian process can be estimated from training data, and that this leads naturally to a form of Automatic Relevance Determination [4], [7]. 1 Large networks cannot be successfully used with maximum likelihood training because of the overfitting problem. 2 By regression problems I mean those concerned with the prediction of one or more real valued outputs, as compared to classification problems. 1 2 Prediction with ....
D. J. C. MacKay. A Practical Bayesian Framework for Backpropagation Networks. Neural Computation, 4(3):448--472, 1992.
....function for the Gaussian distribution. Appropriately scaled, the graph of this function is very similar to the tanh function which is more commonly used in the neural networks literature. In calculating $V(x, x') \stackrel{\mathrm{def}}{=} E_u[h(x; u)\,h(x'; u)]$ we make the usual assumptions (e.g. MacKay, 1992) that $u$ is drawn from a zero-mean Gaussian distribution with covariance matrix $\Sigma$, i.e. $u \sim N(0, \Sigma)$. Let $x = (1, x_1, \ldots, x_d)^T$ be an augmented input vector whose first entry corresponds to the bias. Then $V_{\mathrm{erf}}(x, x')$ can be written as $V_{\mathrm{erf}}(x, x')$ 1 (2) d 1 2 ....
MacKay, D. J. C. (1992). A Practical Bayesian Framework for Backpropagation Networks. Neural Computation 4(3), 448--472.
....The first part is the standard mean square error measure of matching the network output vectors F(X (p) W ) with the desired output vectors Y (p) for all training data samples p. The second term, scaled by 1 , is frequently used in the weight pruning or in the Bayesian regularization method [58], [59] to improve generalization of the MLP networks. A naive interpretation why such regularization works is based on observation that small weights and thresholds mean that only the linear part of the sigmoid around (0) is used. Therefore the decision borders are rather smooth. On the other ....
....because at the end of the training the slopes should be infinitely steep, corresponding to infinite non zero weights. Such approach may be interesting if the final goal is a hybrid, network rule based system. Introduction of integer weights may also be justified from the Bayesian perspective [58], [59]. The cost function specifies our prior knowledge about the probability distribution $P(W \mid M)$ of the weights in our model M. For classification tasks, when crisp logical decisions are required, the prior probability of the weight values should include not only small weights, but also large ....
D.J. MacKay. "A practical Bayesian framework for backpropagation networks ", Neural Computation 4, 448-472, 1992
....bioinformatics [1] employ hybrid schemes which combine HMMs and NNs. This article focuses on a problem common to both models. For sparse data, the classical training algorithms (mentioned above) which derive from a maximum likelihood (ML) approach, are sub optimal due to overfitting. Following [6], much research on NNs in the last few years has explored Bayesian methods as a possible remedy in this respect. The objective of this article is to take a similar route and test if the generalization performance of HMMs can be improved by Bayesian free energy minimization. 2 Classical HMMs ....
....approach 3.1 General outline of the methodology. A possible shortcoming of the maximum likelihood (ML) method is its susceptibility to overfitting when the data are sparse. We therefore adopt a Bayesian approach, which has proven to improve the generalization performance of neural networks [6], [9] and treat the parameters # and w as random variables. Starting from some prior distribution P (#, w) the ultimate objective of learning is to determine the posterior distribution P (s, w, # Y) As this is analytically intractable, we need to make certain approximations. We can sample from ....
D. J. C. MacKay. A practical Bayesian framework for backpropagation networks. Neural Computation, 4:448--472, 1992.
....of their predictions. (One exception is for radial basis functions; see Leonard et al. 1992.) In this paper, we compare two approaches to obtaining prediction limits for ANNs: a frequentist approach, based on standard non-linear regression theory, and a Bayesian approach, following recent work by MacKay (1992) and Neal (1994). We present preliminary comparisons of the methods via Monte Carlo methods. We examine the coverage probabilities of the prediction intervals, their computational costs and practical implementation issues of the two approaches. Being able to estimate the uncertainty of ....
....Seber and Wild, 1989), based on local linearizations of the model. These methods have been applied to neural nets (see e.g. Ding and Hwang, 1995), but although they work well on small problems, they are often less reliable on the larger problems typically addressed using neural networks. Recently (MacKay 1992, Neal 1994), a Bayesian approach has been proposed for estimating the parameters in neural nets. The parameters in the neural network are considered as being drawn from a distribution (e.g. Gaussian) which is characterized by a set of hyperparameters (e.g. the mean and variance of the Gaussian). ....
MacKay, D.J.C. (1992) "A Practical Bayesian Framework for Backpropagation Networks." Neural Computation, 4, 448-472.
....particular, in recent years, a number of regularization methods have been proposed to control the smoothness, and hence the degree of overfitting, in neural networks. As just one example, one can use Bayesian techniques to select network structures and to penalize large weights or reduce overfitting (MacKay, 1992). 3.1 Example. Continuing with our regression example, we consider using the hidden layer sigmoidal network with 5 nodes in the hidden layer, as before. Although it is possible to choose the architecture by cross validation, we will illustrate the procedure of using what we presume to be an ....
MacKay, D.J.C. (1992) "A Practical Bayesian Framework for Backpropagation Networks."
....possible parameter values except for the most probable Bayesian Approaches and Gaussian Processes 26 value. 2.5. 2 Gaussian approximations One way of improving the maximum probability approximation is to represent the posterior by a Gaussian distribution centred on the maximum posterior value [21]. The covariance matrix of the Gaussian is obtained from the Hessian of the error function. 2.5.3 Variational Methods Even if the posterior is reasonably approximated by a Gaussian, the Hessian at the maximum posterior value need not provide a good estimate for the covariance. Another method for ....
....The good thing about MCMC is that it does not rely on one specific shape of distribution. Hence MCMC can be used in situations where no other method is appropriate. 2.5.5 The hyperparameters Approximating the integral over the hyperparameters can be done in similar ways. The evidence framework [21] can be used, which chooses the hyperparameters that maximise Bayesian Approaches and Gaussian Processes 27 the posterior probability of the model (given the data) This stems from the assumption that the posterior distribution is sharply peaked about this maximum value. Whether or not the ....
D. J. C. MacKay. A practical bayesian framework for back-propagation networks. Neural Computation, 4:448--472, 1992.
....(SVC) [2, 4, 7, 10, 13]. Here, we follow [2, 4] in applying the evidence framework [5] to SVR. The evidence framework is divided into three levels of inference, and is computationally equivalent to the type II maximum likelihood method in Bayesian statistics. Its use in feedforward neural networks [6] has allowed the automatic selection of the regularization parameters and network architectures, without the need of a validation set. The rest of this paper is organized as follows. A brief overview of ε-SVR and SVR will be given in Section 2. The connections between these two SVR algorithms ....
D.J.C. MacKay. A practical Bayesian framework for backpropagation networks. Neural Computation, 4(3):448--472, May 1992.
....this posterior. Sometimes theoretical considerations of the specific problem at hand, show that the effect of the multiple modes is not significant and can therefore be neglected. One focuses on the mode and bases the Bayesian inference analysis entirely on local single Gaussian approximations [14, 15, 3]. Looking at it from the other side, it is known that in many cases the multi modality of the posterior can have serious influence. So one has to look for methods that correctly incorporate or eliminate this influence, as for example Markov Chain Monte Carlo (MCMC) methods, that have the drawback ....
....method the posterior is approximated by a Gaussian centred at a mode of $p(w \mid D)$, which can for example be found by a local optimisation algorithm, where the covariance matrix of the Gaussian is derived from the curvature of the posterior at the mode. This method was studied in depth by MacKay [15] and Bishop [4]. The more general approach was introduced by Hinton and Van Camp in [11]. They discussed the idea of approximating $p(w \mid D)$ by an ensemble Q(w; a probability density parameterised by ) and by optimising the quality of this approximation. A classically used measure for the ....
MacKay D., "A Practical Bayesian framework for back-propagation networks," Neural Computation, Vol. 4, No. 3, pp. 448-472, 1992.
....2 ij 2W ij (b ; a) ab) 3) where 1 and 2 scale the relative importance of auxiliary conditions. This form of error function has two advantages: independent parameters control enforcing of the 0 and a, b weights, and an interpretation of this function from the Bayesian point of view [11] is straightforward. It defines our prior knowledge about the probability distribution P (W jM ) of the weights in our model M . Optimal value of a# b parameters are found iteratively, starting from a = b =1values: a = # ij W 3 ij (W ij b) 2 = # ij W 2 ij (W ij b) 2 (4) b = ....
D.J. MacKay, A practical Bayesian framework for backpropagation networks, Neural Comp. 4 (1992) 448-472
....method to improve our rules there is a chance of finding a better solution than gradient based neural classifiers are able to find; 3) the problem of finding an optimal balance between the flexibility of adaptive models and the danger of overfitting the data. Although Bayesian regularization [6] may help in case of some neural and statistical classification models, logical rules give much better control over the complexity of the data representation and elimination of outliers. We are sure that in all cases, independently of the final classifier used, it is advantageous to extract crisp ....
.... term in the cost function, may be replaced by one of the lower order terms: W ij W 2 ij 1 cubic (4) W ij W 2 ij 1 quadratic 1 # k= 1 W ij k W ij 12 W ij 12 1 Introduction of integer weights may also be justified from the Bayesian perspective [6]. The cost function specifies our prior knowledge about the probability distribution $P(W \mid M)$ of the weights in our model M. For classification task when crisp logical decisions are required the prior probability of the weight values should include not only small weights but also large positive ....
D.J. MacKay. "A practical Bayesian framework for backpropagation networks", Neural Computation 4, 448-472, 1992
....controlling the complexity of the model. Another problem of standard MLP models is the lack of tools for analyzing the results (confidence intervals, like 10% and 90% quantiles, etc.). Bayesian methods have become a viable alternative to the older error minimization based (ML or MAP) approaches [1,8,10]. The main advantages of Bayesian MLPs are: Preprint submitted to Pattern Recognition Letters 14th September 2000 . Automatic complexity control: Values of regularization coefficients can be selected using only the training data, without the need to use separate training and validation data. ....
....corresponds to the best guess with squared error loss. The posterior distribution for the parameters p(w, #, # D) is typically very complex, with many modes. Evaluating the integral of Eq. (14) is therefore a difficult task. The integral can be approximated with parametric approximation as in [8] or with numerical approximation as described in next section. 3.5. Markov chain Monte Carlo method. Neal has introduced implementation of Bayesian learning for MLPs in which the difficult integration of Eq. (14) is performed using Markov chain Monte Carlo (MCMC) methods [10]. In [6] there is a ....
David J. C. MacKay. A practical Bayesian framework for backpropagation networks. Neural Computation, 4(3):448--472, 1992.
....task is simpler than the full estimation of joint or conditional probabilities. Bayesian solution is achieved by introducing a priori probability distributions of parameters (so far it has been possible only for weights in MLP networks) and marginalizing (integrating) over these parameters [9]. In practice committee of networks give results of similar accuracy and are computationally less expensive than marginalization. Below FSM model as used for classification problems is described. Since training of the MLP networks or other networks with fixed architecture is NP hard [3] a ....
D.J. MacKay, "A practical Bayesian framework for backpropagation networks", Neural Computation 4 (1992) 448-472
....was as follows. Chapter 7: Bayesian Methods for Mixtures of Experts. I first thought of using Bayesian methods for mixtures of experts after becoming frustrated with the singularity problems of the maximum likelihood algorithm [112]. I developed an approach based on MacKay's evidence framework [133] for multi layer perceptrons which I subsequently discussed with David MacKay. The result of these discussions and my initial experiments was a joint paper [235]. During our discussions, however, David suggested the use of Neal and Hinton's [154] free energy framework for the mixtures of experts. ....
....Gelman et al. [67]. Complexity control of neural networks by Bayesian methods is done by placing a prior on the parameters. Predictions are made by integrating over the posterior distribution of parameters. Two approaches have been proposed for training neural networks by Bayesian methods [29, 133, 153]. Since the log function is monotonic, maximising the likelihood is also equivalent to maximising the log likelihood. The minimum description length approach (MDL) Rissanen [196] is also often used in a similar way to Bayesian inference. CHAPTER 1. INTRODUCTION 7 In the first a Gaussian ....
MacKay, D. J. C. [1992b], `A practical Bayesian framework for backpropagation networks', Neural
No context found.
D. MacKay. A practical Bayesian framework for backpropagation networks. Neural Computation, 4(3):448--472, 1992.
No context found.
D. J. MacKay, "A practical Bayesian framework for backpropagation networks," Neural Comput., vol. 4, pp. 448--472, 1992.
No context found.
D.J.C. MacKay, "A practical Bayesian framework for backpropagation networks," Neural Comput., vol 4, pp. 448-472, 1992.
No context found.
D. J. C. MacKay. A practical Bayesian framework for back propagation networks. Neural Computation, 4(3):448--472, 1992.
No context found.
MacKay, D. J. C., 1992b. A practical Bayesian framework for backpropagation networks. Neural Computation, 4, 448--472.
No context found.
MACKAY, D.J.C., A practical Bayesian framework for backpropagation networks, in Neural Computation,4(3) (1992),p.448-472.
No context found.
D.J.C. MacKay, A practical Bayesian framework for backpropagation networks, Neur. Comput. 4 (3) (1992) 448--472.
No context found.
D. J. C. MacKay, "A practical Bayesian framework for backpropagation networks," Neural Computa., vol. 4, pp. 448--472, 1992.
No context found.
D. J. C. MacKay. A Practical Bayesian Framework for Backpropagation Networks. Neural Computation, 4(3), 448-472, 1992.
No context found.
D. J. C. MacKay. A practical Bayesian framework for back-propagation networks. Neural Computation, 4(3):448-472, 1992.
No context found.
D. J. C. MacKay. A Practical Bayesian Framework for Backpropagation Networks. Neural Computation, 4:448-472, 1992.
No context found.
D. J. MacKay, "A practical Bayesian framework for backpropagation networks," Neural Comput., vol. 4, pp. 448--472, 1992.
No context found.
D. MacKay. A practical Bayesian framework for backpropagation networks. Neural Computation, 4:448-472, 1992.
No context found.
MacKay DJC. A practical Bayesian framework for back-propagation networks. Neural Computation 1992; 4: 448--472
No context found.
D. J. C. MacKay, "A practical Bayesian framework for backpropagation networks," Neural Computation, vol. 4, no. 3, pp. 448--472, 1992.
No context found.
D. J. C. MacKay, "A Practical Bayesian Framework for Backpropagation Networks, " Neural Computation, vol. 4, no. 3, pp. 448--472, 1992.
No context found.
MacKay : "A practical bayesian framework for backpropagation networks" , Neural Computation 4:3 (1992) 448-472
No context found.
MacKay, D. (1992b). A practical Bayesian framework for backpropagation networks. Neural Computation, 4:448--472.
No context found.
MacKay, D. (1992b). A practical Bayesian framework for backpropagation networks. Neural Computation, 4:448-472.
No context found.
Mackay DJC (1992). A practical bayesian framework for back-propagation networks. Neural Computation 4(3):448-472.
No context found.
MacKay, D.J.C. (1992b), `A practical Bayesian framework for back-propagation networks', Neural Computation 4(3), 448--472.