| MacKay, D.J.C., 1992, The evidence framework applied to classification networks, Neural Computation, 4 (5), 720-736. |
....squares) support vector machines, regularization networks and Nadaraya-Watson estimators. This reasoning can be extrapolated easily towards the general class of kernel-based methods. Additional links between the noise variance, smoothing and regularization are given by the Bayesian framework [12, 19], Gaussian Processes [12, 19], statistical learning theory [3, 7], splines [21] and regularization theory [15, 7]. One way to avoid the paradox of noise variance estimation before effective modeling and estimating the residuals is to use a non-parametric estimator. The basic principle of the ....
....machines, regularization networks and Nadaraya-Watson estimators. This reasoning can be extrapolated easily towards the general class of kernel-based methods. Additional links between the noise variance, smoothing and regularization are given by the Bayesian framework [12, 19], Gaussian Processes [12, 19], statistical learning theory [3, 7], splines [21] and regularization theory [15, 7]. One way to avoid the paradox of noise variance estimation before effective modeling and estimating the residuals is to use a non-parametric estimator. The basic principle of the presented estimator is that ....
D. MacKay, "The evidence framework applied to classification networks," Neural Computation, vol. 4, pp. 698--714, 1992.
....specification, the targets $t_n \in \{0, 1\}$. Unlike the regression case, the weights cannot be integrated out analytically, precluding closed-form expressions for either the weight posterior $p(w|t, \alpha)$ or the marginal likelihood $P(t|\alpha)$. We thus utilise the Laplace approximation procedure, as used in [6]: quantity (10) is differentiated twice to give: $\nabla_w \nabla_w \log p(w|t, \alpha)\,|_{w_{MP}} = -(\Phi^T B \Phi + A)$ (11), where $B = \mathrm{diag}(\beta_1, \beta_2, \ldots, \beta_N)$ is a diagonal matrix with $\beta_n = \sigma\{y(x_n)\}[1 - \sigma\{y(x_n)\}]$. This is then negated and inverted to give the covariance $\Sigma$ for a Gaussian approximation to the posterior over weights ....
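The Laplace step described in the excerpt above can be sketched numerically. This is a minimal illustration, not the citing authors' code. It assumes the negated Hessian has the form $\Phi^T B \Phi + A$ with $B$ built from sigmoid outputs, that $A$ is a diagonal matrix of prior precisions, and that the posterior mode `w_mp` has already been found by some other routine. The names `Phi`, `w_mp`, `alpha` and `laplace_covariance` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))


def laplace_covariance(Phi, w_mp, alpha):
    """Covariance of a Gaussian approximation to the weight posterior.

    Phi   : (N, M) design matrix (assumed name)
    w_mp  : (M,) posterior mode, assumed to be found beforehand
    alpha : (M,) prior precisions forming the diagonal matrix A (assumed)
    """
    y = sigmoid(Phi @ w_mp)          # classifier outputs at the mode
    B = np.diag(y * (1.0 - y))       # diagonal matrix of sigmoid variances
    A = np.diag(alpha)
    neg_hessian = Phi.T @ B @ Phi + A  # negated second derivative of the log posterior
    return np.linalg.inv(neg_hessian)  # negate and invert to get the covariance


# Small made-up example: three data points, two basis functions.
Phi = np.array([[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]])
w_mp = np.array([0.2, -0.4])
alpha = np.array([1.0, 1.0])
Sigma = laplace_covariance(Phi, w_mp, alpha)
```

Because $A$ has positive diagonal entries and $B$ is non-negative, the negated Hessian is positive definite, so the inverse exists and `Sigma` is a valid covariance matrix.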
D. J. C. MacKay. The evidence framework applied to classification networks. Neural Computation, 4(5):720-736, 1992.
....and compare it to a method for outlier detection which uses a variance based measure. The effects of both methods are visualized using a simple artificial classification problem. Results of outlier rejection are presented in the final section. II. The Bayesian view of confidence In [D.J.C. MacKay, 1992b] the author uses Bayesian inference and marginalization to get moderated probabilities for classes in regions, where the classifier is uncertain about the class label. We may expect a trend towards equal probabilities in such regions and are able to refuse classification by flagging doubt if ....
D. J. C. MacKay. The evidence framework applied to classification networks. Neural Computation, 4:720--736, 1992.
....(GMOHMM) is used to discriminate coefficients of an Auto Regressive (AR) process as being one of K classes. Bayesian inference is known to give reasonable results when applied to AR models [RF95]. The situation with classification is similar; see for example the seminal work by [Nea96] and [Mac92]. Hence we may expect to get good results if we apply Bayesian techniques to both stages of the decision process separately. However, this is suboptimal since it means establishing no probabilistic link between feature extraction and classification. Two arguments suggest the building of one ....
D. J. C. MacKay. The evidence framework applied to classification networks. Neural Computation, 4:720--736, 1992.
....data with an astonishing degree of accuracy as demonstrated in simulations. Comparisons to existing learning theoretical bounds, as e.g. the span bound, are given for model selection and LOO error prediction scenarios. 1 Introduction Numerous methods have been proposed for model selection [13, 12, 2, 6, 14, 5, 4]. They all try to find a reasonably good estimate of the generalization error to select the proper hyperparameters. The data dependent LOO error would in principle be ideal for selecting hyperparameters of learning machines, as it is an (almost) unbiased estimator of the true generalization error. ....
D. MacKay. The evidence framework applied to classification networks. Neural Computation, 4(5):720--736, 1992.
....a survey of literature on the first two research categories refer to [Sar96] and the references therein. Bayesian ideas have been developed for application to (i) learning and regularization, (ii) evaluation of trained networks, and (iii) comparison of different trained networks [Mac92a, Mac92d, Mac92b]. A big advantage of Bayesian methods is that they can eliminate the requirement of test data. Moreover, Bayesian methods can be used for information-based selection of training data for better generalization performance [Mac92c]. Despite many good features, Bayesian methods being relatively new, ....
D. J. C. Mackay. The evidence framework applied to classification networks. Neural Computation, 1992.
.... with the corresponding parts, so as to build a model part pair database (training set). In our case, as in [13, 14], the discriminative classifier is implemented in the form of a committee of feedforward connectionist architectures, trained and selected using MacKay's Bayesian evidence framework [52]. Note that, within the above approximations, the final joint landmark posterior (equal to the product of the individual landmark posteriors maximizing Equation 15) is a Gaussian with respect to spatial coordinates, ....
D. J. MacKay. The evidence framework applied to classification networks. Neural Computation, 4(5):698--714, 1991.
....similarity refers to motifs (or frequently occurring substrings) in the sequences. Sections 2 and 3 elaborate on how to find the global and local similarity of the protein sequences. Section 4 presents our classification algorithm, which employs the Bayesian neural network originated from Mackay [25]. Section 5 evaluates the performance of the proposed classifier. Section 6 compares our approach with the other protein classifiers. Section 7 concludes the paper. 2 Global Similarity of Protein Sequences To calculate the global similarity of protein sequences, we adopt the 2 gram, also known ....
....have relatively little effect on PIR sequence classification and a combination of the proposed techniques already yields a very high precision, as our experimental results show later. 4 The Bayesian Neural Network Classifier We adopt the Bayesian neural network (BNN) originated from Mackay [25] to classify protein sequences (footnote 6: software available at http://wol.ra.phy.cam.ac.uk/pub/mackay/README.html). There are $N_g + 2$ input features, including $N_g$ 2-grams, the LCC feature described .... [Figure 1: The Bayesian neural network architecture.]
D. J. C. Mackay. The evidence framework applied to classification networks. Neural Computation 4(5), 698--714, 1992.
.... activation and $s_t^2$ is the variance of the activation given by $s_t^2 = x_t^T \Sigma_{t-1} x_t$ (6). This gives rise to the concept of a moderated output, $\hat{y}_t$, defined as $\hat{y}_t = P(Z_t = 1) = \int P(Z_t = 1 \mid a_t)\, p(a_t)\, da_t$ (7). This integral can be accurately approximated by [10] $\hat{y}_t = g(K(s_t) a_t)$ (8), where $K(s_t) = (1 + \pi s_t^2 / 8)^{-1/2}$ (9). The unmoderated output, $y_t$, is given by $y_t = g(a_t)$. The moderation changes the actual output, $y_t$, to a moderated output, $\hat{y}_t$, which is nearer to 0.5 by an amount which is proportional to the prior ....
....) The moderation changes the actual output, $y_t$, to a moderated output, $\hat{y}_t$, which is nearer to 0.5 by an amount which is proportional to the prior uncertainty on the model parameters. Moderated outputs are typically better than unmoderated outputs in terms of the likelihood of predictions [10]. If the posterior distribution is approximated by a Gaussian, its mean and covariance can be found via Newton's method as formulated by Spiegelhalter and Lauritzen [15] and described in Appendix A. The update equations are Σ_t = Σ_{t−1} − y_t(1 − y_t) / [1 + y_t(1 ....
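The moderation described in the excerpt above can be illustrated with a short sketch. It assumes that g is the logistic sigmoid and that the scaling factor is MacKay's approximation $K(s) = (1 + \pi s^2 / 8)^{-1/2}$. The function names `kappa` and `moderated_output` are hypothetical and chosen only for this example.

```python
import math


def sigmoid(a):
    """Logistic sigmoid, playing the role of g in the excerpt."""
    return 1.0 / (1.0 + math.exp(-a))


def kappa(s2):
    """Scaling factor K(s) = (1 + pi * s^2 / 8) ** (-1/2); s2 is the activation variance."""
    return 1.0 / math.sqrt(1.0 + math.pi * s2 / 8.0)


def moderated_output(a, s2):
    """Approximate the integral of sigmoid times Gaussian by sigmoid(K(s) * a)."""
    return sigmoid(kappa(s2) * a)


# With zero variance the moderated and unmoderated outputs coincide;
# with larger variance the moderated output is pulled towards 0.5.
unmoderated = sigmoid(2.0)
certain = moderated_output(2.0, 0.0)
uncertain = moderated_output(2.0, 4.0)
```

Since $K(s) \le 1$ and shrinks as the variance grows, the scaled activation moves towards zero and the output towards 0.5, which matches the qualitative behaviour the excerpt describes.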
D.J.C. Mackay. The evidence framework applied to classification networks. Neural Computation, 4(5):720--736, 1992.
....then those that maximize the evidence. In a fully Bayesian treatment, the hyperparameters should be integrated over, with a weight proportional to the evidence; maximizing the evidence can be seen as an approximation to this which assumes that the evidence is sharply peaked around its maximum (MacKay, 1992). Note that in (16) I have made the conditioning of the evidence on the hyperparameters explicit by writing $P(D|\cdot)$ instead of $P(D)$. We are now in a position to understand why normalization of the probability model is important. Let us assume that there exist true values of the ....
....What are the relative merits of the two types of evidence defined above? As explained previously, the conditional evidence $P(Y|X)$ regards the training inputs as fixed and only considers the likelihood of the training outputs. This is the quantity that is conventionally defined as the evidence (MacKay, 1992); it disregards all information about the input space. In our scenario, this is reflected in the fact that, as can be seen from eqs. (21, 23, 24), $P(Y|X)$ is independent of the assumed input distribution $Q(x)$ of the unnormalized probability model. The joint evidence $P(X, Y)$, on the other hand, ....
MacKay, D. J. C.: 1992, `The Evidence Framework Applied to Classification Networks'. Neural Computation 4, 720--736.
....refers to motifs (or frequently occurring substrings) in the sequences. Sections 2 and 3 elaborate on how to find the global and local similarity of the protein sequences. Section 4 presents our classification algorithm, which employs the Bayesian neural network originated from Mackay [5]. Section 5 evaluates the performance of the proposed classifier. Section 6 compares our approach with other protein classifiers. Section 7 concludes the paper. ....
....and have small LS values with low probabilities. On the other hand, negative sequences will have small LS values with high probabilities and have large LS values with low probabilities. 4. THE BAYESIAN NEURAL NETWORK CLASSIFIER We adopt the Bayesian neural network (BNN) originated from Mackay [5] to classify protein sequences. There are $N_g + 2$ input features, including $N_g$ 2-gram patterns, the LCC feature described in Section 2 and the LS feature described in Section 3. Thus, a protein sequence is represented as a vector of $N_g + 2$ real numbers. The BNN has one hidden layer containing ....
D. J. C. Mackay. The evidence framework applied to classification networks. Neural Computation, 4(5):698--714, 1992.
....weights. The risk of overfitting may be alleviated through the use of suitable pruning procedures. Ultimately the most important challenge to their use would come from reliable methods for determining the network architecture directly before training. For example, using Bayesian model comparison [84, 85, 86] networks with different numbers of hidden nodes will give rise to different evidences and the model with the largest evidence would be the most suitable choice. However, though there is some correlation between Bayesian evidence and generalisation performance [85, 130] it is not clear if this ....
D.J.C. MacKay. The evidence framework applied to classification networks. Neural Computation, 4:720--736, 1992.
....DC A P (7 Y (7.54) Since we have made a Gaussian approximation to , a local linear approximation to means A ]6 will also be a Gaussian [14, pp. 405]. This integral does not have an analytic solution, however. Potentially, an approximation such as that used by MacKay [132] for evaluating the integral of a Gaussian times a sigmoid could be used, although in this case we have the integral of a Gaussian times the log of a sigmoid. Alternatively, a variational method [95] could be used to bound the integral with a simpler expression. Another possibility is the use of ....
MacKay, D. J. C. [1992a], `The evidence framework applied to classification networks', Neural
....or trace of a matrix in $k^2$ time [19]. Application to classification problems. This paper has thus far discussed the evaluation of the evidence for backprop networks trained on interpolation problems. Neural networks can also be trained to perform classification tasks. A future publication [15] will demonstrate that the Bayesian framework for model comparison can be applied to these problems too. Relation to VC dimension. Some papers advocate the use of VC dimension [1] as a criterion for penalising over-complex models [2, 11]. VC dimension is most often applied to ....
D.J.C. MacKay (1991). The evidence framework applied to classification networks, in preparation.
MacKay, D.J.C., 1992, The evidence framework applied to classification networks, Neural Computation, 4 (5), 720-736.
David J. C. MacKay. The evidence framework applied to classification networks. Neural Computation, 4(5):698--714, 1992.
MacKay D.J.C.: The evidence framework applied to classification networks, Neural Computation, 4:720-736, 1992.
D. MacKay, The Evidence Framework Applied to Classification Networks, Neural Computation, Vol. 4, 720-736, 1992.
David J. C. MacKay. The evidence framework applied to classification networks. Neural Computation, 4(5):698--714, 1992.
D. J. C. MacKay. The evidence framework applied to classification networks. Neural Computation, 4(5):720--736, 1992.
D. J. C. MacKay, "The evidence framework applied to classification networks," Neural Computation, vol. 4, pp. 698--714, 1992.
D. J. C. MacKay. The evidence framework applied to classification networks. Neural Computation, 4(5):720--736, 1992.
MacKay DJC. The evidence framework applied to classification networks, Neural Comput. 1992;4(5): 698-741.
D. J. C. MacKay. The evidence framework applied to classification networks. Neural Computation, 4(5):720--736, 1992.
MacKay, D.J.C. (1992a), `The evidence framework applied to classification networks', Neural Computation 4(5), 720--736.