| Andrieu, C., de Freitas, J. F. G. and Doucet, A. (1999a). Robust full Bayesian learning for neural networks, Technical Report CUED/F-INFENG/TR 343, Cambridge University Engineering Department. |
....the least squares estimate of # is given by # = D y. (6) An estimate of # is then given by # #) #) y P # J y, where P # J = I N D[D . (7) Based on the minimum description length (MDL) criterion, we can impose the following a priori distribution on J [2], P (J) exp J d 1 log N . (8) Assuming the noise samples are i.i.d. Gaussian, it can then be shown that the joint posterior distribution of J, 1 , J is given by [2] p(J, 1 , J ) P # J y) P (J) (9) Hence the maximum a posteriori (MAP) estimate ....
....the minimum description length (MDL) criterion, we can impose the following a priori distribution on J [2] P (J) exp J d 1 log N . (8) Assuming the noise samples are i.i.d. Gaussian, it can then be shown that the joint posterior distribution of J, 1 , J is given by [2], p(J, 1 , J ) P # J y) P (J) (9) Hence the maximum a posteriori (MAP) estimate of these parameters is obtained by maximizing the right hand side of (9). This can be done by using the reversible jump MCMC algorithm [2]. 4 CoD for Predictors. A natural way to select a set of ....
[Article contains additional citation context not shown here]
C. Andrieu, J. Freitas, and A. Doucet, "Robust full Bayesian learning for neural networks," http://www.cs.berkeley.edu/#jfgf/software.html, 1999.
....selecting the optimal size of a neural network. Much of the recent work has been in the Bayesian framework, and includes Gaussian approximations for the posterior to approximate posterior probabilities (MacKay, 1992) and reversible jump MCMC methods (Müller and Rios Insua, 1998 (for regression); Andrieu et al., 1999 (for radial basis networks)). More established methods (see Bishop (1995)) include cross validation (Stone, 1974) and penalized likelihood methods using the Akaike Information Criterion (AIC) (Akaike, 1974), the Bayesian Information Criterion (BIC) (Schwarz, 1978), or the Network Information ....
Andrieu, C., de Freitas, J. F. G., and Doucet, A. (1999). "Robust Full Bayesian Learning for Neural Networks." Tech. Rep. 343, Cambridge University, Engineering Department.
.... Key Words: Bayesian Statistics; Improper Prior; Markov Chain Monte Carlo. 1 Introduction. Much of the existing Bayesian neural network literature relies upon complex hierarchical priors for the parameters of the network, for example, MacKay (1992), Neal (1996), Müller and Rios Insua (1998), and Andrieu et al. (1999). In this paper, I propose a fully noninformative prior with several philosophical and practical advantages. I also discuss how to implement model fitting via Markov chain Monte Carlo with this prior. In the Bayesian context, one's choice of prior is meant to reflect either knowledge from previous ....
....being not informative, do not provide the same shrinkage as some other priors (or methods such as weight decay), and so the user must keep this in mind. It is advisable to use some model selection technique, such as those in MacKay (1992), Neal (1996), Müller and Rios Insua (1998), Lee (1998), or Andrieu et al. (1999). In this paper, I focus exclusively on feed forward neural networks with a single hidden layer of units with logistic activation functions, use linear output units, and do not allow direct connections from the inputs to the outputs. However, all of these methods and results are fully ....
[Article contains additional citation context not shown here]
Andrieu, C., de Freitas, J. F. G., and Doucet, A. (1999). "Robust Full Bayesian Learning for Neural Networks." Tech. Rep. 343, Cambridge University, Engineering Department.
No context found.
Andrieu, C., de Freitas, J. F. G. and Doucet, A. (1999a). Robust full Bayesian learning for neural networks, Technical Report CUED/F-INFENG/TR 343, Cambridge University Engineering Department.
....that the birth and death move are efficient enough. However, we think that they are of pedagogic interest as they illustrate the flexibility of the approach, and more importantly adaptations have proved to be useful for other types of regression problems for which ambiguities are more likely to occur [4]. Assume that there are k#1 sinusoids. Our proposal for the merge move begins by choosing at random a pair l of sinusoids which are adjacent in terms of their frequencies. To simplify notation, we will denote these two sinusoids as (a ## , a ## , # # ) and (a ## , a ## , # # ). One can ....
C. Andrieu, J.F.G. de Freitas, A. Doucet, Robust full Bayesian learning for neural networks, IEEE Trans. Neural Networks, submitted for publication, available as Technical Report CUED/F-INFENG/TR 343, University of Cambridge, UK, 1999.
.... Koller and Russell 1995) and the bootstrap filter (Gordon, Salmond and Smith 1993). It is also possible to design more clever proposal distributions by adopting suboptimal filters and other approximation methods that make use of the information available at time t (Doucet, Godsill and Andrieu 2000, de Freitas, Niranjan, Gee and Doucet 2000, Pitt and Shephard 1999, van der Merwe, Doucet, de Freitas and Wan 2000). In fact, in some restricted situations, one may interpret the likelihood as a distribution in terms of the states and sample from it directly. In doing so, the importance weights ....
Andrieu, C., de Freitas, J. F. G. and Doucet, A. (1999a). Robust full Bayesian learning for neural networks, Technical Report CUED/F-INFENG/TR 343, Cambridge University Engineering Department.
....curves do not belong to the convex hull, as these do not contribute towards the maximum realisable classifier. That is, only the convex hull samples need to be stored. In this paper, we propose a reversible jump MCMC algorithm to estimate the classifier parameters and number of parameters jointly (Andrieu, de Freitas and Doucet 1999, Green 1995). However, other MCMC strategies, such as hybrid Monte Carlo methods for logistic neural networks (Neal 1996), could also be adopted. Our classification model is described in the following section 3 . Let C , fk; g denote a classifier with parameters and number of parameters k, a ....
....that although one can get very good results on the validation data set, one is not guaranteed to outperform the Bayesian classifier on a new data set. One has to ensure that we do not overfit the validation data set. 3 For more details and software, readers are referred to our technical report (Andrieu et al. 1999), available at http: www.cs.berkeley.edu jfgf. 4 5 Classification Model We adopt the approximation scheme of Holmes and Mallick (1998) consisting of a mixture of k RBFs and a linear regression term (Holmes and Mallick 1998). Yet, the work can be easily extended to other classification models. ....
Andrieu, C., de Freitas, J. F. G. and Doucet, A. (1999). Robust full Bayesian learning for neural networks, Technical Report CUED/F-INFENG/TR 343, Cambridge University Engineering Department.
....c is assumed to be zero mean Gaussian; its variance changes over time. Depending on our a priori knowledge about the smoothness of the mapping, we can choose different types of basis functions. The most common choices are linear, cubic, thin plate spline, Gaussian and multiquadric basis functions [1, 11]. For convenience, we express the approximation model in vector matrix form 2 y t = D( 1:k t ;t ; x t ) 1:1 d k t ;t n t , 2 We adopt the notation y 1:c;t , y1;t ; y2;t ; y c;t ) 0 to denote all the output observations at time t. To simplify this notation, y t is equivalent to y ....
C Andrieu, J F G de Freitas, and A Doucet. Robust full Bayesian learning for neural networks. Technical Report CUED/F-INFENG/TR 343, Cambridge University Engineering Department, April 1999.
No context found.
Andrieu, C., de Freitas, J. F. G. and Doucet, A. (1999a). Robust full Bayesian learning for neural networks, Technical Report CUED/F-INFENG/TR 343, Cambridge University Engineering Department.
....model order and subsequently selecting the best model. We also present a convergence theorem for the algorithm. The complexity of the problem does not allow for a comprehensive discussion in this short paper. Readers are encouraged to consult our technical report for further results and details (Andrieu, de Freitas and Doucet 1999) 1 . 2 MODEL SPECIFICATION We adopt the approximation scheme of Holmes and Mallick (1998) consisting of a mixture of k RBFs and a linear regression term. The work can, however, be straightforwardly extended to many other interesting inference and learning problems, such as fMRI time series ....
....: c. It should be stressed that 2 depends implicitly on the model order k. The number k of RBFs and their parameters , f 1:m;1:c ; 1:k;1:d ; 2 1:c g, with m = 1 d k, are unknown. Given the data set fx; yg, the objective is to estimate k and 2 k . 3 PROBABILISTIC MODEL In (Andrieu, de Freitas and Doucet 1999, Andrieu, de Freitas and Doucet 2000) we follow a Bayesian 2 The notation y 1:N;1:c is used to denote an N by c matrix, where N is the number of data and c the number of outputs. That is, y 1:N;j , y1;j ; y2;j ; yN;j ) 0 denotes all the observations corresponding to the j th output ....
[Article contains additional citation context not shown here]
Andrieu, C., de Freitas, J. F. G. and Doucet, A. (1999). Robust full Bayesian learning for neural networks, Technical Report CUED/F-INFENG/TR 343, Cambridge University Engineering Department.
....the birth and death move are efficient enough. However we think that they are of pedagogic interest as they illustrate the flexibility of the approach, and more importantly adaptations have proved to be useful for other types of regression problems for which ambiguities are more likely to occur [4]. Assume that there are k 1 sinusoids. Our proposal for the merge move begins by choosing at random a pair l of sinusoids which are adjacent in terms of their frequencies. To simplify notation, we will denote these two sinusoids as (a c1 ; a s1 ; 1 ) and (a c2 ; a s 2 ; 2 ) One can merge ....
C. Andrieu, JFG de Freitas and A. Doucet, "Robust full Bayesian learning for neural networks", submitted IEEE Trans. Neural Networks, available as Tech. Report CUED/F-INFENG/TR 343, University of Cambridge, UK, 1999.
....curves do not belong to the convex hull, as these do not contribute towards the maximum realisable classifier. That is, only the convex hull samples need to be stored. In this paper, we propose a reversible jump MCMC algorithm to estimate the classifier parameters and number of parameters jointly (Andrieu, de Freitas and Doucet 1999, Green 1995). However, other MCMC strategies, such as hybrid Monte Carlo methods for logistic neural networks (Neal 1996), could also be adopted. Our classification model is described in the following section 3 . Let C , fk; g denote a classifier with parameters and number of parameters k, a ....
....the above algorithm. In particular, it is possible to treat other problems such as variable selection, multiple chains and sequential Monte Carlo. It is also possible to adopt more complex cross validation methods. 3 For more details and software, readers are referred to our technical report (Andrieu et al. 1999), available at http: www svr.eng.cam.ac.uk jfgf. 5 Classification Model We adopt the approximation scheme of Holmes and Mallick (1998) consisting of a mixture of k RBFs and a linear regression term (Holmes and Mallick 1998) Yet, the work can be easily extended to other classification ....
Andrieu, C., de Freitas, J. F. G. and Doucet, A. (1999). Robust full Bayesian learning for neural networks, Technical Report CUED/F-INFENG/TR 343, Cambridge University Engineering Department.
....hidden neurons. Figure 4 illustrates the convergence of the algorithm. In this particular run the training and test mean square errors were 0.0057 and 0.0081 (the minimum bound being 2σ² = 0.005). Our mean square errors are of the same magnitude as the ones reported by other researchers (Andrieu, de Freitas and Doucet 1999, Holmes and Mallick 1998, MacKay 1992, Neal 1996, Rios Insua and Muller 1998). Figure 4 also shows the two diagonal entries of the measurements noise covariance and the trace of the process noise covariance. They behave as expected. 6.3 Classification with medical data. Here, we consider an ....
Andrieu, C., de Freitas, J. F. G. and Doucet, A. (1999). Robust full Bayesian learning for neural networks, Technical Report CUED/F-INFENG/TR 343, Cambridge University Engineering Department.
....sigmoidal hidden neurons. Figure 7 shows the convergence of the algorithm. In this particular run the training and test mean square errors were 0.0057 and 0.0081 (the minimum bound being 2σ² = 0.005). Our mean square errors are of the same magnitude as the ones reported by other researchers (Andrieu, de Freitas and Doucet 1999, Holmes and Mallick 1998, MacKay 1992, Neal 1996, Rios Insua and Muller 1998). Figure 7 also shows the two diagonal entries of the measurements noise covariance and the trace of the process noise covariance. They behave as expected. 7.3 Classification with medical data. Here, we consider an ....
Andrieu, C., de Freitas, J. F. G. and Doucet, A. (1999). Robust full Bayesian learning for neural networks, Technical Report CUED/F-INFENG/TR 343, Cambridge University, http://svr-www.eng.cam.ac.uk/.
....However, in future work, we aim to treat this variable as a random parameter and estimate it via standard information criteria and Bayesian techniques. Alternatively, we can fix the kernel variance and compute the window length using the reversible jump Markov chain Monte Carlo (MCMC) algorithm ([1, 2, 9]) We now turn our attention to the problem of computing . In order to estimate sequentially, while at the same time accounting for modelling errors and measurement noise, we adopt the following state space Markovian representation: t 1 = t j t (8) y t 1 = D t 1 t 1 ffl t 1 (9) ....
C Andrieu, J F G de Freitas, and A Doucet. Robust full Bayesian learning for neural networks. Technical Report CUED/F-INFENG/TR 343, Cambridge University, http://svr-www.eng.cam.ac.uk/, April 1999.
No context found.
Andrieu, C., Freitas, J. and Doucet, A. (2001) Robust full Bayesian learning for neural networks. Neural Comput., 13, 2359-2407.