### Table 5.9: Common transformations performed prior to application of optimizing transformations.

### Table 2: Prior for several common conjugate models based on the shrinkage parameter (columns: 1st stage, 2nd stage, prior, choice of constant)

"... In PAGE 5: ... For the normal model, the harmonic mean of the $\sigma_i^2$ is a simple choice (DuMouchel, 1994). Table 2 presents the prior for several common situations and also reasonable choices for the constants. In this table, the $t_i$ stand for fixed exposures.... ..."

### Table 2: Supervised classification average test set accuracy for max-prior (the frequency of the most common classification), support vector machines, genetic programming, MOSES, and MOSES with voting.

"... In PAGE 4: ... This approach (as well as other more sophisticated ensemble methods) was applied to GP classifiers in [18]. The row in Table 2 for MOSES+Voting is thus an average of ten sets of runs (each set consisting of ten runs). 4.... ..."
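
The voting idea mentioned in this snippet can be illustrated with a minimal sketch. It assumes plain majority voting over the labels produced by several independently trained classifiers; the snippet does not state the exact voting rule, so the function names and the tie-breaking behavior below are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label among the classifiers' votes.

    Ties go to the label that was seen first (an assumption of this sketch).
    """
    counts = Counter(labels)
    return counts.most_common(1)[0][0]


def ensemble_predict(per_classifier_predictions):
    """Combine predictions by majority vote, one example at a time.

    per_classifier_predictions[k][i] is classifier k's label for example i.
    """
    # zip(*...) regroups the votes so each tuple holds all votes for one example.
    return [majority_vote(votes) for votes in zip(*per_classifier_predictions)]


if __name__ == "__main__":
    runs = [[1, 0, 1], [1, 1, 0], [0, 1, 1]]  # three classifiers, three examples
    print(ensemble_predict(runs))
```

Averaging the test accuracy of such an ensemble over several sets of runs would then give one table row of the kind described.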

### Table 2: Example on prioring in a MPL-NN with the lyngby toolbox. - Pruning and common weight decay parameter on input and output weights with grid search on weight decay parameter. - No prior on input and output biases. - No modelling of output noise (assumed Gaussian)

### Table 6.8: Supervised classification test set accuracy for MOSES, genetic programming, support vector machines, and max-prior (the frequency of the most common classification). * indicates a result achieved over the entire dataset, rather than test accuracy.

2006

### Table 3: Distributions and their conjugate priors

1994

"... In PAGE 25: ... Once the posterior distribution is found, and assuming it is one of the standard distributions, the property can easily be established. Table 3 in Appendix B gives some standard conjugate prior distributions for those in Table 2, and Table 4 gives their matching posteriors. More extensive summaries of this are given by DeGroot (1970) and Bernardo and Smith (1994).... In PAGE 41: ... The evidence for some common exponential family distributions is given in Appendix B in Table 5. For instance, consider the learning problem given in Figure 24. Assume that the variables $var_1$ and $var_2$ are both binary (0 or 1) and that the parameters $\theta_1$ and $\theta_2$ are interpreted as follows: $p(var_1 = 0 \mid \theta_1) = \theta_1$; $p(var_2 = 0 \mid var_1 = 0, \theta_2) = \theta_{2,0|0}$; $p(var_2 = 0 \mid var_1 = 1, \theta_2) = \theta_{2,0|1}$. If we use Dirichlet priors for these parameters, as shown in Table 3, then the priors are: $(\theta_1, 1 - \theta_1) \sim \mathrm{Dirichlet}(\alpha_{1,0}, \alpha_{1,1})$; $(\theta_{2,0|j}, 1 - \theta_{2,0|j}) \sim \mathrm{Dirichlet}(\alpha_{2,0|j}, \alpha_{2,1|j})$ for $j = 0, 1$; where $\theta_{2,0|0}$ is a priori independent of $\theta_{2,0|1}$. The choice of priors for these distributions is discussed in (Box & Tiao, 1973; Bernardo & Smith, 1994).... In PAGE 42: ...statistics, and contribution to the evidence. Conjugate priors from Table 3 in Appendix B (using $y|x$ Gaussian) are indexed accordingly as: $\theta_{i|j} \mid \sigma_{i|j} \sim \mathrm{Gaussian}(\theta_{0,i|j}, \Sigma_{0,i|j}\,\sigma^2_{i|j})$ for $i = 1, 2$ and $j = 0, 1$; $\sigma^{-2}_{i|j} \sim \mathrm{Gamma}(\alpha_{0,i|j}/2, \beta_{0,i|j})$ for $i = 1, 2$ and $j = 0, 1$. Notice that $\theta_{0,i|j}$ is one-dimensional when i = 0 and two-dimensional when i = 2. Suitable sufficient statistics for this situation are read from Table 4 by looking at the data summaries used there.... In PAGE 58: ... Further details and more extensive tables can be found in most Bayesian textbooks on probability distributions (DeGroot, 1970; Bernardo & Smith, 1994). Table 3 gives some standard conjugate prior distributions for those in Table 2, and Table 4 gives their matching posteriors (DeGroot, 1970; Bernardo & Smith,...
In PAGE 60: ... [Table 5: Distributions and their evidence. Rows: C-dim multinomial, $y|x$ Gaussian, Gamma, d-dim Gaussian. The multinomial entry reads $\mathrm{Beta}(n_1 + \alpha_1, \ldots, n_C + \alpha_C) / \mathrm{Beta}(\alpha_1, \ldots, \alpha_C)$; the remaining entries are not recoverable from the extracted text.] For the distributions in Table 2 with priors in Table 3, Table 5 gives their matching evidence derived using Lemma 6.4 and cancelling a few common terms.... ..."

Cited by 189
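
As a hedged illustration of the multinomial evidence mentioned in the snippet above, the sketch below evaluates the ratio of multivariate Beta functions, Beta(n + α) / Beta(α), in log space. This is the standard Dirichlet-multinomial marginal likelihood for an ordered sample, which is assumed here to correspond to the multinomial row of the quoted evidence table; the function names and the example counts are hypothetical.

```python
from math import exp, lgamma


def log_multi_beta(alphas):
    """Log of the multivariate Beta function: sum(lgamma(a)) - lgamma(sum(a))."""
    return sum(lgamma(a) for a in alphas) - lgamma(sum(alphas))


def log_evidence(counts, alphas):
    """Log evidence of an ordered sample with the given category counts.

    counts[c] is the number of observations in category c, and alphas[c]
    is the matching Dirichlet prior parameter (must be positive).
    """
    if len(counts) != len(alphas):
        raise ValueError("counts and alphas must have the same length")
    posterior = [n + a for n, a in zip(counts, alphas)]
    return log_multi_beta(posterior) - log_multi_beta(alphas)


if __name__ == "__main__":
    # Binary variable, uniform Dirichlet(1, 1) prior, one observation per value.
    print(exp(log_evidence([1, 1], [1.0, 1.0])))
```

Working in log space avoids overflow in the Gamma function when the counts are large.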

### Table 2: Distributions and their conjugate priors

1994

"... In PAGE 40: ...stablish properties of it. In Sections 5.3.2 and 5.3.3 it is shown how this property forms the basis of several fast Bayesian learning algorithms looking at multiple models, including decision trees [Bun91b], and Bayesian networks [SL90, SDLC93, Bun91d]. Table 2... In PAGE 44: ...2.3 Recognizing and using the exponential family. As a final note, how can we apply these operations automatically to a graphical model? Which distributions are exponential family and which have conjugate distributions with normalizing constants in closed form? Table 2 gives a selection of distributions, and their conjugate distribution. Further details can be found in most textbooks on probability distributions.... In PAGE 54: ...emma 5.8 Consider the context of Lemma 5.1. Then the model likelihood or evidence, given by $\mathrm{evidence}(M) = p(x_1, \ldots, x_N \mid y_1, \ldots, y_N, M)$, can be computed as: $\mathrm{evidence}(M) = \frac{p(\theta \mid \tau) \prod_{j=1}^{N} p(x_j \mid y_j, \theta)}{p(\theta \mid \tau')} = \frac{Z(\tau')}{Z(\tau)\, Z_2^N}$. For the distributions in Table 1 with priors in Table 2, Table 4 gives their matching evidence derived using Lemma 5.8 and cancelling a few common terms.... In PAGE 55: ... $p(var_2 = 1 \mid var_1 = 0, \theta_2) = \theta_{2,0|0}$; $p(var_2 = 1 \mid var_1 = 1, \theta_2) = \theta_{2,0|1}$. If we use Dirichlet priors for these parameters, as shown in Table 2, then the priors are: $(\theta_1, 1 - \theta_1) \sim \mathrm{Dirichlet}(\alpha_{1,0}, \alpha_{1,1})$; $(\theta_{2,0|j}, 1 - \theta_{2,0|j}) \sim \mathrm{Dirichlet}(\alpha_{2,0|j}, \alpha_{2,1|j})$ for $j = 0, 1$; where $\theta_{2,0|0}$ is a priori independent of $\theta_{2,0|1}$. Arguments for choosing the values of these parameters are given later in Section 6.... In PAGE 55: ... Each get their own parameters, sufficient statistics, and contribution to the evidence. Conjugate priors from Table 2 (using $x|y$ Gaussian) are indexed accordingly as: $\theta_{i|j} \mid \sigma_{i|j} \sim \mathrm{Gaussian}(\theta_{0,i|j}, \Sigma_{0,i|j}\,\sigma^2_{i|j})$ for $i = 1, 2$ and $j = 0, 1$; $\sigma^{-2}_{i|j} \sim \mathrm{Gamma}(\alpha_{0,i|j}/2, \beta_{0,i|j})$ for $i = 1, 2$ and $j = 0, 1$. Notice that $\theta_{0,i|j}$ is one dimensional when i = 0 and two dimensional when i = 2.
Suitable sufficient statistics for this situation are read from Table 3 by looking at the data summaries used there.... ..."

Cited by 189

### Table 10 Performance (AUC) using cosine distance with small training sets (250 examples) and interaction with skew (1st and Least Common, and where the latter equals 1 the number of values that appeared only once), unconditional prior of class 1, and average bag size

2006

"... In PAGE 29: ...efined (customers who ...) than the majority class (everyone else), as is often the case. Table 10 presents these three factors for all eight domains, and the ranking performance (AUC) with small training sets (250 training cases) using class-conditional cosine distances. The first two columns show the skew.... ..."

### Table 2: Posterior Inclusion Probabilities Across Parameter Priors (Model Prior = Uniform)

2007

"... In PAGE 15: ... It thus seems possible that the BMA results would vary considerably between priors. Table 2 reports the BMA posterior inclusion probabilities for all 12 prior distributions applied to the growth dataset. Jeffreys (1961) proposed rules of thumb, refined by Kass and Raftery (1995), suggesting that the evidence for a regressor having an effect is either weak, positive, strong, or decisive when the posterior inclusion probabilities range from 50-75%, 75-95%, 95-99%, and > 99%, respectively.... In PAGE 16: ... Figure 2 shows scatterplots of posterior inclusion probabilities generated by the various priors against our baseline prior (Prior 1). Since Prior 1 was the most optimistic, with 22 candidate regressors showing an effect in Table 2, it is no surprise that most of the points in the scatterplots lie above the 45 degree line, indicating generally higher posterior inclusion probabilities for each regressor under Prior 1 as compared to other priors. More importantly, however, the scatterplots highlight not only that Prior 1 is more optimistic, but also how the differences between Prior 1 and alternative priors increase as the implied g-prior diverges.... In PAGE 16: ... Priors 1, 6, and 12 have relatively similar results, but most other priors show differing effects implied by the priors. Alternatively, one might be tempted to interpret Table 2 as suggesting that 6 regressors (Confucius, Initial GDP, Life Expectancy, Rule of Law, Sub-Saharan Africa dummy, and Equipment Investment) are robustly related to growth, since there is clear evidence for an effect for each of these regressors across all priors. We view this interpretation as misguided because the selection criterion based on the lowest common denominator is inappropriately conservative.... In PAGE 19: ... This leads not only to fewer regressors that surpass the effect-threshold, but also to a different set of effective regressors.
The restrictive model prior has the least impact on Prior 11; for this prior, the Rule of Law variable loses significance but otherwise the results are identical to Table 2. Thus forcing BMA to increase the weight on smaller models and penalize larger models affects priors differently: it can change the number of candidate regressors that pass the effect-threshold, and it can lead to different regressors with high inclusion probabilities.... ..."
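
The rule of thumb quoted in this snippet maps a posterior inclusion probability to an evidence category. A minimal sketch follows, assuming each boundary value belongs to the higher category except 99%, which the snippet places strictly above; the function name and the "none" label for probabilities below 50% are hypothetical.

```python
def evidence_category(pip):
    """Classify a posterior inclusion probability given as a fraction in [0, 1]."""
    if not 0.0 <= pip <= 1.0:
        raise ValueError("posterior inclusion probability must lie in [0, 1]")
    if pip > 0.99:
        return "decisive"
    if pip >= 0.95:
        return "strong"
    if pip >= 0.75:
        return "positive"
    if pip >= 0.50:
        return "weak"
    return "none"  # below the 50% effect threshold (label is an assumption)


if __name__ == "__main__":
    for value in (0.30, 0.60, 0.80, 0.97, 0.995):
        print(value, evidence_category(value))
```

Applying such a function to each regressor under each prior would show how the set of regressors passing the effect threshold changes across priors.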