### Table 5: The average number of putative F hypotheses that were required per pair of images. Compared to the vanilla and hypergeometric methods, the Td,d method requires many more hypothesis samples to be generated before the global termination criterion (eqn. 4) is met. This significantly increases the computational overhead associated with using the Td,d test.

### Table 3 Tests to reach criterion for test sets

"... In PAGE 11: ... One measure of learning facility is how quickly par- ticipants passed the criterion of a perfect score on one test. As seen in Table3 , the mean number of tests to reach criterion was smaller for experts than for novices on all four test sets. The difference between experts and novices was not significant for any one test set alone, but was significant when all four sets were considered together, t(90) 2.... ..."

### Table 3. Number of Test Cases Per Criterion.

"... In PAGE 7: ...1. Test Set Size Table3 gives the number of test cases generated for each criterion. In the cruise control software, specification muta-... ..."

### Table 1. Criterion c2 test requirements.

"... In PAGE 4: ... Table1 shows the test requirements of criterion c2 for the billing system use case diagram, which presents the use case 11 being extended by three other use cases, namely, use cases 12, 13 and 14. In the first row of Table 1, the test requirement r1 means that the extend relationships 12-11, 13-11 and 14-11 should be exercised.... In PAGE 6: ... Regarding the criteria based on the combination of the extend relationships, all-extended- combinations criterion (c2) was the only one not satisfied by the tests. Table1 indicates the test requirements of criterion c2 not exercised (symbol X in the last column) and exercised (symbol g165 in the last column) during the simulation. Two non-exercised test requirements r1 and r2 are infeasible; i.... ..."

### Table 3: Tests to reach criterion for test sets. N N Mean Tests to Criterion

"... In PAGE 8: ... One measure of learning facilityishow quickly partic- ipants passed the criterion of a perfect score on one test. As seen in Table3 , the mean number of tests to reach criterion was smaller for experts than for novices on all four test sets. The di#0Berence between experts and novices was not signi#0Ccant for any one test set alone, but was signi#0Ccant when all four sets were considered together, t#2890#29 = 2:00.... ..."

### Table 1. GV-TEST criterion

### Table 2. Calculation of the logodds criterion

"... In PAGE 6: ... Let S denote the set of all available diagnostic tests. Our proposed criterion is maxT S v(T) = maxT S X Ti2T v(Ti): For the example data shown above the logodds criterion values can be cal- culated as shown in Table2 . Using this criterion the best test is C, followed by A then B.... ..."

### Table 7. Possible outcomes from verification test and the verification test criteria.

### Table 2: Augmented Dickey Fuller Tests for level series and first differences. Selection of lag order k = 0, ..., 5 and of deterministic terms by means of BIC criterion. Critical values from own simulations using 10000 replications of a random walk with N(0,1) innovations. TR = 1 indicates a deterministic trend within the ADF-regression. a, b, c indicate significance at the 1%, 5%, and 10% level respectively. The following industrial economies are considered: Belgium (BE), Canada (CA), Switzerland (CH), Denmark (DK), Finland (FL), France (FR), Germany (GE), Greece (GR), Ireland (IR), Italy (IT), Japan (JA), Netherlands (NL), Norway (NO), Portugal (PO), Sweden (SE), Spain (SP), United Kingdom (UK), and United States (US).

"... In PAGE 13: ...Unit root tests Results from Augmented Dickey Fuller tests applied to the series of nominal exchange rates and price levels are given in Table2 . The test regressions are alternatively speci ed with intercept term and both intercept term and deterministic trend.... ..."