### Table 4: Results on Datasets with Complete Bias

"... In PAGE 9: ... This is different from bagging where multiple bootstraps are used to generate multiple models using the same inductive learner, and then multiple models are combined by voting. The results are summarized in Table4 . Comparing with Tables 2 and 3, the effect of complete bias on reduction of accuracy is between feature bias and class bias.... ..."

### Table 6. Win/Draw/Loss Records of Bias on 36 Datasets

2005

Cited by 5

### Table 4. Bias and Variance Decomposition of the Error for Synthesized Data Dataset

"... In PAGE 6: ... This follows from Equation 10 and because all the addenda are pos- itive. Moreover, the nearest neighbor classi er is known to have high bias [Breiman, 1996b], as the decomposition of the error on arti cial data sets shows (see Table4 ). Therefore, it is unlikely that the true bias on real data sets (i.... In PAGE 9: ...re seven boolean features and ten classes. The Bayes optimal error rate is 0.274. Table4 shows the decomposition of the error for the same set of algorithms that were applied to the archival data sets. The behavior of IB1ecoc on the synthetic data sets is analogous to what we observed on the archival data; IB1ecoc recorded lower error rates than IB1 by reducing bias and increasing variance.... ..."

### Table 3. Bias and Variance Decomposition of the Error for Archived Data Dataset

"... In PAGE 6: ...e., that obtained from the estimate by subtracting a part of the Bayes error rate (see Equation 8)) could greatly di er from that estimated, which is high (see Table3 ) or be comparable in size with the variance. Second, in the experiments performed on synthesized data sets, where the Bayes error rate is known and the bias/variance decomposition is exactly computable, we observed similar behaviors in the changes of bias and variance with respect to the nearest neighbor classi er.... In PAGE 6: ... 4.1 Archived Data Sets The results of the experiments conducted on archived data are shown in Table3 . IB1 is an implementation of the nearest neighbor classi er [Aha, 1992]) and is used as a baseline.... In PAGE 7: ... Only on CLOUDS98 did IB1ecoc yield higher errors than IB1opc, because the class encoding space de ned by only four classes is too small to generate su ciently distinctive codewords.3 gt;From the results, shown in Table3 , we conclude that ECOCs drastically reduce the bias component of the error at the cost of increasing the variance. Bias is always reduced from a minimum of 18% (CLOUDS99) to a maximum of 44% (CLOUDS204).... In PAGE 8: ...e., compare CLOUDS204-100 in Table3 (100 training instances) with CLOUDS204 (500)), this did not decrease IB1ecoc apos;s variance. However, increasing codeword length does appear to reduce IB1ecoc apos;s variance, as exempli ed in Figure 1, which plots the bias and total error of IB1ecoc on CLOUDS204 (using only 100 instances) with di erent codeword lengths.... ..."

### Table 1. Generalized Bias (for All Parameters) with Independent Observations (Dataset 1)

"... In PAGE 10: ... From this perspective, this measure serves as a generalized bias. Table1 shows this generalized estimate of bias with independent observations (i.e.... ..."

### Table 4.1: Description of the three benchmark datasets. The datasets are prepared by the following procedure. First, a test dataset of the specified size (rightmost column of the table) was reserved from each benchmark dataset. Then, training datasets of varying sizes were sampled without replacement from the remaining data examples. To minimize sampling bias, ten training datasets of the same size were generated for each benchmark problem.

### Table 1: Properties of datasets used in experiments

2005

"... In PAGE 9: ...2 Datasets Used To objectively evaluate our approach against EA and KMeans+Gap we used many datasets with widely different properties to reduce any bias of dataset selection. Some of these properties are listed in Table1 . WBC stands for the Wisconsin Breast Cancer dataset.... ..."

Cited by 3

### Table 2 gives some basic profiling statistics, which show some of the bias in the datasets. DOE, for example, appears relatively uniform regarding text length, whereas FR shows the largest range. Comparing the ratio of new to old words gives an indication of domain diversity. There is a significant difference in the rate at which new terms occur between the OU dataset (1 in 131 words) and the SJM dataset (1 in 260 words), in spite of their similar size. The WSJ and SJM sets are quite close in size and characteristics as well as in genre type, so we would expect them to behave in similar ways. Note also that the 10 most frequent terms of all TIPSTER collections are function words, but not in the OU dataset (Table 3).

"... In PAGE 6: ... Table2 . Basic profiling statistics of each of the datasets.... ..."

### Table 1: Results of bias-variance experiments using boosting and bagging on five synthetic datasets (described in Appendix B). For each dataset and each learning method, we estimated bias, variance and generalization error rate, reported in percent, using two sets of definitions for bias and variance (given in Appendix C). Both C4.5 and decision stumps were used as base learning algorithms. For stumps, we used both error-based and pseudoloss-based versions of boosting and bagging on problems with more than two classes. Columns labeled with a dash indicate that the base learning algorithm was run by itself.

1998

"... In PAGE 19: ... For two-class problems, only the error-based versions were used. The results are summarized in Table1 . Clearly, boosting is doing more than reducing variance.... ..."

Cited by 519
