### Table 1: Test Error Rates on the USPS Handwritten Digit Database.

"... In PAGE 12: ... It simply tries to separate the training data by a hyperplane with large margin. Table 1 illustrates two advantages of using nonlinear kernels. First, performance of a linear classifier trained on nonlinear principal components is better than for the same number of linear components; second, the performance for nonlinear components can be further improved by using more components than is possible in the linear case. ... ..."

### Table 1: Test error rates on the MPI chair database for linear Support Vector machines trained on nonlinear

1998

"... In PAGE 8: ... To assess the utility of the components, we trained a linear Support Vector classifier (Vapnik & Chervonenkis, 1979; Cortes & Vapnik, 1995) on the classification task. Table 1 summarizes our findings: in all cases, nonlinear components as extracted by polynomial kernels (cf. Eq. ... ..."

Cited by 567

### Table 2: Test error rates on the USPS handwritten digit database for linear Support Vector machines trained

1998

"... In PAGE 10: ... e.g. local translation invariance, be it by generating "virtual" translated examples or by choosing a suitable kernel, could further improve the results. Table 2 nicely illustrates two advantages of using nonlinear kernels: first, performance for nonlinear principal components is better than for the same number of linear components; second, the performance for nonlinear components can be further improved by using more components than possible in the linear case. We conclude this section with a comment on the approach taken. ... ..."

Cited by 567

### Table 2: Linear programming training and testing set correctness for linear and nonlinear support vector machines 6 Conclusion

1999

"... In PAGE 9: ... Table 1: SOR training and testing set correctness for linear and quadratic kernels (tenfold training and testing correctness values elided). Table 2 shows training and testing set correctness, using the linear programming formulation (17) with various kernels, under tenfold cross validation for the above mentioned datasets. We implemented a linear kernel, a quadratic kernel, a symmetric sixth-degree polynomial, and a symmetric sinusoidal kernel based on the following formulations, which in general are indefinite kernels: Kernel 5. ... ..."
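The excerpt notes that some of the kernels used (e.g. the sinusoidal one) are in general *indefinite*, i.e. their Gram matrices can have negative eigenvalues — acceptable in the linear-programming SVM formulation, though not in the standard dual. A minimal numerical sketch of this property, using an assumed sinusoidal kernel K(x, y) = sin(x · y) on random data (the paper's exact kernel formulas are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))  # 20 random points in 5 dimensions

def sinusoidal_kernel(A, B):
    # K(x, y) = sin(x . y): symmetric, but not positive semidefinite in general,
    # so it is an indefinite kernel in the sense used above.
    return np.sin(A @ B.T)

K = sinusoidal_kernel(X, X)
eigvals = np.linalg.eigvalsh(K)  # eigenvalues of the symmetric Gram matrix
print("smallest eigenvalue:", eigvals.min())
```

A negative smallest eigenvalue confirms the Gram matrix is indefinite, which is why such kernels fit the linear-programming formulation rather than the usual quadratic-programming dual.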

Cited by 15

### Table 4.2: Linear programming training and test set correctness for linear and nonlinear support vector machines

### Table 1: Test set accuracy of the Probabilistic Random Forest (PRF) compared to the linear Minimax Probability Machine (MPMCL), the Gaussian MPMC (MPMCG), the linear Support Vector Machine (SVML), the Gaussian SVM (SVMG) (results published in [4]) and Random Forests (RF) (results published in [5]).

### Table 5. Performance Comparison Table on Text Datasets. NB stands for Naive Bayes, LDA for Linear Discriminant Analysis, SVM for Support Vector Machine.

"... In PAGE 17: ... For SVM, we used the linear kernel. Table 5 gives performance comparisons. SVM achieves the highest performance on WebKB4, WebKB, Reuters-2, K-dataset and TDT2. ... ..."
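The setup described — a linear-kernel SVM on text features — can be sketched with scikit-learn's TF-IDF vectorizer and `LinearSVC`. The toy corpus and labels below are illustrative assumptions; the paper's datasets (WebKB, Reuters-2, TDT2, ...) are not reproduced:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Tiny stand-in corpus with two classes (illustrative, not a real dataset)
docs = ["cheap pills buy now", "meeting agenda attached",
        "buy cheap pills now please", "project meeting tomorrow agenda"]
labels = [1, 0, 1, 0]  # 1 = spam-like, 0 = work-like

vec = TfidfVectorizer()
X = vec.fit_transform(docs)       # sparse TF-IDF document-term matrix
clf = LinearSVC().fit(X, labels)  # SVM with a linear kernel

pred = clf.predict(vec.transform(["cheap buy now"]))
print(pred)
```

The same vectorizer must be reused at prediction time so that test documents are mapped into the vocabulary learned from the training corpus.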

### Table 1: Test error rates on the USPS handwritten digit database for linear Support Vector machines trained on nonlinear principal components extracted by PCA with kernel (20), for degrees 1 through 7. In the case of degree 1, we are doing standard PCA, with the number of nonzero eigenvalues being at most the dimensionality of the space, 256. Clearly, nonlinear principal components afford test error rates which are superior to the linear case (degree 1).

1998

"... In PAGE 11: ... It simply tries to separate the training data by a hyperplane with large margin. Table 1 illustrates two advantages of using nonlinear kernels: first, performance of a linear classifier trained on nonlinear principal components is better than for the same number of linear components; second, the performance for nonlinear components can be further improved by using more components than possible in the linear case. ... ..."
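The pipeline these excerpts describe — extract nonlinear principal components with polynomial-kernel PCA, then train a linear SVM on them — can be sketched with scikit-learn. The dataset (`make_moons`), kernel degree, and component counts below are illustrative assumptions, not the USPS setup from the paper:

```python
from sklearn.datasets import make_moons
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Toy two-class problem standing in for the digit data
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree, n_comp in [(1, 2), (3, 10)]:
    # degree 1 is ordinary linear PCA; degree > 1 gives nonlinear components,
    # and allows extracting more components than the input dimensionality
    kpca = KernelPCA(n_components=n_comp, kernel="poly", degree=degree)
    Z_tr = kpca.fit_transform(X_tr)
    Z_te = kpca.transform(X_te)
    clf = LinearSVC(max_iter=10000).fit(Z_tr, y_tr)  # linear classifier on top
    print(f"degree={degree}, components={n_comp}: "
          f"test accuracy {clf.score(Z_te, y_te):.3f}")
```

This mirrors both advantages quoted above: the nonlinear components feed a purely linear classifier, and the polynomial kernel lets more components be extracted than the input dimension allows in the linear case.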

Cited by 567

### Table 5: Importance of bands in EQ16 when classified with a linear support vector machine. The rank is with respect to all 73 features used (57 core features and 16 bands), with 1 being the most important feature. Mean |w| and max |w| are both possible measures of importance; see Section 7.2 for details.

2006

"... In PAGE 6: ... The two measures are complementary; the first rewards dimensions that are moderately useful across all binary problems while the second rewards dimensions that are extremely useful for some binary problem. Table 5 shows the values of these two importance measures, averaged across each of the 10 folds, for each band. The bands with the most information are 1500-2000, 2500-3000, 3500-4000, 3000-3500, and 500-1000. ... ..."
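The two importance measures described — mean |w| and max |w| of a feature's weight across the one-vs-rest binary problems of a linear SVM — can be computed directly from the classifier's weight matrix. A sketch using an assumed stand-in dataset (`load_digits`, 10 classes, 64 features) rather than the paper's 73 features:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
clf = LinearSVC(max_iter=20000).fit(X, y)  # one-vs-rest: one w per class

W = np.abs(clf.coef_)    # |w|, shape (n_classes, n_features)
mean_w = W.mean(axis=0)  # rewards features moderately useful across all
                         # binary problems
max_w = W.max(axis=0)    # rewards features extremely useful for some
                         # single binary problem

ranking = np.argsort(-mean_w)  # position 0 = most important feature
print("five most important features by mean |w|:", ranking[:5])
```

Averaging these rankings across cross-validation folds, as the excerpt describes, would just repeat this computation per fold and average the resulting `mean_w` / `max_w` vectors.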

### Table 1: Test error rates on the MPI chair database for linear Support Vector machines trained on nonlinear principal components extracted by PCA with kernel (22), for degrees 1 through 7. In the case of degree 1, we are doing standard PCA, with the number of nonzero eigenvalues being at most the dimensionality of the space, 256; thus, we can extract at most 256 principal components. The performance for the nonlinear cases (degree > 1) is significantly better than for the linear case, illustrating the utility of the extracted nonlinear components for classification.

"... In PAGE 8: ... To assess the utility of the components, we trained a linear Support Vector classifier (Vapnik & Chervonenkis, 1979; Cortes & Vapnik, 1995) on the classification task. Table 1 summarizes our findings: in all cases, nonlinear components as extracted by polynomial kernels (cf. Eq. ... ..."