| Herbrich, R., Graepel, T., & Campbell, C. (1999). Bayes point machine: Estimating the Bayes point in kernel space. IJCAI Workshop SVMs (pp. 23--27). |
....N is the number of examples and M the number of mistakes. Finding the maximum margin hyperplane is estimated at O(N^2.5) time. For the computation of VolEst we need to spend O(N) per bounce of the billiard. In our implementations (we used SVM Light [Joa99] and the billiard algorithm of [Ruj97, RM00, HGC99]) VolEst was clearly the slowest. For much larger data sets VoPerc seems to be the simplest and the most adaptable. [Figure 6 plots: number of hits for VoPerc, SVM and VolEst.] Figure 6: Total hit performance on ....
R. Herbrich, T. Graepel, and C. Campbell. Bayes point machines: Estimating the Bayes point in kernel space. pages 23--27, 1999.
....cases of the MRED variant detailed in section 5. However, exploring these relations in detail is messy and not within the scope of this paper. See [2] for an analysis of a Bayesian classifier from the viewpoint of the Luckiness framework. Another approach to PAC-Bayesian learning is taken in [8], [7], [17]. It uses geometrical ideas within the version space, i.e. the subset of discriminants which are consistent with a given training sample, and therefore results in bounds for the case where the finally chosen discriminant classifies all training points correctly. The idea to focus on volumes ....
....theory (see [3] and Vapnik's work). Noting that hard margin SVMs find the largest ball within version space and choose the centre of this ball as discriminant, Shawe-Taylor and Williamson [17] focused new interest on this idea, which has since then found powerful extensions in Herbrich et al. [8], [7]. The relations between the bounds in [7] and ours can be clarified by noting that Herbrich et al. [7] actually apply an earlier result of McAllester [12] to the version space framework. This results in conceptually simpler bounds than applying results derived within the Luckiness framework [16] ....
Ralf Herbrich, Thore Graepel, and Colin Campbell. Bayes point machines: Estimating the Bayes point in kernel space. In Proceedings of IJCAI 99, pages 23-27, 1999.
....is the number of labeled examples and M the number of mistakes. Finding the maximum margin hyperplane is estimated at O(N^2.5) time. For the computation of VolEst we need to spend O(N) per bounce of the billiard. In our implementations we used SVM Light [Joa99] and the billiard algorithm of [Ruj97, RM00, HGC99]. If we have the internal hypothesis of the algorithm then for applying the selection criterion we need to evaluate the hypothesis for each unlabeled point. This cost is proportional to the number of support vectors for the SVM based methods and proportional to the number of mistakes for the ....
Ralf Herbrich, Thore Graepel, and Colin Campbell. Bayes point machines: Estimating the Bayes point in kernel space. In Proceedings of IJCAI Workshop Support Vector Machines, pages 23--27, 1999.
....patterns during episodes likely to be frustrating to the user. By modeling user identity as hidden context, this algorithm achieves on average 10.6 user-independent test error rate. 1 Introduction By approximating the Bayesian average, Bayes Classifiers achieve good generalization performance [3, 6]. A Bayesian linear classifier can be easily converted to a nonlinear classifier by using feature expansions or kernel methods, as does the Support Vector Machine (SVM). On the other hand, prediction for real world applications is often complicated by some changing context. For example, a person ....
R. Herbrich, T. Graepel, and C. Campbell. Bayes point machine: Estimating the Bayes point in kernel space. In IJCAI Workshop SVMs, pages 23--27, 1999.
....of Gibbs algorithm towards Bayesian inference, rate of convergence of the empirical loss towards the generalization loss, convergence of the generalization error towards the optimal loss in the underlying class of functions. 1 Introduction Recent works about the Bayes Point Machine ([5, 6]), reporting good results on a few benchmarks, have highlighted the interest of old, well-known learning paradigms such as Bayesian inference. Haussler, Kearns and Schapire [4] give results on Bayesian learning in the zero error case, and Devroye, Györfi and Lugosi [3] recall negative results ....
R. Herbrich, T. Graepel, C. Campbell, Bayes Point Machines: Estimating the Bayes Point in Kernel Space, in Proceedings of IJCAI Workshop on Support Vector Machines, pages 23-27, 1999
....data set being linearly separable. Second, as noted by Shawe-Taylor and Cristianini (1999) it is possible to modify any kernel so that the data in the new induced feature space is linearly separable. There exists a duality between the feature space F and the parameter space W (Vapnik, 1998; Herbrich et al., 1999) which we shall take advantage of in the next section: points in F correspond to hyperplanes in W and vice versa. Clearly, by definition points in W correspond to hyperplanes in F. The intuition behind the converse is that observing a training instance x_i in the feature space restricts the set ....
....query b. • MaxMin Margin. The Simple Margin method can be a rather rough approximation. It relies on the assumption that the version space is fairly symmetric and that w_i is centrally placed. It has been demonstrated, both in theory and practice, that these assumptions can fail significantly (Herbrich et al., 1999). Indeed, if we are not careful we may actually query an instance whose hyperplane does not even intersect the version space. The MaxMin approximation is designed to somewhat overcome these problems. Given some data {x_1 ... x_i} and labels {y_1 ... y_i} the SVM unit vector w_i is the center ....
Herbrich, R., T. Graepel, and C. Campbell: 1999, `Bayes Point Machines: Estimating the Bayes Point in Kernel Space'. In: International Joint Conference on Artificial Intelligence Workshop on Support Vector Machines. pp. 23--27.
....k against all the others, and x was classified in class k such that f_k(x) is maximum. Guermeur et al. in [9] propose another version of multiclass SVM, mathematically justified but more computationally expensive, sometimes leading to better results. 2.2 Bayes Point Machines Herbrich et al. [6] consider that SVMs work mainly as approximators of a Bayesian algorithm defined by Rujan [11]. Their experiments suggest that, in the separable case, the algorithm called Bayes Point Machine is more efficient than SVM. The principle of the algorithm is the following one: Let K be a kernel. x 7 ....
....following one: Assuming that all vectors have norm 1 (for example K(x, y) = exp(-||x - y||_2^2)) one can see that hard margin SVM (i.e. minimization of w^2 under constraint ξ_i = 0) chooses the w in the version space which is the centre of the maximum inscribed sphere. Herbrich et al. [6] explain that such a point is an approximation of the average of the version space, and that when the version space has an irregular shape the approximation is not efficient. In these cases, their algorithm gives better results than SVM. The extension to the multiclass case can be done in a ....
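The contrast drawn in the excerpt above (centre of the largest inscribed sphere versus the average of the version space) can be illustrated with a small sketch. This is not the billiard algorithm cited in these excerpts: it is a simplified stand-in, assumed here for illustration only, that samples consistent linear classifiers by running a perceptron over random orderings of the data and averages them. The function names, the toy data and all parameter values are hypothetical.

```python
import numpy as np

def perceptron(X, y, order, max_epochs=1000):
    """Run the perceptron over the samples in the given order until no mistakes remain."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for i in order:
            if y[i] * (X[i] @ w) <= 0:      # misclassified (or on the boundary)
                w += y[i] * X[i]
                mistakes += 1
        if mistakes == 0:                   # w is now a member of the version space
            return w / np.linalg.norm(w)
    raise RuntimeError("no consistent classifier found; data may not be separable")

def approximate_bayes_point(X, y, n_samples=50, seed=0):
    """Average unit-norm consistent classifiers as a rough centre-of-mass estimate."""
    rng = np.random.default_rng(seed)
    ws = [perceptron(X, y, rng.permutation(len(y))) for _ in range(n_samples)]
    w_mean = np.mean(ws, axis=0)
    return w_mean / np.linalg.norm(w_mean)

def make_toy_data(n=40, seed=1):
    """Unit-norm 2-D points, separable through the origin with a small margin."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    w_true = np.array([1.0, 0.5])           # hypothetical ground-truth direction
    margin = X @ w_true
    keep = np.abs(margin) > 0.1             # drop points too close to the boundary
    return X[keep], np.sign(margin[keep])

X, y = make_toy_data()
w_bp = approximate_bayes_point(X, y)
```

Because the version space of a linear classifier through the origin is a convex cone, the average of consistent weight vectors is itself consistent with the training sample, so `w_bp` classifies every training point correctly. How close it lies to the true centre of mass depends on how evenly the perceptron runs cover the version space, which this sketch does not control.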
Herbrich R., T. Graepel, and C. Campbell 1999, Bayes Point Machines: Estimating the Bayes Point in Kernel Space, in Proceedings of IJCAI Workshop Support Vector Machines, pages 23-27, 1999
....1. Introduction Averaging is a standard technique in applied machine learning for combining multiple classifiers to achieve greater accuracy. Examples include Bayesian classification [4], boosting [7], bagging [2], Winnow [13], Maximum Entropy discrimination [11] and Bayes point machines [9]. Despite the prevalence of this technique there is only weak theoretical justification so far for the practice. This paper provides a new stronger theoretical justification for the practice of averaging. In particular, we state and prove a bound on the gap between the training set error rate and ....
....Q) so as to approximately minimize the bound we have derived above. We can choose N = 4 2 ln m D(Q||P) 1 : 4. Implications We wish to apply the preceding theory to two general learning methods: Maximum Entropy discrimination [11] and Bayes as well as Bayes Point Classifiers [15], [9]. We choose these two learning methods because the average in these cases is over many hypotheses, so that the low order terms in the bound are not very significant. We begin with a simple toy example that illustrates the bound application. 4.1 Example A quick example will illustrate the ....
Ralf Herbrich, Thore Graepel, and Colin Campbell, "Bayes Point Machines: Estimating the Bayes Point in Kernel Space", IJCAI 1999 pages 23-27.
....is less harsh than it at first may seem since the feature space often has a very high dimension. Nevertheless, requiring linear separability in feature space is a condition we wish to relax in future work. There exists a duality between the feature space F and the parameter space W (Vapnik, 1998; Herbrich et al., 1999) which we shall take advantage of in the next section: points in F correspond to hyperplanes in W and vice versa. Clearly, by definition points in W correspond to hyperplanes in F. The intuition behind the converse is that observing a training instance x_i in feature space restricts the set of ....
....and so we will choose to query b. • MaxMin Margin. The Simple Margin method can be a rather rough approximation. It relies on version space being fairly symmetric and w_i being centrally placed. It has been demonstrated, both in theory and practice, that these assumptions can fail significantly (Herbrich et al., 1999). Indeed, if we are not careful we may actually query an instance whose hyperplane does not even intersect the version space. The MaxMin approximation is designed to somewhat overcome these problems. Given some data {x_1 ... x_i} and labels {y_1 ... y_i} the SVM unit vector w_i is the center ....
Herbrich, R., Graepel, T., & Campbell, C. (1999). Bayes point machines: Estimating the Bayes point in kernel space. International Joint Conference on Artificial Intelligence Workshop on Support Vector Machines (pp. 23--27).
....we propose new versions of Bayes Point Machines. Contents 1 Introduction 2 2 Bayesian approach 2 3 Learning with or without a priori on the distribution 4 4 Algorithm 1: simplified error tolerant BPM 8 5 Algorithm 2: the shortest Bayesian learning algorithm 9 6 Conclusion 10 1 1 Introduction [6] explains that SVMs are an approximation of a Bayesian classifier, and shows that their algorithm Bayes Point Machines (BPM) is a better approximation of this Bayesian classifier in the case where the version space (i.e. the space of consistent hypotheses in the feature space) is much larger than the ....
.... = Z P (Djw)11 wx 0 dP(w) Z Pi i p cardfi=y i wx i 0g q cardfi=y i wx i g z denoted by Pi now 11 wx 0 dP(w) So, being given D, P (wx 0) 1 2 if and only if R Pi11 wx 0 dP(w) 1 2 R PidP(w) This can be approximated, when classifiers can be easily summed, as in [6], by ( R PiwdP(w) x) 0; this is the Bayesian averaged classifier (with consequences detailed below). Proposition 1 (Quality of Bayesian approximators) The Bayesian averaged classifier can be a bad approximation of the Bayesian classifier, and the Bayesian classifier can be a bad ....
R. Herbrich, T. Graepel, C. Campbell, Bayes Point Machines: Estimating the Bayes Point in Kernel Space, in Proceedings of IJCAI Workshop Support Vector Machines, pages 23-27, 1999
....p. 157] Since for large datasets the SVM algorithm is too time consuming, many heuristics to approximate the SVM solution have been put forward (see, e.g., [2]). Recently, it has been demonstrated experimentally that even algorithms with no explicit regularisation perform comparably to SVMs (see [9, 5]). This observation raises an interesting question: • How many classifiers within version space exhibit a small generalisation error? In this paper we try to answer this question both from a theoretical and experimental point of view. Using a recent result in the PAC-Bayesian framework we are ....
.... Figure 1: (Left) Box plots of distributions of generalisation errors for n = 1000 samples w using different degrees in the polynomial kernel (9). The △, Θ and ∘ depict the generalisation errors of the SVM solution, the SVM solution when normalising in feature space K and the BPM solution [5], respectively. (Right) Box plots of distributions of generalisation for different attained margins (7) when using a polynomial kernel of degree 5. The width of each box plot is proportional to the number of samples on which it is based. space misclassify x, too. By this argument, the ....
R. Herbrich, T. Graepel, and C. Campbell. Bayes point machines: Estimating the Bayes point in kernel space. In Proceedings of IJCAI Workshop Support Vector Machines, pages 23--27, 1999.
Herbrich, R., Graepel, T., & Campbell, C. (1999). Bayes point machine: Estimating the Bayes point in kernel space. IJCAI Workshop SVMs (pp. 23--27).
Herbrich, R., Graepel, T. and Campbell, C.: Bayes Point Machines: Estimating the Bayes Point in Kernel Space. In Proceedings of International Joint Conference on Artificial Intelligence Workshop on Support Vector Machines, (1999) 23-27
R. Herbrich, T. Graepel, and C. Campbell. Bayes point machines: Estimating the Bayes point in kernel space. In International Joint Conference on Artificial Intelligence Workshop on Support Vector Machines, pages 23--27, 1999.
Ralf Herbrich, Thore Graepel, and Colin Campbell. Bayes point machines: Estimating the Bayes point in kernel space. In Proceedings of IJCAI Workshop Support Vector Machines, pages 23--27, 1999.