12 citations found.
O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee. Choosing kernel parameters for support vector machines. Machine Learning, 2002. Forthcoming.


This paper is cited in the following contexts:
New Results on Error Correcting Output Codes of Kernel.. - Passerini, Pontil.. (2003)

....very small set of hyperparameters. In this case, it can be carried out by calling as a subroutine a learning algorithm that receives hyperparameters as constant input arguments. Recent methods for tuning several hyperparameters simultaneously include gradient descent [16] and sensitivity analysis [17]. The former method consists in choosing a differentiable model selection criterion and searching for a global optimum in the joint space of parameters and hyperparameters. The latter works by iteratively minimizing an estimate of the generalization error of the support vector machine. The method ....

....Unfortunately, computing the LOO error is time demanding when $\ell$ is large. This becomes practically impossible when we need to know the LOO error for several values of the parameters of the machine used. In the case of binary SVMs, bounds on the leave-one-out error were studied in [17] (see also [29] and references therein). In the following theorem we give a bound on the LOO error of ECOC of kernel machines. An interesting feature is that the bound only depends on the solution of the machines trained on the full data set (so training the machines once will suffice). Below we ....
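
To illustrate why the brute-force LOO error mentioned in the excerpt above is expensive, the following minimal Python sketch retrains a learner once per held-out point. It is only an illustration under stated assumptions: the nearest-centroid learner `train_nearest_centroid` is a stand-in chosen to keep the example dependency-free (it is not an SVM or an ECOC machine), and all function names and the toy data are hypothetical, not taken from the cited papers.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Classifier = Callable[[Point], int]


def train_nearest_centroid(xs: Sequence[Point], ys: Sequence[int]) -> Classifier:
    """Stand-in learner (NOT an SVM): predict the label of the closest class mean."""
    centroids = {}
    for label in set(ys):
        members = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = tuple(sum(c) / len(members) for c in zip(*members))

    def predict(x: Point) -> int:
        # Squared Euclidean distance to each class centroid.
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )

    return predict


def loo_error(xs: List[Point], ys: List[int], train=train_nearest_centroid) -> float:
    """Brute-force leave-one-out error: one full retraining per held-out point."""
    mistakes = 0
    for p in range(len(xs)):
        # Training set with the p-th sample left out.
        xs_p = xs[:p] + xs[p + 1:]
        ys_p = ys[:p] + ys[p + 1:]
        f_p = train(xs_p, ys_p)
        if f_p(xs[p]) != ys[p]:
            mistakes += 1
    return mistakes / len(xs)


# Hypothetical toy data: two clusters plus one point labelled against its cluster.
toy_xs = [(0.0,), (1.0,), (9.0,), (10.0,), (9.5,)]
toy_ys = [-1, -1, 1, 1, -1]
toy_loo = loo_error(toy_xs, toy_ys)
```

With n training points the loop calls the learner n times, and that cost is multiplied again for every hyperparameter value tried. This is the expense that bounds computed from a single training run are meant to avoid.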

O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee, "Choosing kernel parameters for support vector machines," Machine Learning, vol. 46, no. 1--3, pp. 131--159, 2002.


Learning to Predict the Leave-one-out Error of Kernel.. - Tsuda, Rätsch, Mika..

...., $\ell$ for the constraints in (1), which are zero if the constraint is not active. For SVMs they turn out to be equal to $\alpha_i$. For LPM there is no such correspondence. We will now review some bounds on the LOO error of SVMs that have been proposed. A more complete presentation can be found in e.g. [2,8]. Let $Z$ be a sample of size $\ell$, where each pattern is defined as $z_i = (x_i, y_i)$. Furthermore, define $Z^p = \{z_i \in Z, i \neq p\}$ and $f^p = L(Z^p)$, i.e. $f^p$ is the decision obtained when learning with the $p$-th sample left out of the training set. The LOO error is defined as $\epsilon_{loo}(Z) = \frac{1}{\ell} \sum_{p=1}^{\ell} \Psi(-y_p f^p(x_p))$ (2) ....

O. Chapelle and V.N. Vapnik. Choosing kernel parameters for support vector machines. Personal communication, March 2000.


Incremental Support Vector Machine Learning: a Local.. - Ralaivola.. (2001)   (4 citations)

....however not realistic to be provided with a test set during the incremental learning process. So this solution cannot be considered a good answer to our problem. Elsewhere, there exist some analytical expressions of leave-one-out estimates of the SVM generalization error, such as those recalled in [5]. However, in order to use these estimates, one has to ensure that the margin optimization problem has been solved exactly. The same holds for Joachims' $\xi\alpha$ estimators [10, 11]. This restriction prevents us from using these estimates, as we only do a partial local optimization. To circumvent the ....

O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee. Choosing kernel parameters for support vector machines. Technical report, AT&T Labs, March 2000.


Incremental Learning Algorithms for Classification and.. - d'Alché-Buc, Ralaivola

....set and thus keeping n that minimizes this estimation $e_{Val}$. However, it does not seem realistic to get a validation set during the incremental learning process. Elsewhere, there exist some analytical expressions of leave-one-out estimates of the SVM generalization error, such as those recalled in [6]. However, in order to use these estimates, one has to ensure that the margin optimization problem has been solved exactly. The same holds for the $\xi\alpha$ estimators of Joachims [12, 14]. This restriction prevents us from using these estimates, as we only do a partial local optimization. To circumvent the ....

O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee. Choosing kernel parameters for support vector machines. Technical report, AT&T Labs, March 2000.


Meta Learning: Learning to Predict the Leave-one-out Error - Tsuda, Rätsch, Mika, Müller

....sorts of different classification problems. Note that we are using features that are meant to reflect the local difficulty of this meta learning problem across a large set of possible data. Once given a good LOO error estimate, we can use it for choosing hyperparameters, e.g. for selecting models [1]. As we have incorporated knowledge about a variety of learning problems, our approach loosely reminds us of a human expert, who can learn from his previous experience about the choice of hyperparameters for future model selection. After reviewing some popular learning theoretical LOO error ....

....using our empirical LOO error approximation and e.g. the span bound and finally conclude with some remarks. 2 Reviewing Selected Bounds for SVM. In this section we will review some bounds on the LOO error of SVMs that have been proposed. A more complete presentation can be found in e.g. [1]. Let $Z$ be a sample of size $\ell$, where each pattern is defined as $z_i = (x_i, y_i)$, $x_i \in \mathbb{R}^n$, $y \in \{-1, 1\}$. Furthermore, define $Z^p = \{z_i \in Z, i \neq p\}$ and $f^p = L(Z^p)$, i.e. $f^p$ is the decision obtained when learning with the $p$-th sample left out of the training set. The LOO error is ....

[Article contains additional citation context not shown here]

O. Chapelle and V. Vapnik. Choosing kernel parameters for support vector machines. Personal communication, March 2000.


Learning to Predict the Leave-one-out Error of Kernel.. - Tsuda, Rätsch, Mika..

...., $\ell$ for the constraints in (1), which are zero if the constraint is not active. For SVMs they turn out to be equal to $\alpha_i$. For LPM there is no such correspondence. We will now review some bounds on the LOO error of SVMs that have been proposed. A more complete presentation can be found in e.g. [2,8]. Let $Z$ be a sample of size $\ell$, where each pattern is defined as $z_i = (x_i, y_i)$. Furthermore, define $Z^p = \{z_i \in Z, i \neq p\}$ and $f^p = L(Z^p)$, i.e. $f^p$ is the decision obtained when learning with the $p$-th sample left out of the training set. The LOO error is defined as $\epsilon_{loo}(Z) = \frac{1}{\ell} \sum_{p=1}^{\ell} \Psi(-y_p f^p(x_p))$ (2) ....

O. Chapelle and V.N. Vapnik. Choosing kernel parameters for support vector machines. Personal communication, March 2000.


Feature Selection for SVMs - Weston, Mukherjee, Chapelle, Pontil, .. (2001)   (30 citations)  Self-citation (Chapelle Vapnik Mukherjee)

.... $\sum_{i,j=1}^{\ell} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$ (4) under constraints $\sum_{i=1}^{\ell} \alpha_i y_i = 0$ and $\alpha_i \geq 0$, $i = 1, \ldots, \ell$. For the non-separable case one can quadratically penalize errors with the modified kernel $K \leftarrow K + \frac{1}{\lambda} I$, where $I$ is the identity matrix and $\lambda$ a constant penalizing the training errors (see [4] for reasons for this choice). Suppose that the size of the maximal margin is $M$ and the images $\Phi(x_1), \ldots, \Phi(x_\ell)$ of the training vectors are within a sphere of radius $R$. Then the following holds true [13]. Theorem 1: If images of training data of size $\ell$ belonging to a sphere of size $R$ are ....

....is taken over sets of training data of size $\ell$. This theorem justifies the idea that the performance depends on the ratio $E\{R^2/M^2\}$ and not simply on the large margin $M$, where $R$ is controlled by the mapping function $\Phi(\cdot)$. Other bounds also exist, in particular Vapnik and Chapelle [4] derived an estimate using the concept of the span of support vectors. Theorem 2: Under the assumption that the set of support vectors does not change when removing the example $p$, $EP_{err}^{\ell-1} \leq \frac{1}{\ell} E \sum_{p=1}^{\ell} \Psi\left(\frac{\alpha_p^0}{(K_{SV}^{-1})_{pp}} - 1\right)$ (6), where $\Psi$ is the step function, $K_{SV}$ is the matrix of dot products ....

[Article contains additional citation context not shown here]

O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee. Choosing kernel parameters for support vector machines. Machine Learning, 2000.


Feature Selection for SVMs - Weston, Mukherjee, Chapelle, Pontil, .. (2001)   (30 citations)  Self-citation (Chapelle Vapnik Mukherjee)

.... $\sum_{i,j=1}^{\ell} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$ (4) under constraints $\sum_{i=1}^{\ell} \alpha_i y_i = 0$ and $\alpha_i \geq 0$, $i = 1, \ldots, \ell$. For the non-separable case one can quadratically penalize errors with the modified kernel $K \leftarrow K + \frac{1}{\lambda} I$, where $I$ is the identity matrix and $\lambda$ a constant penalizing the training errors (see [4] for reasons for this choice). Suppose that the size of the maximal margin is $M$ and the images $\Phi(x_1), \ldots, \Phi(x_\ell)$ of the training vectors are within a sphere of radius $R$. Then the following holds true [13]. Theorem 1: If images of training data of size $\ell$ belonging to a sphere of size $R$ are ....

....is taken over sets of training data of size $\ell$. This theorem justifies the idea that the performance depends on the ratio $E\{R^2/M^2\}$ and not simply on the large margin $M$, where $R$ is controlled by the mapping function $\Phi(\cdot)$. Other bounds also exist, in particular Vapnik and Chapelle [4] derived an estimate using the concept of the span of support vectors. Theorem 2: Under the assumption that the set of support vectors does not change when removing the example $p$, $EP_{err}^{\ell-1} \leq \frac{1}{\ell} E \sum_{p=1}^{\ell} \Psi\left(\frac{\alpha_p^0}{(K_{SV}^{-1})_{pp}} - 1\right)$ (6), where $\Psi$ is the step function, $K_{SV}$ is the matrix ....

[Article contains additional citation context not shown here]

O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee. Choosing kernel parameters for support vector machines. Machine Learning, 2000.


Feature Selection for SVMs - Weston, Mukherjee, Chapelle, Pontil, .. (2001)   (30 citations)  Self-citation (Chapelle Vapnik Mukherjee)

.... This reduces to maximizing the following optimization problem: $W^2(\alpha) = \sum_{i=1}^{\ell} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{\ell} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$ (4) under constraints $\sum_{i=1}^{\ell} \alpha_i y_i = 0$ and $\alpha_i \geq 0$, $i = 1, \ldots, \ell$. For the non-separable case one can quadratically penalize errors [4] with the modified kernel $K \leftarrow K + \frac{1}{\lambda} I$, where $I$ is the identity matrix and $\lambda$ a constant penalizing the training errors. Suppose that the size of the maximal margin is equal to $M$ and that the images $\Phi(x_1), \ldots, \Phi(x_\ell)$ of the training vectors $x_1, \ldots, x_\ell$ are within a sphere of radius $R$. ....

....is taken over sets of training data of size $\ell$. This theorem justifies the idea that the performance depends on the ratio $E\{R^2/M^2\}$ and not simply on the large margin $M$, where $R$ is controlled by the mapping function $\Phi(\cdot)$. Other bounds also exist, in particular Vapnik and Chapelle [4] derived an estimate using the concept of the span of support vectors. Theorem 2: Under the assumption that the set of support vectors does not change when removing the example $p$, $EP_{err}^{\ell-1} \leq \frac{1}{\ell} E \sum_{p=1}^{\ell} \Psi\left(\frac{\alpha_p^0}{(K_{SV}^{-1})_{pp}} - 1\right)$ (6), where $\Psi$ is the ....

[Article contains additional citation context not shown here]

O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee. Choosing kernel parameters for support vector machines. Machine Learning, 2000. Submitted.


Feature Selection for SVMs - Weston, Mukherjee, Chapelle, Pontil, .. (2001)   (30 citations)  Self-citation (Chapelle Vapnik Mukherjee)

....is taken over sets of training data of size $\ell$. This theorem justifies the idea that the performance depends on the ratio $E\{R^2/M^2\}$ and not simply on the large margin $M$, where $R$ is controlled by the mapping function $\Phi(\cdot)$. Other bounds also exist, in particular Vapnik and Chapelle [4] derived an estimate using the concept of the span of support vectors. Theorem 2: Under the assumption that the set of support vectors does not change when removing the example $p$, $EP_{err}^{\ell-1} \leq \frac{1}{\ell} E \sum_{p=1}^{\ell} \Psi\left(\frac{\alpha_p^0}{(K_{SV}^{-1})_{pp}} - 1\right)$ (6), where $\Psi$ is the ....

....following method: approximate the binary valued vector $\sigma \in \{0, 1\}^n$ with a real valued vector $\sigma \in \mathbb{R}^n$. Find the optimum value of $\sigma$ by minimizing $R^2 W^2$, or some other criterion, by gradient descent. Denote the $k$-th element of the $n$-dimensional vector $\sigma$ as $\sigma_k$. As explained in [4], the derivative of our criterion is: $\frac{\partial R^2 W^2(\sigma, \alpha^0)}{\partial \sigma_k} = R^2(\sigma) \frac{\partial W^2(\alpha^0, \sigma)}{\partial \sigma_k} + W^2(\alpha^0, \sigma) \frac{\partial R^2(\sigma)}{\partial \sigma_k}$ (10). We estimate the minimum of $\tau(\sigma, \alpha)$ by minimizing equation (8) in the space $\sigma \in \mathbb{R}^n$ using the gradients (10) with the following ....
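
The product-rule gradient in the excerpt above can be sketched in a few lines of Python. This is a toy illustration only: the functions `r2` and `w2` are hypothetical smooth stand-ins for the radius and margin terms (with the SVM solution held fixed), not the quantities computed in the cited paper, and the plain fixed-step descent is an assumption, since the excerpt is truncated before it describes the actual procedure.

```python
from typing import List

Vector = List[float]


def r2(sigma: Vector) -> float:
    """Hypothetical stand-in for the radius term R^2(sigma)."""
    return 1.0 + sum(s * s for s in sigma)


def grad_r2(sigma: Vector) -> Vector:
    return [2.0 * s for s in sigma]


def w2(sigma: Vector) -> float:
    """Hypothetical stand-in for the margin term W^2 with the SVM solution held fixed."""
    return 1.0 + sum((s - 1.0) ** 2 for s in sigma)


def grad_w2(sigma: Vector) -> Vector:
    return [2.0 * (s - 1.0) for s in sigma]


def criterion_gradient(sigma: Vector) -> Vector:
    """Product rule: d(R2*W2)/d sigma_k = R2 * dW2/d sigma_k + W2 * dR2/d sigma_k."""
    r, w = r2(sigma), w2(sigma)
    return [r * gw + w * gr for gr, gw in zip(grad_r2(sigma), grad_w2(sigma))]


def minimize(sigma0: Vector, step: float = 0.05, iters: int = 500) -> Vector:
    """Fixed-step gradient descent on the product criterion R2(sigma) * W2(sigma)."""
    sigma = list(sigma0)
    for _ in range(iters):
        grad = criterion_gradient(sigma)
        sigma = [s - step * g for s, g in zip(sigma, grad)]
    return sigma


sigma_star = minimize([0.0, 0.0])
```

For these toy terms the product is minimized where every component of `sigma` equals 0.5, so the result can be checked by hand. In the feature selection setting, small components of the relaxed vector would indicate candidate features to drop.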

O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee. Choosing kernel parameters for support vector machines. Machine Learning, 2000. Submitted.


Hyperkernels - Ong, Smola, Williamson

No context found.

O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee. Choosing kernel parameters for support vector machines. Machine Learning, 2002. Forthcoming.


A Pattern Search Method for Model Selection of Support Vector Regression

No context found.

O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee, Choosing kernel parameters for support vector machines. AT&T Labs Technical report, 2000.


CiteSeer.IST - Copyright Penn State and NEC