## Hyperkernels (2002)

Citations: 14 (0 self)

### BibTeX

@MISC{Ong02hyperkernels,
  author = {Cheng Soon Ong and Alexander J. Smola and Robert C. Williamson},
  title = {Hyperkernels},
  year = {2002}
}

### Abstract

We consider the problem of choosing a kernel suitable for estimation using a Gaussian Process estimator or a Support Vector Machine. A novel solution is presented which involves defining a Reproducing Kernel Hilbert Space on the space of kernels itself. By utilizing an analog of the classical representer theorem, the problem of choosing a kernel from a parameterized family of kernels (e.g. of varying width) is reduced to a statistical estimation problem akin to the problem of minimizing a regularized risk functional. Various classical settings for model or kernel selection are special cases of our framework.

### Citations

2265 | Learning With Kernels
- Schölkopf, Smola
- 2002
Citation Context ...e the form $R_{\mathrm{reg}}(f, X, Y) := \frac{1}{m} \sum_{i=1}^{m} l(x_i, y_i, f(x_i)) + \frac{\lambda}{2} \|f\|_{\mathcal{H}}^2$, where $\|f\|_{\mathcal{H}}$ is the RKHS norm of $f$. By virtue of the representer theorem (see e.g., [4, 8]) we know that the minimizer over $f \in \mathcal{H}$ of (5) can be written as a kernel expansion. For a given loss $l$ this leads to the quality functional ...
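
The quality functional described in this excerpt can be illustrated for one concrete case. The block below is a minimal sketch, assuming squared loss and a regularized risk of the form (1/m) Σ loss + (λ/2) αᵀKα over kernel expansions f = Kα. The squared loss, the λ/2 scaling, and the name `regularized_risk_quality` are assumptions made for illustration; they are not taken from the excerpt.

```python
import numpy as np

def regularized_risk_quality(K, y, lam):
    """Return (value, alpha) for the minimum over alpha of
    (1/m) * ||y - K alpha||^2 + (lam / 2) * alpha^T K alpha.

    Squared loss is an illustrative assumption. For a positive
    semidefinite K and lam > 0, alpha = (K + (lam*m/2) I)^{-1} y
    sets the gradient to zero and is therefore a minimizer.
    """
    K = np.asarray(K, dtype=float)
    y = np.asarray(y, dtype=float)
    m = y.shape[0]
    # Solve the linear system instead of forming an explicit inverse.
    alpha = np.linalg.solve(K + 0.5 * lam * m * np.eye(m), y)
    residual = y - K @ alpha
    value = residual @ residual / m + 0.5 * lam * (alpha @ K @ alpha)
    return value, alpha
```

Comparing the returned value across candidate kernel matrices gives a data-dependent score for each kernel, which appears to be the role the excerpt assigns to a quality functional.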

1797 | Experiments with a New Boosting Algorithm
- Freund, Schapire
- 1996
Citation Context ...en. Using the same non-optimized parameters for different data sets we achieved results comparable to other recent work on classification such as boosting, optimized SVMs, and kernel target alignment [10, 11, 7] (note that we use a much smaller part of the data for training: [...]). Table excerpt, columns Data(size), Train, Test, Train, Test, [10, 11], SVM; pima(768): 25.2±2.0, 26.2±3.3, 22.2±1.4, 23.2±2.0, 23.5, 22.9±2.0; ionosph(351): 13.4±2.0, 1...

588 | Learning the kernel matrix with semidefinite programming
- Lanckriet, Cristianini, et al.
Citation Context ...ques to assist in this choice. Even the restricted problem of choosing the “width” of a parameterized family of kernels (e.g. Gaussian) has not had a simple and elegant solution. A recent development [1] which solves the above problem in a restricted sense involves the use of semidefinite programming to learn an arbitrary positive semidefinite matrix $K$, subject to minimization of criteria such as th...

277 | Soft margins for AdaBoost
- Rätsch, Onoda, et al.
- 2001
Citation Context ...en. Using the same non-optimized parameters for different data sets we achieved results comparable to other recent work on classification such as boosting, optimized SVMs, and kernel target alignment [10, 11, 7] (note that we use a much smaller part of the data for training: [...]). Table excerpt, columns Data(size), Train, Test, Train, Test, [10, 11], SVM; pima(768): 25.2±2.0, 26.2±3.3, 22.2±1.4, 23.2±2.0, 23.5, 22.9±2.0; ionosph(351): 13.4±2.0, 1...

206 | Prediction with Gaussian processes: From linear regression to linear prediction and beyond
- Williams
- 1999
Citation Context ...f semidefinite programming to learn an arbitrary positive semidefinite matrix $K$, subject to minimization of criteria such as the kernel target alignment [1], the maximum of the posterior probability [2], the minimization of a learning-theoretical bound [3], or subject to cross-validation settings [4]. The restriction mentioned is that the methods work with the kernel matrix, rather than the kernel i...

199 | Efficient SVM training using low-rank kernel representations
- Fine, Scheinberg
- 2001

153 | Spline Models for Observational Data, volume 59
- Wahba
- 1990
Citation Context ...zation of criteria such as the kernel target alignment [1], the maximum of the posterior probability [2], the minimization of a learning-theoretical bound [3], or subject to cross-validation settings [4]. The restriction mentioned is that the methods work with the kernel matrix, rather than the kernel itself. Furthermore, whilst demonstrably improving the performance of estimators to some degree, the...

73 | On the complexity of learning the kernel matrix
- Bousquet, Herrmann
- 2003
Citation Context ...n presented is for optimizing kernels themselves, rather than the kernel matrix as in [1]. Other approaches on learning the kernel include using boosting [5] and by bounding the Rademacher complexity [6]. Outline of the Paper: We show (Section 2) that for most kernel-based learning methods there exists a functional, the quality functional, which plays a similar role to the empirical risk functional...

67 | Kernel Design Using Boosting
- Crammer, Keshet, et al.
Citation Context ...ample size available. Furthermore, the solution presented is for optimizing kernels themselves, rather than the kernel matrix as in [1]. Other approaches on learning the kernel include using boosting [5] and by bounding the Rademacher complexity [6]. Outline of the Paper: We show (Section 2) that for most kernel-based learning methods there exists a functional, the quality functional, which plays a...

31 | Choosing kernel parameters for support vector machines
- Chapelle, Vapnik, et al.
- 2002
Citation Context ...ive semidefinite matrix $K$, subject to minimization of criteria such as the kernel target alignment [1], the maximum of the posterior probability [2], the minimization of a learning-theoretical bound [3], or subject to cross-validation settings [4]. The restriction mentioned is that the methods work with the kernel matrix, rather than the kernel itself. Furthermore, whilst demonstrably improving the ...

30 | On the extensions of kernel alignment
- Kandola, Shawe-Taylor, et al.
- 2002
Citation Context ... is known as the expected risk. We now present some examples of quality functionals. Example 1 (Kernel Target Alignment): This quality functional was introduced in [7] to assess the “alignment” of a kernel with training labels. It is defined by $Q_{\mathrm{emp}}^{\mathrm{alignment}}(k, X, Y) := 1 - \frac{y^\top K y}{\|y\|_2^2 \, \|K\|_F}$, where $y$ denotes the vector of ...
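
The alignment-based quality functional mentioned in this excerpt can be sketched numerically. The block below is a minimal illustration, assuming the standard kernel target alignment from the literature, yᵀKy / (‖y‖² ‖K‖_F), turned into a quantity to minimize by subtracting it from 1. The Gaussian kernel parameterization and the names `gaussian_kernel_matrix` and `alignment_quality` are assumptions for illustration, not taken from the excerpt.

```python
import numpy as np

def gaussian_kernel_matrix(X, width):
    """Gram matrix of k(x, x') = exp(-||x - x'||^2 / (2 * width^2))."""
    X = np.asarray(X, dtype=float)
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * width ** 2))

def alignment_quality(K, y):
    """1 - y^T K y / (||y||_2^2 * ||K||_F); smaller means better aligned."""
    K = np.asarray(K, dtype=float)
    y = np.asarray(y, dtype=float)
    return 1.0 - (y @ K @ y) / ((y @ y) * np.linalg.norm(K, "fro"))

if __name__ == "__main__":
    # Hypothetical toy data: labels are the sign of the first coordinate.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 2))
    y = np.sign(X[:, 0])
    for width in (0.1, 1.0, 10.0):
        K = gaussian_kernel_matrix(X, width)
        print(width, alignment_quality(K, y))
```

Scanning a few widths this way is the simple "choose the width" baseline that the excerpts contrast with learning the kernel itself; a kernel matrix equal to yyᵀ attains the minimum value 0.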