## Hyperkernels (2002)

Citations: 14 (0 self)

### BibTeX

```bibtex
@MISC{Ong02hyperkernels,
  author = {Cheng Soon Ong and Alexander J. Smola and Robert C. Williamson},
  title  = {Hyperkernels},
  year   = {2002}
}
```

### Abstract

We consider the problem of choosing a kernel suitable for estimation using a Gaussian Process estimator or a Support Vector Machine. A novel solution is presented which involves defining a Reproducing Kernel Hilbert Space on the space of kernels itself. By utilizing an analog of the classical representer theorem, the problem of choosing a kernel from a parameterized family of kernels (e.g. of varying width) is reduced to a statistical estimation problem akin to the problem of minimizing a regularized risk functional. Various classical settings for model or kernel selection are special cases of our framework.
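The problem the abstract starts from, choosing a kernel from a parameterized family such as Gaussians of varying width, is classically handled by validation. A minimal sketch of that baseline (synthetic data; all names and parameter values are illustrative, not from the paper):

```python
import numpy as np

def gaussian_kernel(A, B, width):
    """Gaussian (RBF) kernel matrix: k(a, b) = exp(-||a - b||^2 / (2 width^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * width ** 2))

def krr_coeffs(K, y, lam):
    """Kernel ridge regression coefficients: solve (K + lam*m*I) alpha = y."""
    m = K.shape[0]
    return np.linalg.solve(K + lam * m * np.eye(m), y)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(60)
Xtr, ytr, Xva, yva = X[:40], y[:40], X[40:], y[40:]

def val_mse(width, lam=1e-3):
    """Validation error of the kernel-expansion predictor at a given width."""
    alpha = krr_coeffs(gaussian_kernel(Xtr, Xtr, width), ytr, lam)
    pred = gaussian_kernel(Xva, Xtr, width) @ alpha
    return float(np.mean((pred - yva) ** 2))

widths = [0.05, 0.2, 0.5, 1.0, 2.0]
best_width = min(widths, key=val_mse)
```

The paper's point of departure is that this grid search treats the kernel family as a black box; the hyperkernel framework instead poses the choice of kernel itself as a regularized estimation problem.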

### Citations

2200 | Learning with Kernels
- Scholkopf, Smola
- 2002
Citation Context: …expressions of the form $R_{\mathrm{reg}}(f, X, Y) = R_{\mathrm{emp}}(f, X, Y) + \frac{\lambda}{2}\lVert f \rVert_{\mathcal{H}}^{2}$, where $\lVert f \rVert_{\mathcal{H}}$ is the RKHS norm of $f$. By virtue of the representer theorem (see e.g., [4, 8]) we know that the minimizer over $f$ of (5) can be written as a kernel expansion. For a given loss this leads to the quality functional …
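The two objects in this excerpt can be computed directly: by the representer theorem the minimizer has the form f(·) = Σⱼ αⱼ k(xⱼ, ·), and the RKHS norm of such an expansion is ‖f‖²_H = α⊤Kα. A small numerical sketch with squared loss (synthetic data; an assumed illustration, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 2))
y = rng.standard_normal(30)
m, lam = 30, 0.1

# Gaussian kernel matrix on the training points (width fixed to 1 here).
K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))

# With squared loss, expansion coefficients of the regularized-risk
# minimizer solve (K + lam*m*I) alpha = y.
alpha = np.linalg.solve(K + lam * m * np.eye(m), y)

f_train = K @ alpha                  # f(x_i) = sum_j alpha_j k(x_j, x_i)
norm_sq = float(alpha @ K @ alpha)   # ||f||_H^2 for a kernel expansion
r_reg = float(np.mean((f_train - y) ** 2) + 0.5 * lam * norm_sq)
```

Because alpha minimizes the regularized risk, `r_reg` can never exceed the value at f = 0, which is just the mean squared label.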

1742 | Experiments with a new boosting algorithm
- Freund, Schapire
- 1996
Citation Context: …Using the same non-optimized parameters for different data sets we achieved results comparable to other recent work on classification such as boosting, optimized SVMs, and kernel target alignment [10, 11, 7] (note that we use a much smaller part of the data for training): Data(size) Train Test Train Test [10, 11] SVM; pima(768) 25.2±2.0 26.2±3.3 22.2±1.4 23.2±2.0 23.5 22.9±2.0; ionosph(351) 13.4±2.0 1…

577 | Learning the kernel matrix with semidefinite programming
- Lanckriet, Cristianini, et al.
Citation Context: …techniques to assist in this choice. Even the restricted problem of choosing the “width” of a parameterized family of kernels (e.g. Gaussian) has not had a simple and elegant solution. A recent development [1] which solves the above problem in a restricted sense involves the use of semidefinite programming to learn an arbitrary positive semidefinite matrix K, subject to minimization of criteria such as the…
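A much simpler stand-in for the semidefinite programming of [1] is to search convex combinations of fixed base kernel matrices for the one with the best kernel target alignment; any such combination is automatically positive semidefinite. A toy sketch (synthetic labels and base kernels of my choosing, not the method of [1]):

```python
import numpy as np

def alignment(K, y):
    """Kernel target alignment <K, yy^T>_F / (||K||_F * ||yy^T||_F)."""
    yyT = np.outer(y, y)
    return float((K * yyT).sum() / (np.linalg.norm(K) * np.linalg.norm(yyT)))

rng = np.random.default_rng(2)
y = np.sign(rng.standard_normal(20))
K1 = np.outer(y, y) + 0.1 * np.eye(20)   # informative base kernel
K2 = np.eye(20)                          # uninformative base kernel

# Grid search over convex combinations beta*K1 + (1-beta)*K2.
betas = np.linspace(0.0, 1.0, 101)
best_beta = max(betas, key=lambda b: alignment(b * K1 + (1 - b) * K2, y))
```

Here the search recovers the informative base kernel; the restriction the excerpt goes on to note is that such methods only produce a kernel matrix on the given sample, not a kernel function.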

272 | Soft margins for AdaBoost
- Rätsch, Onoda, et al.
- 2001
Citation Context: …Using the same non-optimized parameters for different data sets we achieved results comparable to other recent work on classification such as boosting, optimized SVMs, and kernel target alignment [10, 11, 7] (note that we use a much smaller part of the data for training): Data(size) Train Test Train Test [10, 11] SVM; pima(768) 25.2±2.0 26.2±3.3 22.2±1.4 23.2±2.0 23.5 22.9±2.0; ionosph(351) 13.4±2.0 1…

Prediction with Gaussian processes: from linear regression to linear prediction and beyond, in Learning and inference in graphical models
- Williams
- 1998
Citation Context: …of semidefinite programming to learn an arbitrary positive semidefinite matrix K, subject to minimization of criteria such as the kernel target alignment [1], the maximum of the posterior probability [2], the minimization of a learning-theoretical bound [3], or subject to cross-validation settings [4]. The restriction mentioned is that the methods work with the kernel matrix, rather than the kernel itself…

198 | Efficient SVM training using low-rank kernel representations
- Fine, Scheinberg

153 | Spline models for observational data, volume 59
- Wahba
- 1990
Citation Context: …minimization of criteria such as the kernel target alignment [1], the maximum of the posterior probability [2], the minimization of a learning-theoretical bound [3], or subject to cross-validation settings [4]. The restriction mentioned is that the methods work with the kernel matrix, rather than the kernel itself. Furthermore, whilst demonstrably improving the performance of estimators to some degree, the…

75 | On the complexity of learning the kernel matrix
- Bousquet, Herrmann
- 2002
Citation Context: …solution presented is for optimizing kernels themselves, rather than the kernel matrix as in [1]. Other approaches on learning the kernel include using boosting [5] and by bounding the Rademacher complexity [6]. Outline of the Paper: We show (Section 2) that for most kernel-based learning methods there exists a functional, the quality functional, which plays a similar role to the empirical risk functional…

64 | Kernel design using boosting
- Crammer, Keshet, et al.
- 2003
Citation Context: …sample size available. Furthermore, the solution presented is for optimizing kernels themselves, rather than the kernel matrix as in [1]. Other approaches on learning the kernel include using boosting [5] and by bounding the Rademacher complexity [6]. Outline of the Paper: We show (Section 2) that for most kernel-based learning methods there exists a functional, the quality functional, which plays a…

31 | On the extensions of kernel alignment
- Kandola, Shawe-Taylor, et al.
- 2002
Citation Context: …is known as the expected risk. We now present some examples of quality functionals. Example 1 (Kernel Target Alignment): This quality functional was introduced in [7] to assess the “alignment” of a kernel with training labels. It is defined by $Q_{\mathrm{alignment}}(k, X, Y) = 1 - \frac{y^{\top} K y}{m \lVert K \rVert_F}$, where $y$ denotes the vector of training labels…
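The alignment in this excerpt is cheap to evaluate: for ±1 labels, ⟨yy⊤, yy⊤⟩_F = m², so the normalized alignment reduces to y⊤Ky / (m‖K‖_F) and equals 1 for the ideal kernel K = yy⊤. A quick numerical check (toy labels; variable names are mine):

```python
import numpy as np

def alignment(K, y):
    """Empirical alignment <K, yy^T>_F / (||K||_F * ||yy^T||_F); for +/-1
    labels this equals y^T K y / (m * ||K||_F)."""
    yyT = np.outer(y, y)
    return float((K * yyT).sum() / (np.linalg.norm(K) * np.linalg.norm(yyT)))

y = np.array([1.0, 1.0, -1.0, -1.0])
a_ideal = alignment(np.outer(y, y), y)   # perfectly aligned kernel
a_ident = alignment(np.eye(4), y)        # uninformative identity kernel
```

The identity kernel scores 0.5 here because only its diagonal overlaps with yy⊤, which is why alignment-based quality functionals favor kernels that mirror the label structure.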

30 | Choosing kernel parameters for support vector machines
- Chapelle, Vapnik
- 2000
Citation Context: …positive semidefinite matrix K, subject to minimization of criteria such as the kernel target alignment [1], the maximum of the posterior probability [2], the minimization of a learning-theoretical bound [3], or subject to cross-validation settings [4]. The restriction mentioned is that the methods work with the kernel matrix, rather than the…