
## The kernel trick for distances (2000)


### Download Links

- [www.cs.cmu.edu]
- [www.kernel-machines.org]
- [www.kyb.mpg.de]
- DBLP

### Other Repositories/Bibliography

Venue: TR MSR 2000-51, Microsoft Research

Citations: 113 (0 self)

### Citations

12897 | Statistical Learning Theory - Vapnik - 1998

Citation Context: ...tions defined on pairs of input patterns. This trick allows the formulation of nonlinear variants of any algorithm that can be cast in terms of dot products, SVMs being but the most prominent example [14, 9, 4]. Although the mathematical result underlying the kernel trick is almost a century old [7], it was only much later [1, 3, 14] that it was made fruitful for the machine learning community. Kernel metho...
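The "kernel trick" quoted above — substituting a kernel evaluation for a dot product — can be illustrated with a minimal sketch (not from the paper; the feature map and test points are made up for illustration). For the degree-2 polynomial kernel, the kernel value equals a dot product of explicit quadratic features, so any dot-product algorithm becomes nonlinear by substitution:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for 2-D input: (x1^2, x2^2, sqrt(2)*x1*x2)."""
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

def k(x, y):
    """Degree-2 polynomial kernel: computes phi(x) . phi(y) without forming phi."""
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

# The kernel evaluation agrees with the explicit feature-space dot product.
assert np.isclose(np.dot(phi(x), phi(y)), k(x, y))
```

The point of the trick is that `k` stays cheap even when the implicit feature space is high- or infinite-dimensional, as for the Gaussian kernel.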

1828 | A training algorithm for optimal margin classifiers - Boser, Guyon, et al. - 1992

1540 | Nonlinear component analysis as a kernel eigenvalue problem - Scholkopf, Smola, et al. - 1998

Citation Context: ...cts out the same subspace as (6) in the definition of conditionally positive matrices. (ii) Another example of a kernel algorithm that works with conditionally positive definite kernels is kernel PCA [10], where the data is centered, thus removing the dependence on the origin in feature space. Formally, this follows from Proposition 7 for c_i = 1/m. Example 10 (Parzen windows) One of the simplest dist...
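The centering step mentioned in this context (the c_i = 1/m case) can be sketched numerically. The snippet below is illustrative, with invented data; it centers a Gram matrix in feature space and, because the kernel here is linear, checks the result against explicitly centered data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
K = X @ X.T                      # Gram matrix of a linear kernel

m = K.shape[0]
one_m = np.full((m, m), 1.0 / m)
# Feature-space centering: K_c = K - 1_m K - K 1_m + 1_m K 1_m
K_centered = K - one_m @ K - K @ one_m + one_m @ K @ one_m

# For a linear kernel, centering in feature space equals the Gram
# matrix of explicitly mean-centered data, so we can verify directly.
Xc = X - X.mean(axis=0)
assert np.allclose(K_centered, Xc @ Xc.T)
```

For a nonlinear kernel the same formula applies, even though the centered points have no preimage in input space.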

1271 | An Introduction to Support Vector Machines - Cristianini, Shawe-Taylor - 2000

Citation Context: ...tions defined on pairs of input patterns. This trick allows the formulation of nonlinear variants of any algorithm that can be cast in terms of dot products, SVMs being but the most prominent example [14, 9, 4]. Although the mathematical result underlying the kernel trick is almost a century old [7], it was only much later [1, 3, 14] that it was made fruitful for the machine learning community. Kernel metho...

499 | Convolution kernels on discrete structures - Haussler - 1999

Citation Context: ...s in feature spaces. Again, the underlying mathematical results have been known for quite a while [8]; some of them have already attracted interest in the kernel methods community in various contexts [12, 6, 16]. Let us consider training data (x_1, y_1), ..., (x_m, y_m) ∈ X × Y. Here, Y is the set of possible outputs (e.g., in pattern recognition, {±1}), and X is some nonempty set (the domai...

493 | Theory and Methods of Scaling - Torgerson - 1958

Citation Context: ...position 3: in practice, we might want to choose other points as origins in feature space --- points that do not have a preimage x_0 in input space, such as (usually) the mean of a set of points (cf. [13]). This will be useful when considering kernel PCA. Crucial is only that our reference point's behaviour under translations is identical to that of individual points. This is taken care of by the cons...
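The mean-as-origin idea above can be made concrete: even though the feature-space mean usually has no preimage, the squared distance from phi(x) to that mean expands into kernel evaluations alone. A minimal sketch, with an invented RBF kernel and invented points:

```python
import numpy as np

def rbf(x, y, gamma=0.5):
    return np.exp(-gamma * np.sum((x - y) ** 2))

def dist2_to_mean(x, points, k):
    """||phi(x) - (1/m) sum_i phi(x_i)||^2, written purely in kernel values:
    k(x,x) - (2/m) sum_i k(x, x_i) + (1/m^2) sum_{i,j} k(x_i, x_j)."""
    m = len(points)
    term1 = k(x, x)
    term2 = sum(k(x, p) for p in points) / m
    term3 = sum(k(p, q) for p in points for q in points) / m ** 2
    return term1 - 2 * term2 + term3

pts = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
d2 = dist2_to_mean(np.array([0.5, 0.5]), pts, rbf)
assert d2 >= -1e-12   # a squared distance must be nonnegative
```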

394 | Functions of positive and negative type and their connection with the theory of integral equations - Mercer - 1909

Citation Context: ... of any algorithm that can be cast in terms of dot products, SVMs being but the most prominent example [14, 9, 4]. Although the mathematical result underlying the kernel trick is almost a century old [7], it was only much later [1, 3, 14] that it was made fruitful for the machine learning community. Kernel methods have since led to interesting generalizations of learning algorithms and to successful ...

393 | Theoretical foundations of the potential function method in pattern recognition learning, Automation and Remote Control - Aizerman, Braverman, et al. - 1964

Citation Context: ...e cast in terms of dot products, SVMs being but the most prominent example [14, 9, 4]. Although the mathematical result underlying the kernel trick is almost a century old [7], it was only much later [1, 3, 14] that it was made fruitful for the machine learning community. Kernel methods have since led to interesting generalizations of learning algorithms and to successful real-world applications [9]. The pr...

388 | Regularization Theory and Neural Network Architectures - Girosi, Jones, et al. - 1995

Citation Context: ...here k is cpd and b ∈ R, is also cpd. In particular, since pd kernels are cpd, we can take any pd kernel and offset it by b and it will still be at least cpd. For further examples of cpd kernels, cf. [2, 15, 5, 12]. We now return to the main flow of the argument. Proposition 3 allows us to construct the feature map for k from that of the pd kernel ~k. To this end, fix x_0 ∈ X and define ~k according to (7). D...
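The claim in this context — offsetting a pd kernel by any constant b leaves it (at least) cpd — has a one-line reason: the cpd quadratic form is restricted to coefficients with sum(c) = 0, so the constant contributes b * (sum c)^2 = 0. A numeric sketch with invented data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))
# Gaussian Gram matrix: positive definite kernel on 6 invented points.
K = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))
b = -3.0                         # an offset large enough to destroy pd-ness

c = rng.normal(size=6)
c -= c.mean()                    # enforce sum(c) = 0, the cpd side condition

q = c @ (K + b) @ c              # quadratic form with the offset kernel
assert np.isclose(q, c @ K @ c)  # the b-term cancels since sum(c) = 0
assert q >= -1e-10               # hence K + b is still cpd on this subspace
```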

198 | Harmonic Analysis on Semigroups - Berg, Christensen, et al. - 1984

Citation Context: ...nce of the linear space F facilitates a number of algorithmic and theoretical issues. It is well established that (1) works out for Mercer kernels [3, 14], or, equivalently, positive definite kernels [2, 15]. Here and below, indices i and j by default run over 1, ..., m. Definition 1 (Positive definite kernel) A symmetric function k: X × X → R which for all m ∈ N, x_i ∈ X gives rise to a positiv...
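Definition 1 above says a pd kernel is one whose Gram matrix is positive semidefinite for every finite point set. That is directly checkable numerically; the following sketch (invented Gaussian kernel and points, not tied to the paper's notation) verifies it via the eigenvalues:

```python
import numpy as np

def gauss(x, y, sigma=1.0):
    """Gaussian kernel, a standard example of a positive definite kernel."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

rng = np.random.default_rng(2)
pts = rng.normal(size=(8, 3))
K = np.array([[gauss(xi, xj) for xj in pts] for xi in pts])

# Positive definiteness of the kernel means the Gram matrix is PSD:
# all eigenvalues nonnegative (up to floating-point tolerance).
eigvals = np.linalg.eigvalsh(K)
assert np.all(eigvals >= -1e-10)
```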

191 | Metric spaces and positive definite functions, The Annals of Mathematics - Schoenberg - 1938

Citation Context: ...utility of the kernel trick by looking at the problem of which kernels can be used to compute distances in feature spaces. Again, the underlying mathematical results have been known for quite a while [8]; some of them have already attracted interest in the kernel methods community in various contexts [12, 6, 16]. Let us consider training data (x_1, y_1), ..., (x_m, y_m) ∈ X × Y. Here, Y i...
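The question raised here — which kernels compute distances in feature space — rests on the expansion ||phi(x) - phi(y)||^2 = k(x,x) - 2k(x,y) + k(y,y). A minimal sketch (invented points; the linear-kernel case makes it checkable against the ordinary Euclidean distance):

```python
import numpy as np

def feature_dist(x, y, k):
    """Distance between phi(x) and phi(y), computed from kernel values alone."""
    return np.sqrt(max(k(x, x) - 2 * k(x, y) + k(y, y), 0.0))

k_lin = np.dot                              # linear kernel: phi is the identity
x, y = np.array([1.0, 2.0]), np.array([4.0, 6.0])

# With the linear kernel, the feature-space distance is the input distance.
assert np.isclose(feature_dist(x, y, k_lin), np.linalg.norm(x - y))
```

For nonlinear kernels the same formula gives a distance in the (implicit) feature space; Schoenberg's result characterizes which functions arise this way.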

189 | Spline Models of Observational Data, volume 59 - Wahba - 1990

Citation Context: ...nce of the linear space F facilitates a number of algorithmic and theoretical issues. It is well established that (1) works out for Mercer kernels [3, 14], or, equivalently, positive definite kernels [2, 15]. Here and below, indices i and j by default run over 1, ..., m. Definition 1 (Positive definite kernel) A symmetric function k: X × X → R which for all m ∈ N, x_i ∈ X gives rise to a positiv...

174 | The connection between regularization operators and support vector kernels, Neural Networks 11 - Smola, Schölkopf, et al. - 1998

Citation Context: ...s in feature spaces. Again, the underlying mathematical results have been known for quite a while [8]; some of them have already attracted interest in the kernel methods community in various contexts [12, 6, 16]. Let us consider training data (x_1, y_1), ..., (x_m, y_m) ∈ X × Y. Here, Y is the set of possible outputs (e.g., in pattern recognition, {±1}), and X is some nonempty set (the domai...

39 | Semiparametric support vector and linear programming machines, NeuroCOLT TR - Smola, Frieß, Schölkopf, et al. - 1998

Citation Context: ... two classes of data is independent of the origin's position. Seen in this light, it is not surprising that the structure of the dual optimization problem (cf. [14]) allows cpd kernels: as noticed in [12, 11], the constraint ∑_{i=1}^m α_i y_i = 0 projects out the same subspace as (6) in the definition of conditionally positive matrices. (ii) Another example of a kernel algorithm that works with conditional...