### Citations

13222 | The Nature of Statistical Learning Theory
- Vapnik
- 1995
Citation Context ..., that is, they can be accomplished without ever explicitly representing the feature vector {φ_n(x)}_{n≥1}, relying instead only on indirect computations of the kernel K(x, y) or the distance d(x, y) [28, 31, 2, 13, 23, 30, 17] (see also the bibliography at http://svm.first.gmd.de). The kernels and corresponding distance functions we construct are suitable for all such methods. In particular, there is a 1-1 correspondence b...
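The context above describes the "kernel trick": the algorithms need only K(x, y), never the feature vector {φ_n(x)}_{n≥1} itself. A minimal sketch of the idea, using a hypothetical degree-2 polynomial kernel (an illustration, not one of the paper's convolution kernels):

```python
import numpy as np

def poly_features(x):
    """Explicit degree-2 feature map phi(x) for a 2-D input (illustrative)."""
    x1, x2 = x
    return np.array([x1 * x1, x2 * x2, np.sqrt(2) * x1 * x2])

def poly_kernel(x, y):
    """K(x, y) = (x . y)^2, computed without ever forming phi."""
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

# Both routes agree: <phi(x), phi(y)> == (x . y)^2
assert np.isclose(np.dot(poly_features(x), poly_features(y)), poly_kernel(x, y))
```

The same indirection is what lets the paper's kernels feed SVMs, Gaussian processes, and distance-based methods without enumerating features.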

5889 | A tutorial on hidden Markov models and selected applications in speech recognition
- Rabiner
Citation Context ... similar objects have the same length. This occurs, for example, when the strings consist of amino acids representing proteins, nucleic acids representing genes, or phonemes representing spoken words [26, 4, 9, 24]. In these contexts, some objects may be missing components that other similar objects have. However, we can align any two object strings so that their corresponding components are adjacent, using a s...

5119 | Stochastic relaxation, Gibbs Distributions and the Bayesian Restoration of Images
- Geman, Geman
- 1984
Citation Context ...ion on X × X that we call a Gibbs kernel (Section 3.5). These kernels may have promising applications in areas where structures can be modeled generatively by Hidden Markov Random Fields (HMRFs) [12, 18, 5]. Convolution kernels can be applied iteratively to build a kernel on an infinite set from kernels involving generators of the set. We introduce a class of generalized regular expressions to define ker...

4818 |
Introduction to automata theory, languages, and computation
- Hopcroft, Ullman
- 1979
Citation Context ...⋃_{r≥1} L(r). Finally, the regular languages are defined to be the smallest set of languages that contain {ε} and {a} for all letters a ∈ A, and are closed under union, concatenation and Kleene star [14]. The operations of convex combination, simple convolution, and γ-iterated convolution may be used to define a class of probability distributions on regular languages called regular probability distr...

3388 | A Tutorial on Support Vector Machines for Pattern Recognition - Burges - 1998 |

3282 |
An introduction to probability theory and its applications. Vol II.
- Feller
- 1966
Citation Context ...ion of distributions with this relation R corresponds to multiplication of generating functions. By differentiating the generating function, one obtains the moments of the distribution (see, e.g., [6]). Other kinds of convolutions can be used to represent combinatorial counting problems, because if g_d(x_d) = 1 for all x_d, then g_1 * ··· * g_D(x) is the cardinality of R...
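The Feller context notes that convolving distributions corresponds to multiplying their generating functions, and that differentiating a generating function yields moments. A small sketch with two made-up probability mass functions on {0, 1, 2}:

```python
import numpy as np

# pmfs of two independent variables on {0, 1, 2} (hypothetical numbers)
g1 = np.array([0.2, 0.5, 0.3])
g2 = np.array([0.1, 0.6, 0.3])

# Convolution of the pmfs = distribution of the sum ...
conv = np.convolve(g1, g2)

# ... and multiplying the generating functions gives the same coefficients.
G1 = np.polynomial.Polynomial(g1)
G2 = np.polynomial.Polynomial(g2)
prod = (G1 * G2).coef
assert np.allclose(conv, prod)

# Differentiating the generating function and evaluating at 1 gives the mean:
mean = (G1 * G2).deriv()(1.0)
assert np.isclose(mean, 1.1 + 1.2)  # E[X1] + E[X2]
```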

1861 |
Spline models for observational data
- Wahba
- 1990
Citation Context ...e kernels and corresponding distance functions we construct are suitable for all such methods. In particular, there is a 1-1 correspondence between kernels and Gaussian processes defined on the set X [3, 32, 21]. We do not pursue this avenue in this paper, but the kernels we develop can be plugged directly into Gaussian process methods. Convolution kernels are obtained from other kernels by a certain sum ove...

1589 |
Graphical Models
- Lauritzen
- 1996
Citation Context ...ion on X × X that we call a Gibbs kernel (Section 3.5). These kernels may have promising applications in areas where structures can be modeled generatively by Hidden Markov Random Fields (HMRFs) [12, 18, 5]. Convolution kernels can be applied iteratively to build a kernel on an infinite set from kernels involving generators of the set. We introduce a class of generalized regular expressions to define ker...

1217 | Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids
- Durbin, Eddy, et al.
- 1998
Citation Context ...strings that are derived from a common ancestor under the operations of insertion, deletion and substitution of letters (Section 4.4). This and similar kernels are related to the pair-HMMs defined in [4]. This provides a new angle on the old field of syntactic pattern recognition, developed by King-Sun Fu and his colleagues [9, 10, 11]. Attempts to control the "width" parameter in generalized radial ...

1037 |
Linear Algebra and its Applications
- Strang
- 1980
Citation Context ...j = K(x_i, x_j) is positive definite, i.e. Σ_{ij} c_i c_j K_{ij} ≥ 0 for all c_1, …, c_N ∈ ℝ. Equivalently, a symmetric matrix is positive definite if all its eigenvalues are nonnegative; see, e.g., [29]. Many authors consider the more general case of complex-valued kernels. The relationship between the definitions used for that case and the ones used here for the real case is discussed in [1], sec...
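The Strang context defines positive definiteness of a Gram matrix via Σ_{ij} c_i c_j K_{ij} ≥ 0, equivalently all eigenvalues nonnegative. A sketch of that eigenvalue test (note the snippet's "positive definite" is what is now usually called positive semidefinite):

```python
import numpy as np

def is_positive_definite(K, tol=1e-10):
    """Check sum_ij c_i c_j K_ij >= 0 for all c via the eigenvalues of symmetric K.

    Following the snippet's convention, 'positive definite' here means all
    eigenvalues nonnegative (positive semidefinite in modern usage).
    """
    eigvals = np.linalg.eigvalsh(K)  # eigvalsh: eigenvalues of a symmetric matrix
    return bool(np.all(eigvals >= -tol))

# A Gram matrix K_ij = <x_i, x_j> always passes the test:
X = np.array([[1.0, 0.0], [2.0, 1.0], [0.0, 3.0]])
gram = X @ X.T
assert is_positive_definite(gram)

# A symmetric matrix with a negative eigenvalue fails it:
assert not is_positive_definite(np.array([[0.0, 1.0], [1.0, 0.0]]))
```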

745 |
Real Analysis and Probability
- Dudley
- 2002
Citation Context ...e kernels and corresponding distance functions we construct are suitable for all such methods. In particular, there is a 1-1 correspondence between kernels and Gaussian processes defined on the set X [3, 32, 21]. We do not pursue this avenue in this paper, but the kernels we develop can be plugged directly into Gaussian process methods. Convolution kernels are obtained from other kernels by a certain sum ove...

551 | Exploiting generative models in discriminative classifiers.
- Jaakkola, Haussler
- 1998
Citation Context ...ides a way of using the generative probability model inherent in an HMRF to define a notion of similarity between related structures. This idea will be further developed in a separate paper. (See also [16, 15] for an alternate way to do this.) 4 Iterated convolution kernels and generalized regular expressions When X is countably infinite and X_d = X for 1 ≤ d ≤ D, as in the examples of kernels for strings and ...

523 | The Geometry of Graphs and Some of its Algorithmic Applications
- Linial, London, et al.
- 1995
Citation Context ...fies the triangle inequality d(x, y) ≤ d(x, z) + d(z, y), and has the property that d(x, x) = 0. However, for many pattern recognition applications, this is not sufficient for d to be a useful distance [20]. For a distance d to be useful, we need to actually embed the metric space (X, d) in a finite-dimensional Euclidean space ℝ^N, or in the space of all infinite square-summable sequences ℓ², via som...

472 |
Time Warps, String Edits and Macromolecules: the Theory and Practice of Sequence Comparisons
- Sankoff, Kruskal
- 1983
Citation Context ... similar objects have the same length. This occurs, for example, when the strings consist of amino acids representing proteins, nucleic acids representing genes, or phonemes representing spoken words [26, 4, 9, 24]. In these contexts, some objects may be missing components that other similar objects have. However, we can align any two object strings so that their corresponding components are adjacent, using a s...

258 |
Syntactic Pattern Recognition and Applications,
- Fu
- 1982
Citation Context ...on 4.4). This and similar kernels are related to the pair-HMMs defined in [4]. This provides a new angle on the old field of syntactic pattern recognition, developed by King-Sun Fu and his colleagues [9, 10, 11]. Attempts to control the "width" parameter in generalized radial basis kernels derived from convolution kernels lead us to the important notion of infinitely divisible kernels, which we review (Secti...

241 | An Equivalence Between Sparse Approximation and Support Vector Machines
- Girosi
- 1997
Citation Context ..., that is, they can be accomplished without ever explicitly representing the feature vector {φ_n(x)}_{n≥1}, relying instead only on indirect computations of the kernel K(x, y) or the distance d(x, y) [28, 31, 2, 13, 23, 30, 17] (see also the bibliography at http://svm.first.gmd.de). The kernels and corresponding distance functions we construct are suitable for all such methods. In particular, there is a 1-1 correspondence b...

208 | Using the fisher kernel method to detect remote protein homologies
- Jaakkola, Diekhans, et al.
- 1999
Citation Context ...ides a way of using the generative probability model inherent in an HMRF to define a notion of similarity between related structures. This idea will be further developed in a separate paper. (See also [16, 15] for an alternate way to do this.) 4 Iterated convolution kernels and generalized regular expressions When X is countably infinite and X_d = X for 1 ≤ d ≤ D, as in the examples of kernels for strings and ...

189 | Support vector machines, reproducing kernel Hilbert spaces, and randomized GACV
- Wahba
- 1998
Citation Context ...s to the important notion of infinitely divisible kernels, which we review (Section 6). Some open problems are mentioned in this regard. We also review the theory of reproducing kernel Hilbert spaces [22, 32, 33] (Section 7), and use it to derive several results mentioned in earlier sections. 2 Convolution kernels 2.1 Kernels Let X be a set and K : X × X → ℝ, where ℝ denotes the real numbers and × ...

186 |
Introduction to Gaussian Processes (book chapter)
- MacKay
- 1998
Citation Context ...e kernels and corresponding distance functions we construct are suitable for all such methods. In particular, there is a 1-1 correspondence between kernels and Gaussian processes defined on the set X [3, 32, 21]. We do not pursue this avenue in this paper, but the kernels we develop can be plugged directly into Gaussian process methods. Convolution kernels are obtained from other kernels by a certain sum ove...

183 | Comparing support vector machines with Gaussian kernels to radial basis function classifiers,”
- Schölkopf, Sung, et al.
- 1997
Citation Context ... the component x_d of x. These features are then used to define a kernel K that in fact maps x implicitly into an infinite-dimensional feature space. Such kernels have proven quite useful in practice [27]. Continuing with Example 1 from Section 2, using the same primitive features {f_d(x_d) : 1 ≤ d ≤ D}, we can define the simple exponential kernel K_1 * ··· * K_D(x, y) = exp(Σ_{d=1}^D f_d(...
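The context above defines a simple exponential kernel exp(Σ_d f_d(x_d) f_d(y_d)) built from primitive features. A toy sketch, taking the identity as a stand-in for the unspecified features f_d:

```python
import numpy as np

def simple_exponential_kernel(x, y):
    """K(x, y) = exp(sum_d f_d(x_d) f_d(y_d)), with f_d the identity here
    (the actual primitive features are a modeling choice left open above)."""
    return np.exp(np.dot(x, y))

x = np.array([0.5, -1.0, 2.0])
y = np.array([1.0, 0.0, 0.25])

# The kernel is symmetric, and K(x, x) >= 1 since the exponent is |x|^2 >= 0.
assert np.isclose(simple_exponential_kernel(x, y), simple_exponential_kernel(y, x))
assert simple_exponential_kernel(x, x) >= 1.0
```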

169 |
Harmonic Analysis on Semigroups. Theory of Positive Definite and Related Functions,
- Berg, Christensen, et al.
- 1984
Citation Context ....g. [29]. Many authors consider the more general case of complex-valued kernels. The relationship between the definitions used for that case and the ones used here for the real case is discussed in [1], section 1.6, page 68. Virtually all of the results extend naturally to the complex case. It is readily verified that if each x ∈ X is represented by the sequence φ(x) = {φ_n(x)}_{n≥1} such that ...

164 | Ridge Regression Learning Algorithm in Dual Variables,”
- Saunders, Gammerman, et al.
- 1998
Citation Context ..., that is, they can be accomplished without ever explicitly representing the feature vector {φ_n(x)}_{n≥1}, relying instead only on indirect computations of the kernel K(x, y) or the distance d(x, y) [28, 31, 2, 13, 23, 30, 17] (see also the bibliography at http://svm.first.gmd.de). The kernels and corresponding distance functions we construct are suitable for all such methods. In particular, there is a 1-1 correspondence b...

113 | Probabilistic kernel regression models,”
- Jaakkola, Haussler
- 1999

50 |
Grammatical inference: introduction and survey - part I
- Fu, Booth
- 1986
Citation Context ...on 4.4). This and similar kernels are related to the pair-HMMs defined in [4]. This provides a new angle on the old field of syntactic pattern recognition, developed by King-Sun Fu and his colleagues [9, 10, 11]. Attempts to control the "width" parameter in generalized radial basis kernels derived from convolution kernels lead us to the important notion of infinitely divisible kernels, which we review (Secti...

43 | A Sparse Representation for Function Approximation
- Poggio, Girosi
- 1998

42 | Global Self-Organization of All Known Protein Sequences Reveals Inherent Biological Signatures,
- Linial
- 1997
Citation Context ... the sense that the Euclidean distance between φ(x) and φ(y) is close to the original distance d(x, y) for all x and y [20]. They apply these results to the problem of classifying protein sequences [19]. However, if (X, d) can be embedded in ℓ², as mentioned in the introduction, we can still take advantage of most of the classical pattern recognition, clustering, regression and classification meth...

22 |
On fractional Hadamard powers of positive definite matrices
- FitzGerald, Horn
- 1977
Citation Context ... the main theorem of which is that if a real symmetric n × n matrix K is positive and positive definite, then the fractional Schur power K^t = {K^t_{ij}} is positive definite for all t ≥ n − 2 [8]. (This result was rediscovered 19 years later [25].) Let 1 denote the all-1s vector and n denote (1, 2, …, n)^T. The example Fitzgerald and Horn supply to show this bound is tight is a matrix o...
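The FitzGerald-Horn context states that for an entrywise-positive, positive definite n × n matrix K, the fractional Schur (entrywise) power K^t stays positive definite for all t ≥ n − 2, and that the example M_δ = 11^T + δnn^T shows the bound is tight. A numerical sketch (the choices n = 4 and δ = 0.01 are illustrative, not from the paper):

```python
import numpy as np

def min_eig_of_hadamard_power(K, t):
    """Smallest eigenvalue of the fractional Schur (entrywise) power K^t."""
    return np.linalg.eigvalsh(K ** t).min()  # ** is entrywise on arrays

n = 4
ones = np.ones(n)
nvec = np.arange(1, n + 1, dtype=float)
delta = 0.01  # small perturbation, in the spirit of Fitzgerald and Horn's example
M = np.outer(ones, ones) + delta * np.outer(nvec, nvec)  # entrywise positive, PSD

# For t >= n - 2 the Schur power stays positive (semi)definite ...
assert min_eig_of_hadamard_power(M, n - 2) >= -1e-9
# ... while for a fractional t below n - 2 it acquires a negative eigenvalue.
assert min_eig_of_hadamard_power(M, 0.5) < 0
```

The integer case t = n − 2 = 2 is also guaranteed by the Schur product theorem; the interesting content of the theorem is the fractional range.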

2 |
Hilbert Space Methods
- Mate
- 1989
Citation Context ...s to the important notion of infinitely divisible kernels, which we review (Section 6). Some open problems are mentioned in this regard. We also review the theory of reproducing kernel Hilbert spaces [22, 32, 33] (Section 7), and use it to derive several results mentioned in earlier sections. 2 Convolution kernels 2.1 Kernels Let X be a set and K : X × X → ℝ, where ℝ denotes the real numbers and × ...

1 |
Positive powers of positive positive definite matrices
- Rosen
- 1996
Citation Context ...tric n × n matrix K is positive and positive definite, then the fractional Schur power K^t = {K^t_{ij}} is positive definite for all t ≥ n − 2 [8]. (This result was rediscovered 19 years later [25].) Let 1 denote the all-1s vector and n denote (1, 2, …, n)^T. The example Fitzgerald and Horn supply to show this bound is tight is a matrix of the form M_δ = 11^T + δnn^T. They show that ...
