Results 1-10 of 155
Marginalized kernels between labeled graphs
Proceedings of the Twentieth International Conference on Machine Learning, 2003
Cited by 194 (15 self)
A new kernel function between two labeled graphs is presented. Feature vectors are defined as the counts of label paths produced by random walks on graphs. The kernel computation ultimately reduces to finding the stationary state of a discrete-time linear system, and is thus performed efficiently by solving simultaneous linear equations. Our kernel is based on an infinite-dimensional feature space, so it is fundamentally different from other string or tree kernels based on dynamic programming. We will present promising empirical results in classification of chemical compounds.
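The idea of reducing a walk-based graph kernel to the fixed point of a discrete-time linear system can be sketched as follows. This is a simplified geometric random-walk kernel, not the marginalized kernel of the paper: it assumes vertex labels only, undirected edges, and a single hypothetical `decay` factor in place of the paper's start, transition and stop probabilities. All function and parameter names are illustrative.

```python
from itertools import product


def _adjacency(n, edges):
    """Adjacency lists for an undirected graph on vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def random_walk_kernel(labels1, edges1, labels2, edges2,
                       decay=0.1, tol=1e-12, max_iter=10_000):
    """Sum over all matching label walks, each weighted by decay**length.

    Assumption: a simplified stand-in for the marginalized kernel.
    The value is the sum of the fixed point of x = 1 + decay * A x,
    where A is the adjacency matrix of the label-matched product graph.
    """
    # Product-graph vertices: pairs of vertices carrying the same label.
    pairs = [(u, v)
             for u, v in product(range(len(labels1)), range(len(labels2)))
             if labels1[u] == labels2[v]]
    index = {pair: i for i, pair in enumerate(pairs)}
    adj1 = _adjacency(len(labels1), edges1)
    adj2 = _adjacency(len(labels2), edges2)
    # Product-graph neighbors: step along an edge in both graphs at once.
    nbrs = [[index[(a, b)] for a in adj1[u] for b in adj2[v] if (a, b) in index]
            for u, v in pairs]

    x = [1.0] * len(pairs)
    for _ in range(max_iter):
        # One step of the discrete-time linear system.
        new_x = [1.0 + decay * sum(x[j] for j in nb) for nb in nbrs]
        change = max((abs(a - b) for a, b in zip(new_x, x)), default=0.0)
        if change < tol:
            return sum(new_x)
        x = new_x
    raise RuntimeError("iteration did not converge; try a smaller decay")


if __name__ == "__main__":
    # Two copies of a single C-O bond.
    labels, edges = ["C", "O"], [(0, 1)]
    print(random_walk_kernel(labels, edges, labels, edges, decay=0.5))
```

The iteration converges only when `decay` is smaller than the reciprocal of the largest eigenvalue of the product-graph adjacency matrix. The same fixed point could instead be obtained by solving the linear system (I - decay * A) x = 1 directly, which is the route the abstract describes.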
Mismatch string kernels for discriminative protein classification
Bioinformatics, 2004
Cited by 193 (9 self)
Motivation: Classification of protein sequences into functional and structural families based on sequence homology is a central problem in computational biology. Discriminative supervised machine learning approaches provide good performance, but simplicity and computational efficiency of training and prediction are also important concerns. Results: We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the problem of protein classification and remote homology detection. These kernels measure sequence similarity based on shared occurrences of fixed-length patterns in the data, allowing for mutations between patterns. Thus, the kernels provide a biologically well-motivated way to compare protein sequences without relying on family-based generative models such as hidden Markov models. We compute the kernels efficiently using a mismatch tree data structure, allowing us to calculate the contributions of all patterns occurring in the data in one pass while traversing the tree. When used with an SVM, the kernels enable fast prediction on test sequences. We report experiments on two benchmark SCOP datasets, where we show that the mismatch kernel used with an SVM classifier performs competitively with state-of-the-art methods for homology detection, particularly when very few training examples are available. Examination of the highest-weighted patterns learned by the SVM classifier recovers biologically important motifs in protein families and superfamilies. Availability: SVM software is publicly available at
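To make the notion of "shared occurrences of fixed-length patterns, allowing for mutations" concrete, here is a minimal brute-force sketch of a (k, m)-mismatch kernel. It builds the feature vectors explicitly and does not use the mismatch tree mentioned in the abstract, so it is only practical for short strings and small alphabets. The function names are hypothetical, and the code assumes m is at most k.

```python
from collections import Counter
from itertools import combinations, product


def mismatch_neighborhood(kmer, m, alphabet):
    """All k-mers over `alphabet` within Hamming distance m of `kmer`."""
    neighbors = set()
    # Choose m positions and overwrite them with every letter combination;
    # writing the original letter back covers fewer than m mismatches.
    for positions in combinations(range(len(kmer)), m):
        for letters in product(alphabet, repeat=m):
            candidate = list(kmer)
            for pos, letter in zip(positions, letters):
                candidate[pos] = letter
            neighbors.add("".join(candidate))
    return neighbors


def mismatch_features(seq, k, m, alphabet):
    """Feature map: each k-mer in `seq` adds 1 to every k-mer in its neighborhood."""
    features = Counter()
    for i in range(len(seq) - k + 1):
        for neighbor in mismatch_neighborhood(seq[i:i + k], m, alphabet):
            features[neighbor] += 1
    return features


def mismatch_kernel(x, y, k, m, alphabet):
    """Inner product of the two explicit mismatch feature vectors."""
    fx = mismatch_features(x, k, m, alphabet)
    fy = mismatch_features(y, k, m, alphabet)
    return sum(count * fy[kmer] for kmer, count in fx.items())


if __name__ == "__main__":
    # Toy two-letter alphabet, chosen only to keep the example small.
    print(mismatch_kernel("AAB", "ABB", k=2, m=1, alphabet="AB"))
```

Setting m = 0 reduces this to counting exactly shared k-mers.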
Probability product kernels
Journal of Machine Learning Research, 2004
Cited by 180 (9 self)
The advantages of discriminative learning algorithms and kernel machines are combined with generative modeling using a novel kernel between distributions. In the probability product kernel, data points in the input space are mapped to distributions over the sample space and a general inner product is then evaluated as the integral of the product of pairs of distributions. The kernel is straightforward to evaluate for all exponential family models such as multinomials and Gaussians and yields interesting nonlinear kernels. Furthermore, the kernel is computable in closed form for latent distributions such as mixture models, hidden Markov models and linear dynamical systems. For intractable models, such as switching linear dynamical systems, structured mean-field approximations can be brought to bear on the kernel evaluation. For general distributions, even if an analytic expression for the kernel is not feasible, we show a straightforward sampling method to evaluate it. Thus, the kernel permits discriminative learning methods, including support vector machines, to exploit the properties, metrics and invariances of the generative models we infer from each datum. Experiments are shown using multinomial models for text, hidden Markov models for biological data sets and linear dynamical systems for time series data.
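For a finite sample space the integral of the product of two distributions becomes a sum, which gives a very small sketch of the idea. The code below assumes discrete distributions given as probability lists and a hypothetical exponent parameter `rho` applied to both distributions; the helper `normalize` (also hypothetical) maps raw counts, such as word counts, to a maximum-likelihood distribution.

```python
def normalize(counts):
    """Maximum-likelihood discrete distribution from non-negative counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must have a positive sum")
    return [c / total for c in counts]


def probability_product_kernel(p, q, rho=0.5):
    """Sum over outcomes of (p_i * q_i) ** rho.

    Assumption: p and q are distributions over the same finite sample space.
    rho = 0.5 gives the Bhattacharyya affinity; rho = 1 gives the plain
    inner product of the two probability vectors.
    """
    if len(p) != len(q):
        raise ValueError("distributions must share a sample space")
    return sum((pi * qi) ** rho for pi, qi in zip(p, q))


if __name__ == "__main__":
    # Two toy documents over a three-word vocabulary (illustrative counts).
    doc_a = normalize([3, 1, 0])
    doc_b = normalize([1, 1, 2])
    print(probability_product_kernel(doc_a, doc_b, rho=0.5))
```

With rho = 0.5 the kernel of a distribution with itself is 1, so values can be read as an affinity between 0 and 1.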
A Survey of Kernels for Structured Data
2003
Cited by 146 (2 self)
Kernel methods in general and support vector machines in particular have been successful in various learning tasks on data represented in a single table. Much ‘real-world’ data, however, is structured – it has no natural representation in a single table. Usually, to apply kernel methods to ‘real-world’ data, extensive preprocessing is performed to embed the data into a real vector space and thus in a single table. This survey describes several approaches to defining positive definite kernels on structured instances directly.
A kernel between sets of vectors
In International Conference on Machine Learning, 2003
Cited by 130 (8 self)
In various application domains, including image recognition, it is natural to represent each example as a set of vectors. With a base kernel we can implicitly map these vectors to a Hilbert space and fit a Gaussian distribution to the whole set using Kernel PCA. We define our kernel between examples as Bhattacharyya’s measure of affinity between such Gaussians. The resulting kernel is computable in closed form and enjoys many favorable properties, including graceful behavior under transformations, potentially justifying the vector set representation even in cases when more conventional representations also exist.
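A heavily simplified sketch of "fit a Gaussian to each set, then take the Bhattacharyya affinity" is given below. It assumes each example is a set of scalars, fits the Gaussian in the input space, and so omits the base kernel and Kernel PCA step described in the abstract. The `ridge` term is a hypothetical regularizer added to keep the variances positive.

```python
import math
from statistics import fmean, pvariance


def bhattacharyya_set_kernel(xs, ys, ridge=1e-6):
    """Bhattacharyya affinity between 1-D Gaussians fitted to two sets.

    Assumption: scalar observations and input-space Gaussians only;
    this is not the kernelized construction of the paper.
    """
    mu1, mu2 = fmean(xs), fmean(ys)
    var1 = pvariance(xs, mu1) + ridge
    var2 = pvariance(ys, mu2) + ridge
    total = var1 + var2
    # Closed form for two univariate Gaussians:
    # sqrt(2*s1*s2 / (s1^2 + s2^2)) * exp(-(mu1 - mu2)^2 / (4*(s1^2 + s2^2)))
    scale = math.sqrt(2.0 * math.sqrt(var1 * var2) / total)
    return scale * math.exp(-((mu1 - mu2) ** 2) / (4.0 * total))


if __name__ == "__main__":
    # The value depends only on the fitted mean and variance of each set,
    # so the order of elements within a set does not matter.
    print(bhattacharyya_set_kernel([1.0, 2.0, 3.0], [2.5, 3.5, 4.5, 5.5]))
```

The affinity equals 1 when both sets yield the same Gaussian and decays toward 0 as the fitted means move apart.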
Profile-based string kernels for remote homology detection and motif extraction
Journal of Bioinformatics and Computational Biology, 2004
Cited by 103 (9 self)
We introduce novel profile-based string kernels for use with support vector machines (SVMs) for the problems of protein classification and remote homology detection. These kernels use probabilistic profiles, such as those produced by the PSI-BLAST algorithm, to define position-dependent mutation neighborhoods along protein sequences for inexact matching of k-length subsequences (“k-mers”) in the data. By use of an efficient data structure, the kernels are fast to compute once the profiles have been obtained. For example, the time needed to run PSI-BLAST in order to build the profiles is significantly longer than both the kernel computation time and the SVM training time. We present remote homology detection experiments based on the SCOP database where we show that profile-based string kernels used with
Semi-supervised protein classification using cluster kernels
Advances in Neural Information Processing Systems 16, 2004
Fast string kernels using inexact matching for protein sequences
Journal of Machine Learning Research, 2004
Cited by 63 (1 self)
We describe several families of k-mer based string kernels related to the recently presented mismatch kernel and designed for use with support vector machines (SVMs) for classification of protein sequence data. These new kernels – restricted gappy kernels, substitution kernels, and wildcard kernels – are based on feature spaces indexed by k-length subsequences (“k-mers”) from the string alphabet $\Sigma$. However, for all kernels we define here, the kernel value $K(x,y)$ can be computed in $O(c_K(|x|+|y|))$ time, where the constant $c_K$ depends on the parameters of the kernel but is independent of the size $|\Sigma|$ of the alphabet. Thus the computation of these kernels is linear in the length of the sequences, like the mismatch kernel, but we improve upon the parameter-dependent constant $c_K = k^{m+1}|\Sigma|^{m}$ of the $(k,m)$-mismatch kernel. We compute the kernels efficiently using a trie data structure and relate our new kernels to the recently described transducer formalism. In protein classification experiments on two benchmark SCOP data sets, we show that our new faster kernels achieve SVM classification performance comparable to the mismatch kernel and the Fisher kernel derived from profile hidden Markov models, and we investigate the dependence of the kernels on parameter choice.
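As a worked instance of the mismatch-kernel constant quoted above, assume for illustration k = 5, m = 1 and the 20-letter amino acid alphabet. These values are assumptions chosen for the example and are not taken from the abstract.

```latex
c_K = k^{m+1}\,|\Sigma|^{m}
\quad\Longrightarrow\quad
c_K = 5^{2}\cdot 20^{1} = 500
\qquad (k = 5,\ m = 1,\ |\Sigma| = 20).
```

Under the same assumptions, raising m to 2 gives 5^3 · 20^2 = 50,000, which shows how strongly the constant depends on the alphabet size and why kernels whose constant is independent of |Σ| are attractive.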
Rational kernels: Theory and algorithms
Journal of Machine Learning Research, 2004
Cited by 62 (8 self)
Many classification algorithms were originally designed for fixed-size vectors. Recent applications in text and speech processing and computational biology, however, require the analysis of variable-length sequences and, more generally, weighted automata. An approach widely used in statistical learning techniques such as Support Vector Machines (SVMs) is that of kernel methods, due to their computational efficiency in high-dimensional feature spaces. We introduce a general family of kernels based on weighted transducers or rational relations, rational kernels, that extend kernel methods to the analysis of variable-length sequences or, more generally, weighted automata. We show that rational kernels can be computed efficiently using a general algorithm of composition of weighted transducers and a general single-source shortest-distance algorithm. Not all rational kernels are positive definite and symmetric (PDS), or, equivalently, satisfy the Mercer condition, a condition that guarantees the convergence of training for discriminant classification algorithms such as SVMs. We present several theoretical results related to PDS rational kernels. We show that under some general conditions these kernels are
Graph Kernels for Chemical Informatics
2005
Cited by 59 (7 self)
Increased availability of large repositories of chemical compounds is creating new challenges and opportunities for the application of machine learning methods to problems in computational chemistry and chemical informatics. Because chemical compounds are often represented by the graph of their covalent bonds, machine learning methods in this domain must be capable of processing graphical structures with variable size. Here we first briefly review the literature on graph kernels and then introduce three new kernels (Tanimoto, MinMax, Hybrid) based on the idea of molecular fingerprints and counting labeled paths of depth up to d using depth-first search from each possible vertex. The kernels are applied to three classification problems to predict mutagenicity, toxicity, and anticancer activity on three publicly available data sets. The kernels achieve performances at least comparable, and most often superior, to those previously reported in the literature, reaching accuracies of 91.5% on the Mutag dataset, 65-67% on the PTC (Predictive Toxicology Challenge) dataset, and 72% on the NCI (National Cancer Institute) dataset. Properties and tradeoffs of these kernels, as well as other proposed kernels that leverage 1D or 3D representations of molecules, are briefly discussed.
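The path-counting idea behind the Tanimoto and MinMax kernels can be sketched roughly as follows. The molecule encoding (a list of atom labels plus `(u, v, bond_label)` triples), the restriction to paths that never revisit an atom, and all function names are assumptions made for this example. The Hybrid kernel is not covered.

```python
from collections import Counter


def labeled_paths(atoms, bonds, depth):
    """Count label sequences of paths with at most `depth` bonds.

    Paths are found by depth-first search from every atom.
    Assumption: a path may not visit the same atom twice.
    """
    adj = {i: [] for i in range(len(atoms))}
    for u, v, bond_label in bonds:
        adj[u].append((v, bond_label))
        adj[v].append((u, bond_label))
    counts = Counter()

    def dfs(vertex, visited, labels):
        counts[tuple(labels)] += 1
        if len(visited) > depth:  # path already contains `depth` bonds
            return
        for nxt, bond_label in adj[vertex]:
            if nxt not in visited:
                dfs(nxt, visited | {nxt}, labels + [bond_label, atoms[nxt]])

    for start in range(len(atoms)):
        dfs(start, {start}, [atoms[start]])
    return counts


def tanimoto_kernel(c1, c2):
    """Shared distinct paths divided by all distinct paths (binary features)."""
    s1, s2 = set(c1), set(c2)
    union = len(s1 | s2)
    return len(s1 & s2) / union if union else 0.0


def minmax_kernel(c1, c2):
    """Sum of per-path minimum counts divided by sum of maximum counts."""
    keys = set(c1) | set(c2)
    denom = sum(max(c1[k], c2[k]) for k in keys)
    return sum(min(c1[k], c2[k]) for k in keys) / denom if denom else 0.0


if __name__ == "__main__":
    # Heavy-atom skeletons only; hydrogens are left out for brevity.
    ethanol = labeled_paths(["C", "C", "O"], [(0, 1, "-"), (1, 2, "-")], depth=2)
    methanol = labeled_paths(["C", "O"], [(0, 1, "-")], depth=2)
    print(tanimoto_kernel(ethanol, methanol), minmax_kernel(ethanol, methanol))
```

Tanimoto ignores how often a path occurs, while MinMax uses the counts, so the two can rank the same pair of molecules differently.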