Results 1 - 10
of
249
An introduction to kernel-based learning algorithms
- IEEE TRANSACTIONS ON NEURAL NETWORKS
, 2001
"... This paper provides an introduction to support vector machines (SVMs), kernel Fisher discriminant analysis, and ..."
Abstract
-
Cited by 598 (55 self)
- Add to MetaCart
This paper provides an introduction to support vector machines (SVMs), kernel Fisher discriminant analysis, and
Using the Fisher kernel method to detect remote protein homologies
- In Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology
, 1999
"... A new method, called the Fisher kernel method, for detecting remote protein homologies is introduced and shown to perform well in classifying protein domains by SCOP superfamily. The method is a variant of support vector machines using a new kernel function. The kernel function is derived from a hid ..."
Abstract
-
Cited by 208 (4 self)
- Add to MetaCart
A new method, called the Fisher kernel method, for detecting remote protein homologies is introduced and shown to perform well in classifying protein domains by SCOP superfamily. The method is a variant of support vector machines using a new kernel function. The kernel function is derived from a hidden Markov model. The general approach of combining generative models like HMMs with discriminative methods such as support vector machines may have applications in other areas of biosequence analysis as well.
Combining pairwise sequence similarity and support vector machines for remote protein homology detection
- Proc. 6th Ann. Int. Conf. Computational Molecular Biology
, 2002
"... One key element in understanding the molecular machinery of the cell is to understand the structure and function of each protein encoded in the genome. A very successful means of inferring the structure or function of a previously unannotated protein is via sequence similarity with one or more prote ..."
Abstract
-
Cited by 205 (20 self)
- Add to MetaCart
(Show Context)
One key element in understanding the molecular machinery of the cell is to understand the structure and function of each protein encoded in the genome. A very successful means of inferring the structure or function of a previously unannotated protein is via sequence similarity with one or more proteins whose structure or function is already known. Toward this end, we propose a means of representing proteins using pairwise sequence similarity scores. This representation, combined with a discriminative classi � cation algorithm known as the support vector machine (SVM), provides a powerful means of detecting subtle structural and evolutionary relationships among proteins. The algorithm, called SVM-pairwise, when tested on its ability to recognize previously unseen families from the SCOP database, yields signi � cantly better performance than SVM-Fisher, pro � le HMMs, and PSI-BLAST. Key words: pairwise sequence comparison, homology, detection, support vector machines. 1.
Marginalized kernels between labeled graphs
- Proceedings of the Twentieth International Conference on Machine Learning
, 2003
"... A new kernel function between two labeled graphs is presented. Feature vectors are defined as the counts of label paths produced by random walks on graphs. The kernel computation finally boils down to obtaining the stationary state of a discrete-time linear system, thus is efficiently performed by s ..."
Abstract
-
Cited by 194 (15 self)
- Add to MetaCart
A new kernel function between two labeled graphs is presented. Feature vectors are defined as the counts of label paths produced by random walks on graphs. The kernel computation finally boils down to obtaining the stationary state of a discrete-time linear system, thus is efficiently performed by solving simultaneous linear equations. Our kernel is based on an infinite dimensional feature space, so it is fundamentally different from other string or tree kernels based on dynamic programming. We will present promising empirical results in classification of chemical compounds. 1 1.
Mismatch string kernels for discriminative protein classification
- Bioinformatics
, 2004
"... Motivation: Classification of proteins sequences into functional and structural families based on sequence homology is a central problem in computational biology. Discriminative supervised machine learning approaches provide good performance, but simplicity and computational efficiency of training a ..."
Abstract
-
Cited by 193 (9 self)
- Add to MetaCart
(Show Context)
Motivation: Classification of proteins sequences into functional and structural families based on sequence homology is a central problem in computational biology. Discriminative supervised machine learning approaches provide good performance, but simplicity and computational efficiency of training and prediction are also important concerns. Results: We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the problem of protein classification and remote homology detection. These kernels measure sequence similarity based on shared occurrences of fixed-length patterns in the data, allowing for mutations between patterns.Thus, the kernels provide a biologically wellmotivated way to compare protein sequences without relying on family-based generative models such as hidden Markov models. We compute the kernels efficiently using a mismatch tree data structure, allowing us to calculate the contributions of all patterns occurring in the data in one pass while traversing the tree. When used with an SVM, the kernels enable fast prediction on test sequences. We report experiments on two benchmark SCOP datasets, where we show that the mismatch kernel used with an SVM classifier performs competitively with state-of-the-art methods for homology detection, particularly when very few training examples are available. Examination of the highest-weighted patterns learned by the SVM classifier recovers biologically important motifs in protein families and superfamilies. Availability: SVM software is publicly available at
Predicting Protein-Protein Interactions From Primary Structure
, 2001
"... Motivation: An ambitious goal of proteomics is to elucidate the structure, interactions and functions of all proteins within cells and organisms. The expectation is that this will provide a fuller appreciation of cellular processes and networks at the protein level, ultimately leading to a better un ..."
Abstract
-
Cited by 191 (4 self)
- Add to MetaCart
Motivation: An ambitious goal of proteomics is to elucidate the structure, interactions and functions of all proteins within cells and organisms. The expectation is that this will provide a fuller appreciation of cellular processes and networks at the protein level, ultimately leading to a better understanding of disease mechanisms and suggesting new means for intervention. This paper addresses the question: can protein--protein interactions be predicted directly from primary structure and associated data? Using a diverse database of known protein interactions, a Support Vector Machine (SVM) learning system was trained to recognize and predict interactions based solely on primary structure and associated physicochemical properties. Results: Inductive accuracy of the trained system, defined here as the percentage of correct protein interaction predictions for previously unseen test sets, averaged 80% for the ensemble of statistical experiments. Future proteomics studies may benefit from this research by proceeding directly from the automated identification of a cell's gene products to prediction of protein interaction pairs. Contact: dgough@bioeng.ucsd.edu
W.S.: Mismatch string kernels for SVM protein classification
- In: Proc. of NIPS. (2003
"... We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the protein classification problem. These kernels measure sequence similarity based on shared occurrences of k-length subsequences, counted with up to m mismat ..."
Abstract
-
Cited by 155 (18 self)
- Add to MetaCart
(Show Context)
We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the protein classification problem. These kernels measure sequence similarity based on shared occurrences of k-length subsequences, counted with up to m mismatches, and do not rely on any generative model for the positive training sequences. We compute the kernels efficiently using a mismatch tree data structure and report experiments on a benchmark SCOP dataset, where we show that the mismatch kernel used with an SVM classifier performs as well as the Fisher kernel, the most successful method for remote homology detection, while achieving considerable computational savings. 1
A Survey of Kernels for Structured Data
, 2003
"... Kernel methods in general and support vector machines in particular have been successful in various learning tasks on data represented in a single table. Much ‘real-world’ data, however, is structured – it has no natural representation in a single table. Usually, to apply kernel methods to ‘real-wor ..."
Abstract
-
Cited by 146 (2 self)
- Add to MetaCart
Kernel methods in general and support vector machines in particular have been successful in various learning tasks on data represented in a single table. Much ‘real-world’ data, however, is structured – it has no natural representation in a single table. Usually, to apply kernel methods to ‘real-world’ data, extensive pre-processing is performed to embed the data into a real vector space and thus in a single table. This survey describes several approaches of defining positive definite kernels on structured instances directly.
Engineering Support Vector Machine Kernels That Recognize Translation Initiation Sites
, 2000
"... Motivation: In order to extract protein sequences from nucleotide sequences, it is an important step to recognize points at which regions start that code for proteins. These points are called translation initiation sites (TIS). Results: The task of finding TIS can be modeled as a classification pro ..."
Abstract
-
Cited by 133 (14 self)
- Add to MetaCart
Motivation: In order to extract protein sequences from nucleotide sequences, it is an important step to recognize points at which regions start that code for proteins. These points are called translation initiation sites (TIS). Results: The task of finding TIS can be modeled as a classification problem. We demonstrate the applicability of support vector machines (SVMs) for this task, and show how to incorporate prior biological knowledge by engineering an appropriate kernel function. With the described techniques the recognition performance can be improved by 26% over leading existing approaches. We provide evidence that existing related methods (e.g. ESTScan) could profit from advanced TIS recognition.
Mining the Biomedical Literature in the Genomic Era: An Overview
- JOURNAL OF COMPUTATIONAL BIOLOGY
, 2003
"... The past decade has seen a tremendous growth in the amount of experimental and computational biomedical data, specifically in the areas of Genomics and Proteomics. This growth is accompanied by an accelerated increase in the number of biomedical publications discussing the findings. In the last f ..."
Abstract
-
Cited by 132 (5 self)
- Add to MetaCart
The past decade has seen a tremendous growth in the amount of experimental and computational biomedical data, specifically in the areas of Genomics and Proteomics. This growth is accompanied by an accelerated increase in the number of biomedical publications discussing the findings. In the last few years there is a lot of interest within the scientific community in literature-mining tools to help sort through this abundance of literature, and find the nuggets of information most relevant and useful for specific analysis tasks. This paper