Results 1-10 of 16
Text Classification using String Kernels
Abstract

Cited by 494 (7 self)
We propose a novel approach for categorizing text documents based on the use of a special kernel. The kernel is an inner product in the feature space generated by all subsequences of length k. A subsequence is any ordered sequence of k characters occurring in the text, though not necessarily contiguously. The subsequences are weighted by an exponentially decaying factor of their full length in the text, hence emphasising those occurrences that are close to contiguous. A direct computation of this feature vector would involve a prohibitive amount of computation even for modest values of k, since the dimension of the feature space grows exponentially with k. The paper describes how, despite this fact, the inner product can be efficiently evaluated by a dynamic programming technique. Experimental comparisons of the performance of the kernel with a standard word feature space kernel (Joachims, 1998) show positive results on modestly sized datasets. The case of contiguous subsequences is also considered for comparison with the subsequences kernel with different decay factors. For larger documents and datasets the paper introduces an approximation technique that is shown to deliver good approximations efficiently for large datasets.
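The dynamic-programming evaluation described in the abstract can be sketched as a direct (unoptimised) memoised recursion. The function names and the decay parameter `lam` below are illustrative, not taken from the paper:

```python
from functools import lru_cache

def ssk(s, t, k, lam=0.5):
    """Subsequence kernel of order k with decay lam: inner product in
    the space of all length-k subsequences, weighted by lam raised to
    the full span each subsequence occupies in the text."""

    @lru_cache(maxsize=None)
    def kprime(i, m, n):
        # Auxiliary term K'_i over the prefixes s[:m] and t[:n].
        if i == 0:
            return 1.0
        if min(m, n) < i:
            return 0.0
        x = s[m - 1]
        total = lam * kprime(i, m - 1, n)   # drop the last char of s
        for j in range(1, n + 1):           # match it against t
            if t[j - 1] == x:
                total += kprime(i - 1, m - 1, j - 1) * lam ** (n - j + 2)
        return total

    @lru_cache(maxsize=None)
    def kn(m, n):
        # The kernel K_k over the prefixes s[:m] and t[:n].
        if min(m, n) < k:
            return 0.0
        x = s[m - 1]
        total = kn(m - 1, n)
        for j in range(1, n + 1):
            if t[j - 1] == x:
                total += kprime(k - 1, m - 1, j - 1) * lam ** 2
        return total

    return kn(len(s), len(t))
```

For example, with k = 2 and lam = 0.5, "cat" and "car" share only the subsequence "ca", each occurrence fully contiguous, giving lam**4 = 0.0625.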
Support Vector Machines: Hype or Hallelujah?
 SIGKDD Explorations
, 2003
Abstract

Cited by 121 (1 self)
Support Vector Machines (SVMs) and related kernel methods have become increasingly popular tools for data mining tasks such as classification, regression, and novelty detection. The goal of this tutorial is to provide an intuitive explanation of SVMs from a geometric perspective. The classification problem is used to investigate the basic concepts behind SVMs and to examine their strengths and weaknesses from a data mining perspective. While this overview is not comprehensive, it does provide resources for those interested in further exploring SVMs.
Latent Semantic Kernels
Abstract

Cited by 112 (8 self)
Kernel methods like Support Vector Machines have successfully been used for text categorization. A standard choice of kernel function has been the inner product between the vector-space representations of two documents, in analogy with classical information retrieval (IR) approaches. Latent Semantic Indexing (LSI) has been successfully used for IR purposes as a technique for capturing semantic relations between terms and inserting them into the similarity measure between two documents. One of its main drawbacks, in IR, is its computational cost. In this paper we describe how the LSI approach can be implemented in a kernel-defined feature space. We provide experimental results demonstrating that the approach can significantly improve performance, and that it does not impair it.
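In its simplest formulation, implementing LSI in a kernel-defined feature space amounts to projecting the kernel (Gram) matrix onto its leading eigendirections. A minimal sketch, assuming a symmetric positive semi-definite Gram matrix `K` (names illustrative, not the paper's code):

```python
import numpy as np

def latent_semantic_gram(K, k):
    """Rank-k 'semantic' approximation of a Gram matrix K: keep only
    the top-k eigendirections, the kernel-space analogue of truncating
    the SVD in classical LSI."""
    vals, vecs = np.linalg.eigh(K)      # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]    # indices of the k largest
    V, L = vecs[:, top], np.diag(vals[top])
    return V @ L @ V.T                  # rank-k approximation of K
```

Downstream learners such as an SVM can then be trained on the approximated Gram matrix instead of the original one.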
Online Bayes Point Machines
Abstract

Cited by 82 (3 self)
We present a new and simple algorithm for learning large margin classifiers that works in a truly online manner. The algorithm generates a linear classifier by averaging the weights associated with several perceptron-like algorithms run in parallel in order to approximate the Bayes point. A random subsample of the incoming data stream is used to ensure diversity in the perceptron solutions. We experimentally study the algorithm's performance in online and batch learning settings.
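The core idea, several subsampled perceptrons averaged into one weight vector, can be sketched as follows. The parameter names (`n_machines`, `keep`) are illustrative assumptions, not the paper's notation:

```python
import random

def perceptron_step(w, x, y):
    """Standard perceptron: update the weights only on a mistake."""
    if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
        return [wi + y * xi for wi, xi in zip(w, x)]
    return w

def online_bayes_point(stream, dim, n_machines=5, keep=0.7, seed=0):
    """Run n_machines perceptrons in parallel, each seeing a random
    subsample of the stream, and average their weight vectors to
    approximate the Bayes point (a sketch of the abstract's idea)."""
    rng = random.Random(seed)
    ws = [[0.0] * dim for _ in range(n_machines)]
    for x, y in stream:
        for i in range(n_machines):
            if rng.random() < keep:     # subsampling gives diversity
                ws[i] = perceptron_step(ws[i], x, y)
    # Average the weight vectors component-wise.
    return [sum(col) / n_machines for col in zip(*ws)]
```

Because each machine sees a different random subsample, the individual perceptron solutions differ, and their average tends toward the centre of the version space.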
Composite Kernels for Hypertext Categorisation
 In Proceedings of the International Conference on Machine Learning (ICML
, 2001
Abstract

Cited by 69 (0 self)
Kernels are problem-specific functions that act as an interface between the learning system and the data. While it is well known when the combination of two kernels is again a valid kernel, it is an open question whether the resulting kernel will perform well. In particular, in which situations can a combination of kernels be expected to perform better than its components considered separately? We investigate this problem by looking at the task of designing kernels for hypertext classification, where both word and link information can be exploited. We provide sufficient conditions that indicate when an improvement can be expected, highlighting and formalising the notion of "independent kernels". Experimental results confirm the predictions of the theory in the hypertext domain.
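The closure property the abstract calls well known, that both the sum and the product of two valid kernels are again valid kernels, is easy to state in code. The word/link split below is a hypothetical illustration of the hypertext setting, not the paper's actual construction:

```python
def sum_kernel(k1, k2):
    """Sum of two valid kernels: again a valid kernel, whose feature
    space is the concatenation of the two component feature spaces."""
    return lambda x, y: k1(x, y) + k2(x, y)

def product_kernel(k1, k2):
    """Product of two valid kernels: also a valid kernel, whose feature
    space is the tensor product of the two component feature spaces."""
    return lambda x, y: k1(x, y) * k2(x, y)

# Hypothetical hypertext example: a linear kernel over word features
# and another over link features, combined into one composite kernel.
word_kernel = lambda x, y: sum(a * b for a, b in zip(x["words"], y["words"]))
link_kernel = lambda x, y: sum(a * b for a, b in zip(x["links"], y["links"]))
composite = sum_kernel(word_kernel, link_kernel)
```

Validity of the combination is guaranteed; whether it helps, the open question the paper studies, depends on how the component kernels relate.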
Kernel Methods: A Survey of Current Techniques
 Neurocomputing
, 2000
Abstract

Cited by 41 (1 self)
Kernel Methods have become an increasingly popular tool for machine learning tasks involving classification, regression or novelty detection. They exhibit good generalisation performance on many real-life datasets and the approach is properly motivated theoretically. There are relatively few free parameters to adjust and the architecture of the learning machine does not need to be found by experimentation. In this tutorial we survey this subject with a principal focus on the most well-known models based on kernel substitution, namely, Support Vector Machines. 1 Introduction. Support Vector Machines (SVMs) have been successfully applied to a number of applications ranging from particle identification, face identification and text categorisation to engine knock detection, bioinformatics and database marketing [9]. The approach is systematic and properly motivated by statistical learning theory [42]. Training involves optimisation of a convex cost function: there are no false local minima ...
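Kernel substitution, the idea these models share, means rewriting a linear algorithm so that data appear only inside kernel evaluations, after which any valid kernel can be swapped in. A minimal sketch using the dual-form perceptron (illustrative code, not from the survey):

```python
def kernel_perceptron(data, kernel, epochs=10):
    """Dual-form perceptron: examples enter only through kernel
    evaluations, so any valid kernel can be substituted in."""
    alphas = [0] * len(data)
    for _ in range(epochs):
        for i, (xi, yi) in enumerate(data):
            score = sum(a * yj * kernel(xj, xi)
                        for a, (xj, yj) in zip(alphas, data))
            if yi * score <= 0:         # mistake: weight this example
                alphas[i] += 1
    def predict(x):
        score = sum(a * yj * kernel(xj, x)
                    for a, (xj, yj) in zip(alphas, data))
        return 1 if score > 0 else -1
    return predict

# A degree-2 polynomial kernel makes the XOR problem linearly
# separable in the implicit feature space.
poly2 = lambda x, y: (sum(a * b for a, b in zip(x, y)) + 1) ** 2
```

Substituting a linear kernel recovers the ordinary perceptron; substituting `poly2` lets the same algorithm solve problems no linear classifier can.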
Ovarian cancer identification based on dimensionality reduction for high-throughput mass spectrometry data
 Bioinformatics
Latent Semantic Kernels for Feature Selection
, 2000
Abstract

Cited by 7 (5 self)
Latent Semantic Indexing is a method for selecting informative subspaces of feature spaces. It was developed for information retrieval to reveal semantic information from document co-occurrences. The paper demonstrates how this method can be implemented implicitly in a kernel-defined feature space and hence adapted for application to any kernel-based learning algorithm and data. Experiments with text and UCI data show the technique can improve generalisation performance by focussing the attention of a Support Vector Machine onto informative subspaces of the feature space. 1 Introduction. Kernel-based learning methods map the input data into a high dimensional feature space implicitly defined by a kernel function, which returns the inner product between the images of two data points in the feature space. The learning then takes place in the feature space, provided the learning algorithm can be written so that the data and test points only appear inside dot products. Several linear ...
Kernel Machines and Boolean Functions
 ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 14
, 2001
Abstract

Cited by 6 (0 self)
We give results about the learnability and required complexity of logical formulae for solving classification problems. These results are obtained by linking propositional logic with kernel machines. In particular, we show that decision trees and disjunctive normal forms (DNF) can be represented with the help of a special kernel, linking regularized risk to separation margin. Subsequently we derive a number of lower bounds on the required complexity of logic formulae using properties of algorithms for generating linear estimators, such as perceptron and maximal perceptron learning.
Theoretical and Practical Model Selection Methods for Support Vector Classifiers
 in L. Wang (Ed.), Support Vector Machines: Theory and Applications
, 2005
Abstract

Cited by 4 (3 self)
In this chapter, we review several methods for SVM model selection, derived from different approaches: some of them build on practical lines of reasoning but are not fully justified from a theoretical point of view; on the other hand, some methods rely on rigorous theoretical work but are of little help when applied to real-world problems, because the underlying hypotheses cannot be verified or the result of their application is uninformative. Our objective is to shed some light on these issues by carefully analyzing the most well-known methods and testing some of them on standard benchmarks to evaluate their effectiveness.