Marginalized kernels between labeled graphs
 Proceedings of the Twentieth International Conference on Machine Learning
, 2003
Cited by 194 (15 self)
A new kernel function between two labeled graphs is presented. Feature vectors are defined as the counts of label paths produced by random walks on graphs. The kernel computation ultimately reduces to obtaining the stationary state of a discrete-time linear system, and is therefore performed efficiently by solving simultaneous linear equations. Our kernel is based on an infinite-dimensional feature space, so it is fundamentally different from other string or tree kernels based on dynamic programming. We present promising empirical results in the classification of chemical compounds.
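The reduction to a linear system can be illustrated with a minimal sketch. It is not the paper's exact formulation: it assumes a simplified geometric random-walk kernel on vertex-labeled graphs, with a single decay weight (the hypothetical parameter decay) standing in for the random-walk probabilities the paper defines. The function name and the input format (adjacency matrix plus a list of vertex labels) are also assumptions made for illustration.

```python
import numpy as np

def random_walk_kernel(adj1, labels1, adj2, labels2, decay=0.1):
    """Simplified geometric random-walk kernel between two vertex-labeled graphs.

    Counts pairs of walks (one per graph) that spell the same label sequence,
    weighting walks with k edges by decay**k. Illustrative sketch only.
    """
    adj1 = np.asarray(adj1, dtype=float)
    adj2 = np.asarray(adj2, dtype=float)
    n1, n2 = len(labels1), len(labels2)

    # match[i, j] is 1 when vertex i of graph 1 and vertex j of graph 2 share a label.
    match = np.array([[1.0 if a == b else 0.0 for b in labels2] for a in labels1])
    m = match.reshape(-1)  # product-graph vertex (i, j) maps to index i * n2 + j

    # Adjacency of the product graph, restricted to label-matching vertex pairs.
    w = np.kron(adj1, adj2) * np.outer(m, m)

    # The infinite sum over walk lengths, sum_k decay^k * m^T W^k m, equals
    # m^T (I - decay * W)^{-1} m when decay * spectral_radius(W) < 1,
    # so one linear solve replaces explicit enumeration of label paths.
    x = np.linalg.solve(np.eye(n1 * n2) - decay * w, m)
    return float(m @ x)
```

For two identical graphs consisting of a single labeled edge, for example random_walk_kernel([[0, 1], [1, 0]], ["C", "O"], [[0, 1], [1, 0]], ["C", "O"]), the solve sums the matching walks of every length at once. The decay value must be small enough for the series to converge.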
A Survey of Kernels for Structured Data
, 2003
Cited by 146 (2 self)
Kernel methods in general and support vector machines in particular have been successful in various learning tasks on data represented in a single table. Much ‘real-world’ data, however, is structured – it has no natural representation in a single table. Usually, to apply kernel methods to ‘real-world’ data, extensive preprocessing is performed to embed the data into a real vector space and thus in a single table. This survey describes several approaches to defining positive definite kernels on structured instances directly.
Fast Methods for Kernel-based Text Analysis
, 2003
Cited by 96 (1 self)
Kernel-based learning (e.g., Support Vector Machines) has been successfully applied to many hard problems in Natural Language Processing (NLP). In NLP, although feature combinations are crucial to improving performance, they are heuristically selected. Kernel methods change this situation. The merit of kernel methods is that effective feature combinations are implicitly expanded without loss of generality and without increasing the computational cost. Kernel-based text analysis shows excellent performance in terms of accuracy; however, these methods are usually too slow to apply to large-scale text analysis. In this paper, we extend a Basket Mining algorithm to convert a kernel-based classifier into a simple and fast linear classifier. Experimental results on English BaseNP Chunking, Japanese Word Segmentation and Japanese Dependency Parsing show that our new classifiers are about 30 to 300 times faster than the standard kernel-based classifiers.
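The idea that a kernel implicitly expands feature combinations can be made concrete with a small sketch. It assumes binary features represented as Python sets and a degree-2 polynomial kernel (1 + |x ∩ y|)^2. It does not reproduce the paper's Basket Mining algorithm, and all function names are illustrative. The explicit weights follow from the identity (1 + n)^2 = 1 + 3n + 2·C(n, 2), where n is the number of shared features.

```python
from itertools import combinations
from math import sqrt

def poly2_kernel(x, y):
    """Degree-2 polynomial kernel on binary feature sets."""
    return (1 + len(x & y)) ** 2

def explicit_features(x):
    """Explicit feature map whose dot product equals poly2_kernel.

    Keys are tuples of original features: the empty tuple (bias),
    single features, and unordered feature pairs.
    """
    feats = {(): 1.0}
    for f in x:
        feats[(f,)] = sqrt(3.0)
    for pair in combinations(sorted(x), 2):
        feats[pair] = sqrt(2.0)
    return feats

def dot(u, v):
    """Sparse dot product over the keys the two maps share."""
    return sum(weight * v[key] for key, weight in u.items() if key in v)
```

With x = {"a", "b", "c"} and y = {"b", "c", "d"}, poly2_kernel(x, y) and dot(explicit_features(x), explicit_features(y)) agree up to floating-point rounding. A classifier trained with this kernel can therefore, in principle, be rewritten as a linear model over single features and feature pairs.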
An Application of Boosting to Graph Classification
 Advances in Neural Information Processing Systems
, 2004
Cited by 55 (2 self)
This paper presents an application of Boosting for classifying labeled graphs, general structures for modeling a number of real-world data, such as chemical compounds, natural language texts, and bio sequences. The proposal consists of i) decision stumps that use subgraphs as features, and ii) a Boosting algorithm in which subgraph-based decision stumps are used as weak learners. We also discuss the relation between our algorithm and SVMs with convolution kernels. Two experiments using natural language data and chemical compounds show that our method achieves comparable or even better performance than SVMs with convolution kernels and also improves testing efficiency.
Semi-Supervised Local Fisher Discriminant Analysis for Dimensionality Reduction
 PAKDD
, 2008
Cited by 24 (4 self)
When only a small number of labeled samples are available, supervised dimensionality reduction methods tend to perform poorly due to overfitting. In such cases, unlabeled samples could be useful in improving the performance. In this paper, we propose a semi-supervised dimensionality reduction method which preserves the global structure of unlabeled samples in addition to separating labeled samples in different classes from each other. The proposed method has an analytic form of the globally optimal solution and it can be computed based on eigendecompositions. Therefore, the proposed method is computationally reliable and efficient. We show the effectiveness of the proposed method through extensive simulations with benchmark data sets.
Sequence and tree kernels with statistical feature mining
 Advances in Neural Information Processing Systems 18
, 2006
Cited by 15 (0 self)
This paper proposes a new approach to feature selection based on a statistical feature mining technique for sequence and tree kernels. Since natural language data take discrete structures, convolution kernels, such as sequence and tree kernels, are advantageous for both the concept and accuracy of many natural language processing tasks. However, experiments have shown that the best results can only be achieved when limited small substructures are dealt with by these kernels. This paper discusses this issue of convolution kernels and then proposes a statistical feature selection that enables us to use larger substructures effectively. The proposed method, in order to execute efficiently, can be embedded into an original kernel calculation process by using substructure mining algorithms. Experiments on real NLP tasks confirm the problem in the conventional method and compare the performance of a conventional method to that of the proposed method.
A generalization of Haussler’s convolution kernel: mapping kernel
 Proceedings of the International Conference on Machine Learning
, 2008
Cited by 11 (0 self)
Haussler’s convolution kernel provides a successful framework for engineering new positive semidefinite kernels, and has been applied to a wide range of data types and applications. In the framework, each data object represents a finite set of finer-grained components. Then, Haussler’s convolution kernel takes a pair of data objects as input, and returns the sum of the return values of the predetermined primitive positive semidefinite kernel calculated for all the possible pairs of the components of the input data objects. On the other hand, the mapping kernel that we introduce in this paper is a natural generalization of Haussler’s convolution kernel, in that the input to the primitive kernel ranges over a predetermined subset rather than the entire cross product. Although several instances of the mapping kernel exist in the literature, their positive semidefiniteness was investigated in a case-by-case manner, and worse yet, was sometimes incorrectly concluded. In fact, there exists a simple and easily checkable necessary and sufficient condition, which is generic in the sense that it enables us to investigate the positive semidefiniteness of an arbitrary instance of the mapping kernel. This is the first paper that presents and proves the validity of the condition. In addition, we introduce two important instances of the mapping kernel, which we refer to as the size-of-index-structure-distribution kernel and the edit-cost-distribution kernel. Both of them are naturally derived from well-known (dis)similarity measurements in the literature (e.g. the maximum agreement tree, the edit distance), and are reasonably expected to improve the performance of the existing measures by evaluating their distributional features rather than their peak (maximum/minimum) features.
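The difference between the two summations can be sketched in a few lines. The sketch assumes a deliberately simple setting that is not taken from the paper: data objects are strings, components are (position, character) pairs, the primitive kernel tests character equality, and the predetermined subset is expressed as a hypothetical predicate named allowed. Whether a given restriction yields a positive semidefinite kernel is exactly what the paper's condition addresses; the sketch makes no claim about that.

```python
from itertools import product

def convolution_kernel(parts_x, parts_y, primitive):
    """Sum the primitive kernel over the entire cross product of components."""
    return sum(primitive(a, b) for a, b in product(parts_x, parts_y))

def mapping_kernel(parts_x, parts_y, primitive, allowed):
    """Sum the primitive kernel only over component pairs in a chosen subset."""
    return sum(
        primitive(a, b)
        for a, b in product(parts_x, parts_y)
        if allowed(a, b)
    )

def components(s):
    """Illustrative decomposition: a string becomes (position, character) pairs."""
    return list(enumerate(s))

def same_char(a, b):
    """Illustrative primitive kernel: 1 when the two characters are equal."""
    return 1 if a[1] == b[1] else 0

def same_position(a, b):
    """Illustrative subset: only pairs of components at the same position."""
    return a[0] == b[0]
```

For the strings "aab" and "aba", the convolution kernel counts every pair of equal characters, while the mapping kernel restricted by same_position counts only the positions where the two strings agree.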
Reverse Engineering of Tree Kernel Feature Spaces
Cited by 11 (3 self)
We present a framework to extract the most important features (tree fragments) from a Tree Kernel (TK) space according to their importance in the target kernel-based machine, e.g. Support Vector Machines (SVMs). In particular, our mining algorithm selects the most relevant features based on SVM-estimated weights and uses this information to automatically infer an explicit representation of the input data. The explicit features (a) improve our knowledge of the target problem domain and (b) make large-scale learning practical, improving training and test time, while yielding accuracy in line with traditional TK classifiers. Experiments on semantic role labeling and question classification illustrate the above claims.
Least-Squares Two-Sample Test
 Neural Networks, vol. 24, no. 7, pp. 735–751
, 2011
Cited by 11 (8 self)
The goal of the two-sample test (a.k.a. the homogeneity test) is, given two sets of samples, to judge whether the probability distributions behind the samples are the same or not. In this paper, we propose a novel nonparametric two-sample test based on a least-squares density ratio estimator. Through various experiments, we show that the proposed method overall produces smaller type-II error (i.e., the probability of judging the two distributions to be the same when they are actually different) than a state-of-the-art method, with slightly larger type-I error (i.e., the probability of judging the two distributions to be different when they are actually the same).
Speeding up training with tree kernels for node relation labeling
 In EMNLP 2005
, 2005
Cited by 10 (1 self)
We present a method for speeding up the calculation of tree kernels during training. The calculation of tree kernels is still expensive even with efficient dynamic programming (DP) procedures. Our method maps trees into a small feature space where the inner product, which can be calculated much faster, yields the same value as the tree kernel for most tree pairs. The training is sped up by using the DP procedure only for the exceptional pairs. We describe an algorithm that detects such exceptional pairs and converts trees into vectors in a feature space. We propose tree kernels on marked labeled ordered trees and show that the training of SVMs for semantic role labeling using these kernels can be sped up by a factor of several tens.