Convolution Kernels on Discrete Structures
, 1999
Cited by 506 (0 self)
We introduce a new method of constructing kernels on sets whose elements are discrete structures like strings, trees and graphs. The method can be applied iteratively to build a kernel on an infinite set from kernels involving generators of the set. The family of kernels generated generalizes the family of radial basis kernels. It can also be used to define kernels in the form of joint Gibbs probability distributions. Kernels can be built from hidden Markov random fields, generalized regular expressions, pair-HMMs, or ANOVA decompositions. Uses of the method lead to open problems involving the theory of infinitely divisible positive definite functions. Fundamentals of this theory and the theory of reproducing kernel Hilbert spaces are reviewed and applied in establishing the validity of the method.
In defense of one-vs-all classification
 Journal of Machine Learning Research
, 2004
Cited by 318 (0 self)
Editor: John Shawe-Taylor
We consider the problem of multiclass classification. Our main thesis is that a simple "one-vs-all" scheme is as accurate as any other approach, assuming that the underlying binary classifiers are well-tuned regularized classifiers such as support vector machines. This thesis is interesting in that it disagrees with a large body of recent published work on multiclass classification. We support our position by means of a critical review of the existing literature, a substantial collection of carefully controlled experimental work, and theoretical arguments.
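The one-vs-all scheme named in this abstract can be sketched in a few lines: train one binary scorer per class (that class against the rest) and predict the class whose scorer returns the largest value. The sketch below is only an illustration, not the authors' experimental setup. It assumes a kernel regularized least squares classifier as the binary learner, and the names `rbf`, `solve`, `train_binary`, `train_one_vs_all` and the parameters `gamma` and `lam` are hypothetical.

```python
import math

def rbf(a, b, gamma=0.5):
    """Gaussian (RBF) kernel between two points given as tuples."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def train_binary(X, t, lam=0.1):
    """Assumed binary learner: solve (K + lam*I) c = t, score with sum_i c_i k(x_i, x)."""
    A = [[rbf(a, b) + (lam if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    c = solve(A, t)
    return lambda x: sum(ci * rbf(xi, x) for ci, xi in zip(c, X))

def train_one_vs_all(X, y):
    """Train one scorer per class (+1 for the class, -1 for the rest); predict by argmax."""
    classes = sorted(set(y))
    scorers = {k: train_binary(X, [1.0 if yi == k else -1.0 for yi in y])
               for k in classes}
    return lambda x: max(classes, key=lambda k: scorers[k](x))
```

Any binary learner that returns a real-valued score could be substituted for `train_binary`; the abstract's claim concerns well-tuned regularized classifiers in general, which this toy learner does not attempt to tune.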
Kernel partial least squares regression in reproducing kernel Hilbert space
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2001
Cited by 154 (10 self)
A family of regularized least squares regression models in a Reproducing Kernel Hilbert Space is extended by the kernel partial least squares (PLS) regression model. Similar to principal components regression (PCR), PLS is a method based on the projection of input (explanatory) variables to the latent variables (components). However, in contrast to PCR, PLS creates the components by modeling the relationship between input and output variables while maintaining most of the information in the input variables. PLS is useful in situations where the number of explanatory variables exceeds the number of observations and/or a high level of multicollinearity among those variables is assumed. Motivated by this fact we will provide a kernel PLS algorithm for construction of nonlinear regression models in possibly high-dimensional feature spaces. We give the theoretical description of the kernel PLS algorithm and we experimentally compare the algorithm with the existing kernel PCR and kernel ridge regression techniques. We will demonstrate that on the data sets employed kernel PLS achieves the same results as kernel PCR but uses significantly fewer, qualitatively different components.
Overview and recent advances in partial least squares
 in ‘Subspace, Latent Structure and Feature Selection Techniques’, Lecture Notes in Computer Science
, 2006
Cited by 130 (4 self)
Partial Least Squares (PLS) is a wide class of methods for modeling relations between sets of observed variables by means of latent variables. It comprises regression and classification tasks as well as dimension reduction techniques and modeling tools. The underlying assumption of all PLS methods is that the
Latent Semantic Kernels
Cited by 114 (9 self)
Kernel methods like Support Vector Machines have successfully been used for text categorization. A standard choice of kernel function has been the inner product between the vector-space representation of two documents, in analogy with classical information retrieval (IR) approaches. Latent Semantic Indexing (LSI) has been successfully used for IR purposes as a technique for capturing semantic relations between terms and inserting them into the similarity measure between two documents. One of its main drawbacks, in IR, is its computational cost. In this paper we describe how the LSI approach can be implemented in a kernel-defined feature space. We provide experimental results demonstrating that the approach can significantly improve performance, and that it does not impair it.
Weighted Least Squares Support Vector Machines: robustness and sparse approximation
 NEUROCOMPUTING
Cited by 97 (19 self)
Least Squares Support Vector Machines (LS-SVM) is an SVM version which involves equality instead of inequality constraints and works with a least squares cost function. In this way the solution follows from a linear Karush-Kuhn-Tucker system instead of a quadratic programming problem. However, sparseness is lost in the LS-SVM case and the estimation of the support values is only optimal in the case of a Gaussian distribution of the error variables. In this paper we discuss a method which can overcome these two drawbacks. We show how to obtain robust estimates for regression by applying a weighted version of LS-SVM. We also discuss a sparse approximation procedure for weighted and unweighted LS-SVM. It is basically a pruning method which is able to do pruning based upon the physical meaning of the sorted support values, while pruning procedures for classical multilayer perceptrons require the computation of a Hessian matrix or its inverse. The methods of this paper are illustrated for RBF kernels and demonstrate how to obtain robust estimates with selection of an appropriate number of hidden units, in the case of outliers or non-Gaussian error distributions with heavy tails.
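To make the "linear Karush-Kuhn-Tucker system" concrete, the sketch below solves the standard LS-SVM regression system, in which the bias b and support values alpha satisfy sum(alpha) = 0 and (K + D) alpha + b*1 = y, with D diagonal. In the unweighted case D = I/gamma; passing per-sample weights v_i gives D_ii = 1/(gamma * v_i). This is a minimal illustration under assumptions: the function names, the RBF width `sigma2`, and the `weights` argument are hypothetical, and the paper's rule for choosing the weights and its pruning procedure are not reproduced here.

```python
import math

def rbf(a, b, sigma2=1.0):
    """Gaussian (RBF) kernel between two points given as tuples."""
    d2 = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def lssvm_fit(X, y, gamma=10.0, weights=None):
    """Build and solve the (n+1) x (n+1) KKT system; returns (b, alpha)."""
    n = len(X)
    v = weights if weights is not None else [1.0] * n
    A = [[0.0] + [1.0] * n]                  # first row encodes sum(alpha) = 0
    for i in range(n):
        row = [1.0] + [rbf(X[i], X[j]) for j in range(n)]
        row[i + 1] += 1.0 / (gamma * v[i])   # regularization on the diagonal
        A.append(row)
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]

def lssvm_predict(X, alpha, b, x):
    """Evaluate f(x) = sum_i alpha_i k(x_i, x) + b."""
    return b + sum(a * rbf(xi, x) for a, xi in zip(alpha, X))
```

Because every training point receives a generally nonzero alpha, the solution is not sparse, which is the drawback the abstract's pruning procedure is meant to address.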
Kernel dependency estimation
 in Advances in NIPS 15
, 2003
Cited by 84 (13 self)
We consider the learning problem of finding a dependency between a general class of objects and another, possibly different, general class of objects. The objects can be for example: vectors, images, strings, trees or graphs. Such a task is made possible by employing similarity measures in both input and output spaces using kernel functions, thus embedding the objects into vector spaces. We experimentally validate our approach on several tasks: mapping strings to strings, pattern recognition, and reconstruction from partial images.
A second-order perceptron algorithm
, 2005
Cited by 83 (23 self)
Kernel-based linear-threshold algorithms, such as support vector machines and Perceptron-like algorithms, are among the best available techniques for solving pattern classification problems. In this paper, we describe an extension of the classical Perceptron algorithm, called second-order Perceptron, and analyze its performance within the mistake bound model of online learning. The bound achieved by our algorithm depends on the sensitivity to second-order data information and is the best known mistake bound for (efficient) kernel-based linear-threshold classifiers to date. This mistake bound, which strictly generalizes the well-known Perceptron bound, is expressed in terms of the eigenvalues of the empirical data correlation matrix and depends on a parameter controlling the sensitivity of the algorithm to the distribution of these eigenvalues. Since the optimal setting of this parameter is not known a priori, we also analyze two variants of the second-order Perceptron algorithm: one that adaptively sets the value of the parameter in terms of the number of mistakes made so far, and one that is parameterless, based on pseudoinverses.
Financial time series prediction using least squares support vector machines within the evidence framework
 IEEE Transactions on Neural Networks
, 2001
Cited by 58 (7 self)
For financial time series, the generation of error bars on the point prediction is important in order to estimate the corresponding risk. The Bayesian evidence framework, already successfully applied to design of multilayer perceptrons, is applied in this paper to least squares support vector machine (LS-SVM) regression in order to infer nonlinear models for predicting a time series and the related volatility. On the first level of inference, a statistical framework is related to the LS-SVM formulation, which allows the time-varying volatility of the market to be included by an appropriate choice of several hyperparameters. By the use of equality constraints and a 2-norm, the model parameters of the LS-SVM are obtained from a linear Karush-Kuhn-Tucker system in the dual space. Error bars on the model predictions are obtained by marginalizing over the model parameters. The hyperparameters of the model are inferred on the second level of inference. The inferred hyperparameters, related to the volatility, are used to construct a volatility model within the evidence framework. Model comparison is performed on the third level of inference in order to automatically tune the parameters of the kernel function and to select the relevant inputs. The LS-SVM formulation allows analytic expressions to be derived in the feature space, and practical expressions are obtained in the dual space by replacing the inner product with the related kernel function using Mercer's theorem. The one-step-ahead prediction performances obtained on the prediction of the weekly 90-day T-bill rate and the daily DAX30 closing prices show that significant out-of-sample sign predictions can be made with respect to the Pesaran-Timmerman test statistic.
Index Terms: Bayesian inference, financial time series prediction, hyperparameter selection, least squares support vector machines (LS-SVMs), model comparison, volatility modeling.
Learning nonlinear combinations of kernels
 In NIPS
, 2009
Cited by 49 (2 self)
This paper studies the general problem of learning kernels based on a polynomial combination of base kernels. We analyze this problem in the case of regression and the kernel ridge regression algorithm. We examine the corresponding learning kernel optimization problem, show how that minimax problem can be reduced to a simpler minimization problem, and prove that the global solution of this problem always lies on the boundary. We give a projection-based gradient descent algorithm for solving the optimization problem, shown empirically to converge in few iterations. Finally, we report the results of extensive experiments with this algorithm using several publicly available datasets demonstrating the effectiveness of our technique.
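As a rough illustration of the setting, the sketch below forms one simple polynomial combination of base Gram matrices, the elementwise power (sum_k mu_k K_k)^d with fixed nonnegative weights mu, and then fits kernel ridge regression by solving (K + lam*I) alpha = y. It is an assumption-laden sketch: the specific combination, the fixed weights, and all function and parameter names are hypothetical, and the paper's procedure for learning mu (the projection-based gradient descent) is not implemented.

```python
import math

def linear(a, b):
    """Linear kernel: plain inner product."""
    return sum(p * q for p, q in zip(a, b))

def rbf(a, b, gamma=0.5):
    """Gaussian (RBF) kernel."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def gram(kernel, X):
    """Gram matrix of a kernel function on the sample X."""
    return [[kernel(a, b) for b in X] for a in X]

def combine_kernels(grams, mu, degree=2):
    """Elementwise power of the weighted sum of base Gram matrices."""
    n = len(grams[0])
    return [[sum(m * G[i][j] for m, G in zip(mu, grams)) ** degree
             for j in range(n)] for i in range(n)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def krr_fit(K, y, lam=0.1):
    """Kernel ridge regression dual coefficients: (K + lam*I) alpha = y."""
    n = len(y)
    A = [[K[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    return solve(A, y)
```

With nonnegative weights and an integer degree, the combined matrix stays positive semidefinite (by the Schur product theorem), so the ridge system remains well posed for any lam > 0.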