Online Learning with Kernels, 2003
Abstract

Cited by 2807 (126 self)
Kernel-based algorithms such as support vector machines have achieved considerable success in various problems in the batch setting, where all of the training data is available in advance. Support vector machines combine the so-called kernel trick with the large margin idea. There has been little use of these methods in an online setting suitable for real-time applications. In this paper we consider online learning in a Reproducing Kernel Hilbert Space. By considering classical stochastic gradient descent within a feature space and the use of some straightforward tricks, we develop simple and computationally efficient algorithms for a wide range of problems such as classification, regression, and novelty detection. In addition to allowing the exploitation of the kernel trick in an online setting, we examine the value of large margins for classification in the online setting with a drifting target. We derive worst-case loss bounds, and moreover we show the convergence of the hypothesis to the minimiser of the regularised risk functional. We present some experimental results that support the theory as well as illustrating the power of the new algorithms for online novelty detection.
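The stochastic gradient descent in feature space that the abstract describes can be sketched as a kernelized online update: each example either shrinks the existing expansion coefficients (the regularization step) or adds a new support point (the loss subgradient step). The snippet below is a minimal illustrative sketch for hinge-loss classification; the class name and the choices of step size `eta` and regularizer `lam` are this example's assumptions, not the paper's.

```python
import numpy as np

def gaussian_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(y)) ** 2))

class OnlineKernelSGD:
    """Online kernel learner (illustrative): the hypothesis is a kernel
    expansion f(x) = sum_i alpha_i * k(x_i, x), updated one example at a time."""
    def __init__(self, kernel=gaussian_kernel, eta=0.1, lam=0.01):
        self.kernel, self.eta, self.lam = kernel, eta, lam
        self.support, self.alpha = [], []

    def predict(self, x):
        return sum(a * self.kernel(s, x) for a, s in zip(self.alpha, self.support))

    def partial_fit(self, x, y):
        margin = y * self.predict(x)
        # shrink all coefficients: gradient step on the regularization term
        self.alpha = [(1 - self.eta * self.lam) * a for a in self.alpha]
        # hinge-loss subgradient step: add a support point only on a margin violation
        if margin < 1:
            self.support.append(x)
            self.alpha.append(self.eta * y)
```

Because coefficients decay geometrically, old support points can eventually be truncated, which is what keeps such online algorithms computationally efficient.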
Regularized Multi-Task Learning, 2004
Abstract

Cited by 267 (2 self)
This paper provides a foundation for multi-task learning using reproducing kernel Hilbert spaces of vector-valued functions. In this setting, the kernel is a matrix-valued function. Some explicit examples will be described which go beyond our earlier results in [7]. In particular, we characterize classes of matrix-valued kernels which are linear and are of the dot product or the translation invariant type. We discuss how these kernels can be used to model relations between the tasks and present linear multi-task learning algorithms. Finally, we present a novel proof of the representer theorem for a minimizer of a regularization functional which is based on the notion of minimal norm interpolation.
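A common instance of a matrix-valued kernel is the separable form K((x, s), (x', t)) = k(x, x') * B[s, t], where k is a scalar kernel on inputs and B encodes task relatedness. The sketch below is illustrative only (the Gaussian input kernel, the function names, and the regularized least-squares solve are this example's assumptions, not necessarily the paper's algorithms); it shows how the representer-theorem form reduces multi-task fitting to one linear system over all tasks.

```python
import numpy as np

def multitask_gram(X, tasks, B, gamma=1.0):
    """Gram matrix of the separable matrix-valued kernel
    K((x, s), (x', t)) = exp(-gamma * ||x - x'||^2) * B[s, t]."""
    sq = np.sum(X ** 2, axis=1)
    k = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    return k * B[np.ix_(tasks, tasks)]

def multitask_rls(X, tasks, y, B, lam=0.1, gamma=1.0):
    """Regularized least squares across tasks: by the representer theorem the
    solution is a kernel expansion with coefficients solving (K + lam*n*I) c = y."""
    K = multitask_gram(X, tasks, B, gamma)
    c = np.linalg.solve(K + lam * len(y) * np.eye(len(y)), y)
    return c, K
```

Choosing B closer to the identity decouples the tasks; a B with large off-diagonal entries forces the per-task functions to share structure.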
On the Influence of the Kernel on the Consistency of Support Vector Machines
Journal of Machine Learning Research, 2001
Abstract

Cited by 215 (21 self)
In this article we study the generalization abilities of several classifiers of support vector machine (SVM) type using a certain class of kernels that we call universal. It is shown that the soft margin algorithms with universal kernels are consistent for a large class of classification problems, including some kinds of noisy tasks, provided that the regularization parameter is chosen well. In particular we derive a simple sufficient condition for this parameter in the case of Gaussian RBF kernels. On the one hand, our considerations are based on an investigation of an approximation property, the so-called universality, of the kernels used, which ensures that all continuous functions can be approximated by certain kernel expressions. This approximation property also gives a new insight into the role of kernels in these and other algorithms. On the other hand, the results are achieved by a precise study of the underlying optimization problems of the classifiers. Furthermore, we show consistency for the maximal margin classifier as well as for the soft margin SVMs in the presence of large margins. In this case it turns out that constant regularization parameters also ensure consistency for the soft margin SVMs. Finally, we prove that even for simple, noise-free classification problems, SVMs with polynomial kernels can behave arbitrarily badly.
Variable Kernel Density Estimation
Annals of Statistics, 1992
Abstract

Cited by 108 (4 self)
In this paper, we propose a method for robust kernel density estimation. We interpret a KDE with Gaussian kernel as the inner product between a mapped test point and the centroid of mapped training points in kernel feature space. Our robust KDE replaces the centroid with a robust estimate based on M-estimation [1]. The iteratively reweighted least squares (IRWLS) algorithm for M-estimation depends only on inner products, and can therefore be implemented using the kernel trick. We prove the IRWLS method monotonically decreases its objective value at every iteration for a broad class of robust loss functions. Our proposed method is applied to synthetic data and network traffic volumes, and the results compare favorably to the standard KDE. Index Terms: kernel density estimation, M-estimator, outlier, kernel feature space, kernel trick
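The kernel-trick computation the abstract relies on can be made concrete: the squared feature-space distance from a mapped point to a weighted centroid expands entirely in Gram-matrix entries, so IRWLS needs only kernel evaluations. The sketch below uses a Huber-type weight function; the function names, the threshold `c`, and the fixed iteration count are this example's assumptions, not the paper's exact procedure.

```python
import numpy as np

def rbf_gram(X, gamma=1.0):
    """Gaussian kernel Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

def robust_kde_weights(X, gamma=1.0, c=1.0, n_iter=30):
    """IRWLS weights for a robust, weighted KDE (illustrative sketch).
    Points far from the weighted centroid in feature space are downweighted."""
    K = rbf_gram(X, gamma)
    w = np.full(len(X), 1.0 / len(X))
    for _ in range(n_iter):
        # squared feature-space distance of each point to the weighted centroid,
        # expanded via the kernel trick: k(x,x) - 2*(Kw)_i + w^T K w
        d2 = np.diag(K) - 2 * K @ w + w @ K @ w
        d = np.sqrt(np.maximum(d2, 1e-12))
        # Huber psi(d)/d: 1 inside the threshold, c/d outside (downweights outliers)
        u = np.where(d <= c, 1.0, c / d)
        w = u / u.sum()
    return w
```

The resulting weights replace the uniform 1/n in the density estimate, so gross outliers contribute less mass.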
Everything Old Is New Again: A Fresh Look at Historical Approaches in Machine Learning
PhD thesis, MIT, 2002
Cited by 106 (7 self)
Regularized Least-Squares Classification
Abstract

Cited by 100 (1 self)
We consider the solution of binary classification problems via Tikhonov regularization in a Reproducing Kernel Hilbert Space using the square loss, and denote the resulting algorithm Regularized Least-Squares Classification (RLSC). We sketch …
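With the square loss, the Tikhonov problem has a well-known closed form: the expansion coefficients solve the linear system (K + lam*n*I) c = y, and test points are classified by the sign of the kernel expansion. The sketch below assumes a Gaussian kernel; the function names and the choice of lam are illustrative.

```python
import numpy as np

def rlsc_fit(X, y, lam=0.1, gamma=1.0):
    """Fit RLSC: build the Gaussian Gram matrix and solve (K + lam*n*I) c = y."""
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    n = len(X)
    c = np.linalg.solve(K + lam * n * np.eye(n), y)
    return c, K

def rlsc_predict(c, X_train, X_test, gamma=1.0):
    """Classify by the sign of f(x) = sum_i c_i * k(x_i, x)."""
    sq_tr = np.sum(X_train ** 2, axis=1)
    sq_te = np.sum(X_test ** 2, axis=1)
    K_te = np.exp(-gamma * (sq_te[:, None] + sq_tr[None, :] - 2 * X_test @ X_train.T))
    return np.sign(K_te @ c)
```

A single dense solve costs O(n^3), which is the usual practical objection to RLSC at scale; the appeal is that the entire training procedure is one linear system.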
A New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization
Journal of Machine Learning Research
Abstract

Cited by 93 (3 self)
We present a general approach for collaborative filtering (CF) using spectral regularization to learn linear operators mapping a set of “users” to a set of possibly desired “objects”. In particular, several recent low-rank matrix-completion methods for CF are shown to be special cases of our proposed framework. Unlike existing regularization-based CF, our approach can be used to incorporate additional information, such as attributes of the users/objects (a feature currently lacking in existing regularization-based CF approaches), using popular and well-known kernel methods. We provide novel representer theorems that we use to develop new estimation methods. We then provide learning algorithms based on low-rank decompositions and test them on a standard CF data set. The experiments indicate the advantages of generalizing the existing regularization-based CF methods to incorporate related information about users and objects. Finally, we show that certain multi-task learning methods can also be seen as special cases of our proposed approach.
Online Bayes Point Machines
Abstract

Cited by 82 (3 self)
We present a new and simple algorithm for learning large margin classifiers that works in a truly online manner. The algorithm generates a linear classifier by averaging the weights associated with several perceptron-like algorithms run in parallel in order to approximate the Bayes point. A random subsample of the incoming data stream is used to ensure diversity in the perceptron solutions. We experimentally study the algorithm's performance in online and batch learning settings.
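The scheme the abstract outlines can be sketched in a few lines: run several perceptrons in parallel, feed each a random subsample of the stream so their solutions diverge, and classify with the averaged weight vector. The class name and parameter choices below are this sketch's own, not the paper's.

```python
import numpy as np

class OnlineBayesPoint:
    """Average several online perceptrons to approximate the Bayes point (sketch)."""
    def __init__(self, dim, n_perceptrons=10, p_sample=0.5, seed=0):
        self.W = np.zeros((n_perceptrons, dim))
        self.p = p_sample
        self.rng = np.random.default_rng(seed)

    def partial_fit(self, x, y):
        for i in range(len(self.W)):
            if self.rng.random() < self.p:        # random subsample for diversity
                if y * (self.W[i] @ x) <= 0:      # standard perceptron mistake update
                    self.W[i] += y * x

    def predict(self, x):
        # the averaged weight vector approximates the Bayes point classifier
        return np.sign(self.W.mean(axis=0) @ x)
```

Because each perceptron sees a different subsample, their final weight vectors land at different points of the version space, and the average is a better summary of it than any single solution.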
Kernel methods for missing variables
Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics, 2005
Abstract

Cited by 77 (3 self)
We present methods for dealing with missing variables in the context of Gaussian Processes and Support Vector Machines. This solves an important problem which has largely been ignored by kernel methods: how to systematically deal with incomplete data? Our method can also be applied to problems with partially observed labels, as well as to the transductive setting, where we view the labels as missing data. Our approach relies on casting kernel methods as an estimation problem in exponential families. Hence, estimation with missing variables becomes a problem of computing marginal distributions and finding efficient optimization methods. To that end, we propose an optimization scheme which extends the Concave-Convex Procedure (CCP) of Yuille and Rangarajan, and present a simplified and intuitive proof of its convergence. We show how our algorithm can be specialized to various cases in order to efficiently solve the optimization problems that arise. Encouraging preliminary experimental results on the USPS dataset are also presented.
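The Concave-Convex Procedure named in the abstract splits an objective into a convex part plus a concave part, then repeatedly minimizes the convex part plus a linearization of the concave part at the current iterate, which never increases the objective. A toy instance (the function f(x) = x**4 - 2*x**2 and the closed-form step are this example's assumptions, chosen because the step is solvable by hand) looks like this:

```python
import numpy as np

def ccp_step(x_t):
    """One CCP step for f(x) = x**4 - 2*x**2.
    Split f = u + v with u(x) = x**4 (convex) and v(x) = -2*x**2 (concave).
    Linearizing v at x_t gives the subproblem: minimize x**4 + v'(x_t)*x,
    with v'(x_t) = -4*x_t. Setting the derivative 4*x**3 - 4*x_t to zero
    yields the closed-form update x = cbrt(x_t)."""
    return np.cbrt(x_t)

x = 2.0
for _ in range(50):
    x = ccp_step(x)
# the iterates approach 1.0, a minimizer of x**4 - 2*x**2
```

Each subproblem here is convex even though f is not, which is exactly the point: CCP trades one hard problem for a sequence of tractable ones with monotonically decreasing objective.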