Results 1  10
of
1,245
Learning the Kernel Matrix with SemiDefinite Programming
, 2002
"... Kernelbased learning algorithms work by embedding the data into a Euclidean space, and then searching for linear relations among the embedded data points. The embedding is performed implicitly, by specifying the inner products between each pair of points in the embedding space. This information ..."
Abstract

Cited by 775 (21 self)
 Add to MetaCart
Kernelbased learning algorithms work by embedding the data into a Euclidean space, and then searching for linear relations among the embedded data points. The embedding is performed implicitly, by specifying the inner products between each pair of points in the embedding space. This information is contained in the socalled kernel matrix, a symmetric and positive definite matrix that encodes the relative positions of all points. Specifying this matrix amounts to specifying the geometry of the embedding space and inducing a notion of similarity in the input spaceclassical model selection problems in machine learning. In this paper we show how the kernel matrix can be learned from data via semidefinite programming (SDP) techniques. When applied
The 2005 pascal visual object classes challenge
, 2006
"... Abstract. The PASCAL Visual Object Classes Challenge ran from February to March 2005. The goal of the challenge was to recognize objects from a number of visual object classes in realistic scenes (i.e. not presegmented objects). Four object classes were selected: motorbikes, bicycles, cars and peop ..."
Abstract

Cited by 649 (23 self)
 Add to MetaCart
(Show Context)
Abstract. The PASCAL Visual Object Classes Challenge ran from February to March 2005. The goal of the challenge was to recognize objects from a number of visual object classes in realistic scenes (i.e. not presegmented objects). Four object classes were selected: motorbikes, bicycles, cars and people. Twelve teams entered the challenge. In this chapter we provide details of the datasets, algorithms used by the teams, evaluation criteria, and results achieved. 1
The pyramid match kernel: Discriminative classification with sets of image features
 IN ICCV
, 2005
"... Discriminative learning is challenging when examples are sets of features, and the sets vary in cardinality and lack any sort of meaningful ordering. Kernelbased classification methods can learn complex decision boundaries, but a kernel over unordered set inputs must somehow solve for correspondenc ..."
Abstract

Cited by 544 (29 self)
 Add to MetaCart
(Show Context)
Discriminative learning is challenging when examples are sets of features, and the sets vary in cardinality and lack any sort of meaningful ordering. Kernelbased classification methods can learn complex decision boundaries, but a kernel over unordered set inputs must somehow solve for correspondences – generally a computationally expensive task that becomes impractical for large set sizes. We present a new fast kernel function which maps unordered feature sets to multiresolution histograms and computes a weighted histogram intersection in this space. This “pyramid match” computation is linear in the number of features, and it implicitly finds correspondences based on the finest resolution histogram cell where a matched pair first appears. Since the kernel does not penalize the presence of extra features, it is robust to clutter. We show the kernel function is positivedefinite, making it valid for use in learning algorithms whose optimal solutions are guaranteed only for Mercer kernels. We demonstrate our algorithm on object recognition tasks and show it to be accurate and dramatically faster than current approaches.
Learning Multiple Tasks with Kernel Methods
 Journal of Machine Learning Research
, 2005
"... Editor: John ShaweTaylor We study the problem of learning many related tasks simultaneously using kernel methods and regularization. The standard singletask kernel methods, such as support vector machines and regularization networks, are extended to the case of multitask learning. Our analysis sh ..."
Abstract

Cited by 251 (10 self)
 Add to MetaCart
(Show Context)
Editor: John ShaweTaylor We study the problem of learning many related tasks simultaneously using kernel methods and regularization. The standard singletask kernel methods, such as support vector machines and regularization networks, are extended to the case of multitask learning. Our analysis shows that the problem of estimating many task functions with regularization can be cast as a single task learning problem if a family of multitask kernel functions we define is used. These kernels model relations among the tasks and are derived from a novel form of regularizers. Specific kernels that can be used for multitask learning are provided and experimentally tested on two real data sets. In agreement with past empirical work on multitask learning, the experiments show that learning multiple related tasks simultaneously using the proposed approach can significantly outperform standard singletask learning particularly when there are many related tasks but few data per task.
Randomwalk computation of similarities between nodes of a graph, with application to collaborative recommendation
 IEEE Transactions on Knowledge and Data Engineering
"... ABSTRACT This work presents a new perspective on characterizing the similarity between elements of a database or, more generally, nodes of a weighted, undirected, graph. It is based on a Markovchain model of random walk through the database. More precisely, we compute quantities (the average commu ..."
Abstract

Cited by 194 (19 self)
 Add to MetaCart
(Show Context)
ABSTRACT This work presents a new perspective on characterizing the similarity between elements of a database or, more generally, nodes of a weighted, undirected, graph. It is based on a Markovchain model of random walk through the database. More precisely, we compute quantities (the average commute time, the pseudoinverse of the Laplacian matrix of the graph, etc) that provide similarities between any pair of nodes, having the nice property of increasing when the number of paths connecting those elements increases and when the "length" of paths decreases. It turns out that the square root of the average commute time is a Euclidean distance and that the pseudoinverse of the Laplacian matrix is a kernel (it contains innerproducts closely related to commute times). A procedure for computing the subspace projection of the node vectors of the graph that preserves as much variance as possible in terms of the commutetime distance a principal components analysis (PCA) of the graph is also introduced. This graph PCA provides a nice interpretation to the "Fiedler vector", widely used for graph partitioning. The model is evaluated on a collaborativerecommendation task where suggestions are made about which movies people should watch based upon what they watched in the past. Experimental results on the MovieLens database show that the Laplacianbased similarities perform well in comparison with other methods. The model, which nicely fits into the socalled "statistical relational learning" framework, could also be used to compute document or word similarities, and, more generally, could be applied to machinelearning and patternrecognition tasks involving a database. * François Fouss, Alain Pirotte and Marco Saerens are with the
The pyramid match kernel: Efficient learning with sets of features
 Journal of Machine Learning Research
, 2007
"... In numerous domains it is useful to represent a single example by the set of the local features or parts that comprise it. However, this representation poses a challenge to many conventional machine learning techniques, since sets may vary in cardinality and elements lack a meaningful ordering. Kern ..."
Abstract

Cited by 136 (10 self)
 Add to MetaCart
(Show Context)
In numerous domains it is useful to represent a single example by the set of the local features or parts that comprise it. However, this representation poses a challenge to many conventional machine learning techniques, since sets may vary in cardinality and elements lack a meaningful ordering. Kernel methods can learn complex functions, but a kernel over unordered set inputs must somehow solve for correspondences—generally a computationally expensive task that becomes impractical for large set sizes. We present a new fast kernel function called the pyramid match that measures partial match similarity in time linear in the number of features. The pyramid match maps unordered feature sets to multiresolution histograms and computes a weighted histogram intersection in order to find implicit correspondences based on the finest resolution histogram cell where a matched pair first appears. We show the pyramid match yields a Mercer kernel, and we prove bounds on its error relative to the optimal partial matching cost. We demonstrate our algorithm on both classification and regression tasks, including object recognition, 3D human pose inference, and time of publication estimation for documents, and we show that the proposed method is accurate and significantly more efficient than current approaches.
A Hilbert space embedding for distributions
 In Algorithmic Learning Theory: 18th International Conference
, 2007
"... Abstract. We describe a technique for comparing distributions without the need for density estimation as an intermediate step. Our approach relies on mapping the distributions into a reproducing kernel Hilbert space. Applications of this technique can be found in twosample tests, which are used for ..."
Abstract

Cited by 113 (44 self)
 Add to MetaCart
(Show Context)
Abstract. We describe a technique for comparing distributions without the need for density estimation as an intermediate step. Our approach relies on mapping the distributions into a reproducing kernel Hilbert space. Applications of this technique can be found in twosample tests, which are used for determining whether two sets of observations arise from the same distribution, covariate shift correction, local learning, measures of independence, and density estimation. Kernel methods are widely used in supervised learning [1, 2, 3, 4], however they are much less established in the areas of testing, estimation, and analysis of probability distributions, where information theoretic approaches [5, 6] have long been dominant. Recent examples include [7] in the context of construction of graphical models, [8] in the context of feature extraction, and [9] in the context of independent component analysis. These methods have by and large a common issue: to compute quantities such as the mutual information, entropy, or KullbackLeibler divergence, we require sophisticated space partitioning and/or
Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization
 In Advances in Neural Information Processing Systems (NIPS
, 2007
"... by convex risk minimization ..."
(Show Context)
Exploring large feature spaces with hierarchical MKL
, 2008
"... For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or H ..."
Abstract

Cited by 110 (22 self)
 Add to MetaCart
(Show Context)
For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsityinducing norms such as the ℓ 1norm or the block ℓ 1norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to non linear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsityinducing norms leads to stateoftheart predictive performance. 1