Results 1–10 of 48
Simple and Efficient Multiple Kernel Learning by Group Lasso
Abstract

Cited by 43 (4 self)
We consider the problem of how to improve the efficiency of Multiple Kernel Learning (MKL). In the literature, MKL is often solved by an alternating approach: (1) the minimization over the kernel weights is solved by complicated techniques, such as Semi-infinite Linear Programming, Gradient Descent, or the Level method; (2) the maximization over the SVM dual variables can be solved by standard SVM solvers. However, the minimization step in these methods usually depends on its solving techniques or commercial software, which limits their efficiency and applicability. In this paper, we formulate a closed-form solution for optimizing the kernel weights based on the equivalence between group lasso and MKL. Although this equivalence is not our invention, our derived variant of the equivalence not only leads to an efficient algorithm for MKL, but also generalizes to the case of Lp-MKL (p ≥ 1, where p denotes the Lp-norm of the kernel weights). Therefore, our proposed algorithm provides a unified solution for the entire family of Lp-MKL models. Experiments on multiple data sets show the promising performance of the proposed technique compared with other competitive methods.
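The closed-form weight update this abstract refers to can be sketched as follows. This is an illustrative sketch, not the paper's code: the function name is my own, and it assumes `w_norms[m]` holds ||w_m||, the norm of the SVM solution in the m-th kernel's feature space, with the weights normalized so that their Lp-norm equals one.

```python
import numpy as np

def update_kernel_weights(w_norms, p=1.0):
    """Closed-form Lp-MKL kernel-weight update (p >= 1), derived from the
    group-lasso equivalence: d_m proportional to ||w_m||^(2/(p+1)),
    normalized so that sum(d**p) == 1."""
    w_norms = np.asarray(w_norms, dtype=float)
    num = w_norms ** (2.0 / (p + 1.0))
    den = (w_norms ** (2.0 * p / (p + 1.0))).sum() ** (1.0 / p)
    return num / den
```

For p = 1 this reduces to d_m = ||w_m|| / Σ_j ||w_j||, so each alternating step only needs an SVM solve followed by this one-line normalization, with no inner optimization package.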
Two-Stage Learning Kernel Algorithms
Abstract

Cited by 35 (0 self)
This paper examines two-stage techniques for learning kernels based on a notion of alignment. It presents a number of novel theoretical, algorithmic, and empirical results for alignment-based techniques. Our results build on previous work by Cristianini et al. (2001), but we adopt a different definition of kernel alignment and significantly extend that work in several directions: we give a novel and simple concentration bound for alignment between kernel matrices; show the existence of good predictors for kernels with high alignment, both for classification and for regression; give algorithms for learning a maximum-alignment kernel by showing that the problem can be reduced to a simple QP; and report the results of extensive experiments with this alignment-based method in classification and regression tasks, which show an improvement both over the uniform combination of kernels and over other state-of-the-art learning kernel methods.
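As a rough illustration of the centered notion of alignment this line of work uses (the helper name is hypothetical; the formula is the standard centered alignment, the Frobenius inner product of the doubly centered kernel matrices divided by their Frobenius norms):

```python
import numpy as np

def centered_alignment(K1, K2):
    """Centered alignment between two kernel matrices: center both with
    H = I - 11^T/n, then take their normalized Frobenius inner product.
    Returns a value in [-1, 1]; 1 means perfectly aligned after centering."""
    n = K1.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    K1c = H @ K1 @ H
    K2c = H @ K2 @ H
    return np.sum(K1c * K2c) / (np.linalg.norm(K1c) * np.linalg.norm(K2c))
```

Note the measure is invariant to rescaling either kernel, which is one reason it is a convenient objective for the two-stage approach.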
Generalization bounds for learning kernels
 In ICML ’10, 2010
Abstract

Cited by 30 (3 self)
This paper presents several novel generalization bounds for the problem of learning kernels based on a combinatorial analysis of the Rademacher complexity of the corresponding hypothesis sets. Our bound for learning kernels with a convex combination of p base kernels using L1 regularization admits only a √(log p) dependency on the number of kernels, which is tight and considerably more favorable than the previous best bound given for the same problem. We also give a novel bound for learning with a nonnegative combination of p base kernels with an L2 regularization whose dependency on p is also tight and only in p^(1/4). We present similar results for Lq regularization with other values of q, and outline the relevance of our proof techniques to the analysis of the complexity of the class of linear functions. Experiments with a large number of kernels further validate the behavior of the generalization error as a function of p predicted by our bounds.
Multiple Kernel Learning and the SMO Algorithm
Abstract

Cited by 26 (3 self)
Our objective is to train p-norm Multiple Kernel Learning (MKL) and, more generally, linear MKL regularised by the Bregman divergence, using the Sequential Minimal Optimization (SMO) algorithm. The SMO algorithm is simple, easy to implement and adapt, and efficiently scales to large problems. As a result, it has gained widespread acceptance and SVMs are routinely trained using SMO in diverse real-world applications. Training using SMO has been a long-standing goal in MKL for the very same reasons. Unfortunately, the standard MKL dual is not differentiable, and therefore cannot be optimised using SMO-style coordinate ascent. In this paper, we demonstrate that linear MKL regularised with the p-norm squared, or with certain Bregman divergences, can indeed be trained using SMO. The resulting algorithm retains both simplicity and efficiency and is significantly faster than state-of-the-art specialised p-norm MKL solvers. We show that we can train on a hundred thousand kernels in approximately seven minutes and on fifty thousand points in less than half an hour on a single core.
Inductive regularized learning of kernel functions
Abstract

Cited by 17 (1 self)
In this paper we consider the fundamental problem of semi-supervised kernel function learning. We first propose a general regularized framework for learning a kernel matrix, and then demonstrate an equivalence between our proposed kernel matrix learning framework and a general linear transformation learning problem. Our result shows that the learned kernel matrices parameterize a linear transformation kernel function and can be applied inductively to new data points. Furthermore, our result gives a constructive method for kernelizing most existing Mahalanobis metric learning formulations. To make our results practical for large-scale data, we modify our framework to limit the number of parameters in the optimization process. We also consider the problem of kernelized inductive dimensionality reduction in the semi-supervised setting. To this end, we introduce a novel method for this problem by considering a special case of our general kernel learning framework where we select the trace norm function as the regularizer. We empirically demonstrate that our framework learns useful kernel functions, improving the k-NN classification accuracy significantly in a variety of domains. Furthermore, our kernelized dimensionality reduction technique significantly reduces the dimensionality of the feature space while achieving competitive classification accuracies.
Learning kernels with radiuses of minimum enclosing balls
 In NIPS, 2010
Abstract

Cited by 13 (1 self)
In this paper, we point out that scaling and initialization problems exist in most existing multiple kernel learning (MKL) approaches, which employ the large-margin principle to jointly learn both a kernel and an SVM classifier. The reason is that the margin itself cannot adequately describe how good a kernel is, because it neglects scaling. We use the ratio between the margin and the radius of the minimum enclosing ball to measure the goodness of a kernel, and present a new minimization formulation for kernel learning. This formulation is invariant to scalings of learned kernels, and when learning a linear combination of basis kernels it is also invariant to scalings of the basis kernels and to the type (e.g., L1 or L2) of norm constraint on the combination coefficients. We establish the differentiability of our formulation, and propose a gradient projection algorithm for kernel learning. Experiments show that our method significantly outperforms both SVM with the uniform combination of basis kernels and other state-of-the-art MKL approaches.
SPG-GMKL: Generalized Multiple Kernel Learning with a Million Kernels
Abstract

Cited by 12 (2 self)
Multiple Kernel Learning (MKL) aims to learn the kernel in an SVM from training data. Many MKL formulations have been proposed and some have proved effective in certain applications. Nevertheless, as MKL is a nascent field, many more formulations need to be developed to generalize across domains and meet the challenges of real world applications. However, each MKL formulation typically necessitates the development of a specialized optimization algorithm. The lack of an efficient, general purpose optimizer capable of handling a wide range of formulations presents a significant challenge to those looking to take MKL out of the lab and into the real world. This problem was somewhat alleviated by the development of the Generalized Multiple Kernel Learning (GMKL)
Affinity Aggregation for Spectral Clustering
Abstract

Cited by 10 (0 self)
Spectral clustering makes use of the spectral-graph structure of an affinity matrix to partition data into disjoint meaningful groups. Because of its elegance, efficiency and good performance, spectral clustering has become one of the most popular clustering methods. Traditional spectral clustering assumes a single affinity matrix. However, in many applications there could be multiple potentially useful features and thereby multiple affinity matrices. To apply spectral clustering in these cases, a possible way is to aggregate the affinity matrices into a single one. Unfortunately, affinity measures constructed from different features could have different characteristics, and careless aggregation might even worsen clustering performance. This paper proposes an affinity aggregation spectral clustering (AASC) algorithm which extends spectral clustering to a setting with multiple affinities available. AASC seeks an optimal combination of affinity matrices so that it is more immune to ineffective affinities and irrelevant features. This makes the construction of similarity or distance-metric measures for clustering less crucial. Experiments show that AASC is effective in simultaneous clustering and feature fusion, thus enhancing the performance of spectral clustering by employing multiple affinities.
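A minimal sketch of the setting described here, assuming fixed combination weights (AASC itself additionally optimizes those weights, which is the paper's contribution; all function names below are illustrative):

```python
import numpy as np

def aggregate_affinities(affinities, weights):
    """Combine several affinity matrices with nonnegative weights,
    normalized here to sum to one."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * A for w, A in zip(weights, affinities))

def spectral_embedding(W, k):
    """Row-normalized bottom-k eigenvectors of the symmetric normalized
    Laplacian of W; running k-means on these rows finishes the clustering."""
    d = W.sum(axis=1)
    d_is = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L_sym = np.eye(len(W)) - (d_is[:, None] * W) * d_is[None, :]
    _, vecs = np.linalg.eigh(L_sym)          # eigenvalues in ascending order
    U = vecs[:, :k]
    return U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
```

With badly scaled or irrelevant affinities in the list, a naive uniform `weights` can blur the block structure of the aggregate, which is the failure mode motivating the learned weighting.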
Two-Layer Multiple Kernel Learning
Abstract

Cited by 9 (3 self)
Multiple Kernel Learning (MKL) aims to learn kernel machines for solving a real machine learning problem (e.g. classification) by exploring combinations of multiple kernels. The traditional MKL approach is in general “shallow” in the sense that the target kernel is simply a linear (or convex) combination of some base kernels. In this paper, we investigate a framework of Multi-Layer Multiple Kernel Learning (MLMKL) that aims to learn “deep” kernel machines by exploring combinations of multiple kernels in a multi-layer structure, which goes beyond the conventional MKL approach. Through a multiple-layer mapping, the proposed MLMKL framework offers higher flexibility than regular MKL for finding the optimal kernel for applications. As a first attempt at this new MKL framework, we present a Two-Layer Multiple Kernel Learning (2LMKL) method together with two efficient algorithms for classification tasks. We analyze their generalization performance and conduct an extensive set of experiments over 16 benchmark datasets, in which encouraging results show that our method performs better than conventional MKL methods.
Fast learning rate of multiple kernel learning: Trade-off between sparsity and smoothness
 In The Annals of Statistics, 2013
Abstract

Cited by 8 (4 self)
We investigate the learning rate of multiple kernel learning (MKL) with ℓ1 and elastic-net regularizations. The elastic-net regularization is a composition of an ℓ1-regularizer for inducing sparsity and an ℓ2-regularizer for controlling smoothness. We focus on a sparse setting where the total number of kernels is large but the number of nonzero components of the ground truth is relatively small, and show sharper convergence rates than the learning rates previously shown for both ℓ1 and elastic-net regularizations. Our analysis shows that there is a trade-off between sparsity and smoothness when selecting which of the ℓ1 and elastic-net regularizations to use: if the ground truth is smooth, the elastic-net regularization is preferred; otherwise, the ℓ1 regularization is preferred.