Multiple kernel learning algorithms
JMLR, 2011
Abstract

Cited by 109 (1 self)
In recent years, several methods have been proposed to combine multiple kernels instead of using a single one. These different kernels may correspond to using different notions of similarity or may be using information coming from multiple sources (different representations or different feature subsets). In trying to organize and highlight the similarities and differences between them, we give a taxonomy of and review several multiple kernel learning algorithms. We perform experiments on real data sets for better illustration and comparison of existing algorithms. We see that although there may not be large differences in terms of accuracy, the algorithms do differ in complexity, as given by the number of stored support vectors, in the sparsity of the solution, as given by the number of kernels used, and in training time complexity. Overall, using multiple kernels instead of a single one is useful, and we believe that combining kernels in a nonlinear or data-dependent way seems more promising than linear combination for fusing information provided by simple linear kernels, whereas linear methods are more reasonable when combining complex Gaussian kernels.
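The contrast this abstract draws between linear and nonlinear kernel combination can be sketched in a few lines of numpy. The data, weights, and kernel choices below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Toy data (hypothetical): 5 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))

def linear_kernel(X):
    return X @ X.T

def rbf_kernel(X, gamma=0.5):
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

K1, K2 = linear_kernel(X), rbf_kernel(X)

# Linear (convex) combination: fixed weights on the simplex; in MKL these
# weights would be learned jointly with the classifier.
eta = np.array([0.3, 0.7])
K_linear = eta[0] * K1 + eta[1] * K2

# One simple nonlinear combination: the elementwise product of the two
# Gram matrices, which is again a valid kernel (Schur product theorem).
K_product = K1 * K2
```

Both combined matrices remain symmetric positive semidefinite, so either can be handed directly to any kernel machine; MKL methods differ mainly in how the combination (here `eta`, or the functional form itself) is learned.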
ℓp-norm multiple kernel learning
JOURNAL OF MACHINE LEARNING RESEARCH, 2011
Abstract

Cited by 44 (5 self)
Learning linear combinations of multiple kernels is an appealing strategy when the right choice of features is unknown. Previous approaches to multiple kernel learning (MKL) promote sparse kernel combinations to support interpretability and scalability. Unfortunately, this ℓ1-norm MKL is rarely observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures that generalize well, we extend MKL to arbitrary norms. We develop new insights on the connection between several existing MKL formulations and devise two efficient interleaved optimization strategies for arbitrary norms, that is, ℓp-norms with p ≥ 1. This interleaved optimization is much faster than the commonly used wrapper approaches, as demonstrated on several data sets. A theoretical analysis and an experiment on controlled artificial data shed light on the appropriateness of sparse, non-sparse, and ℓ∞-norm MKL in various scenarios. Importantly, empirical applications of ℓp-norm MKL to three real-world problems from computational biology show that non-sparse MKL achieves accuracies that surpass the state of the art. Data sets, source code to reproduce the experiments, implementations of the algorithms, and …
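The "interleaved" strategy alternates an SVM step with an analytic update of the kernel weights. A minimal sketch of a closed-form weight update of the kind used in interleaved ℓp-norm MKL, under the assumption that the per-kernel margin norms ‖w_m‖ are already available from the SVM step:

```python
import numpy as np

def lp_mkl_weights(w_norms, p):
    """Closed-form kernel-weight update of the kind used in interleaved
    l_p-norm MKL: given the per-kernel margin terms ||w_m|| from an SVM
    step, return mixing weights theta with ||theta||_p = 1."""
    w = np.asarray(w_norms, dtype=float)
    num = w ** (2.0 / (p + 1.0))
    den = np.sum(w ** (2.0 * p / (p + 1.0))) ** (1.0 / p)
    return num / den

# Hypothetical margin norms for three kernels.
theta = lp_mkl_weights([1.0, 0.5, 0.1], p=2.0)
```

As p grows, the weights flatten toward a uniform (ℓ∞-like) mixture; as p approaches 1 the solution becomes sparser, matching the sparse-versus-non-sparse trade-off the abstract describes.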
Optimal kernel choice for large-scale two-sample tests
Advances in Neural Information Processing Systems, 2012
Abstract

Cited by 15 (3 self)
Given samples from distributions p and q, a two-sample test determines whether to reject the null hypothesis that p = q, based on the value of a test statistic measuring the distance between the samples. One choice of test statistic is the maximum mean discrepancy (MMD), which is a distance between embeddings of the probability distributions in a reproducing kernel Hilbert space. The kernel used in obtaining these embeddings is critical in ensuring the test has high power and correctly distinguishes unlike distributions with high probability. A means of parameter selection for the two-sample test based on the MMD is proposed. For a given test level (an upper bound on the probability of making a Type I error), the kernel is chosen so as to maximize the test power and minimize the probability of making a Type II error. The test statistic, test threshold, and optimization over the kernel parameters are obtained with cost linear in the sample size. These properties make the kernel selection and test procedures suited to data streams, where the observations cannot all be stored in memory. In experiments, the new kernel selection approach yields a more powerful test than earlier kernel selection heuristics.
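The "cost linear in the sample size" refers to the streaming MMD estimate built from disjoint pairs of consecutive samples, so nothing beyond the current pair needs to be stored. A minimal numpy sketch with a Gaussian kernel (the bandwidth here is an arbitrary assumption, not a selected one):

```python
import numpy as np

def gauss_k(a, b, sigma=1.0):
    return np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))

def linear_time_mmd2(X, Y, sigma=1.0):
    """Linear-time MMD^2 estimate over disjoint pairs of consecutive
    samples; only one pair is needed at a time, so it suits streams."""
    n = (min(len(X), len(Y)) // 2) * 2
    h_sum = 0.0
    for i in range(0, n, 2):
        x, x2, y, y2 = X[i], X[i + 1], Y[i], Y[i + 1]
        h_sum += (gauss_k(x, x2, sigma) + gauss_k(y, y2, sigma)
                  - gauss_k(x, y2, sigma) - gauss_k(x2, y, sigma))
    return 2.0 * h_sum / n

rng = np.random.default_rng(1)
# Same distribution: statistic near zero. Shifted distribution: clearly positive.
same = linear_time_mmd2(rng.normal(0, 1, (2000, 2)), rng.normal(0, 1, (2000, 2)))
diff = linear_time_mmd2(rng.normal(0, 1, (2000, 2)), rng.normal(3, 1, (2000, 2)))
```

The kernel selection the paper proposes then tunes parameters such as `sigma` to maximize the power of a test built on exactly this kind of statistic.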
A Family of Simple Non-Parametric Kernel Learning Algorithms
Abstract

Cited by 10 (5 self)
Previous studies of Non-Parametric Kernel Learning (NPKL) usually formulate the learning task as a Semi-Definite Programming (SDP) problem that is often solved by general-purpose SDP solvers. However, for N data examples, the time complexity of NPKL using a standard interior-point SDP solver can be as high as O(N^6.5), which prevents NPKL methods from being applied to real applications, even for data sets of moderate size. In this paper, we present a family of efficient NPKL algorithms, termed “SimpleNPKL”, which can learn non-parametric kernels from a large set of pairwise constraints efficiently. In particular, we propose two efficient SimpleNPKL algorithms. One is the SimpleNPKL algorithm with linear loss, which enjoys a closed-form solution that can be efficiently computed by the Lanczos sparse eigen-decomposition technique. The other is the SimpleNPKL algorithm with other loss functions (including square hinge loss, hinge loss, and square loss), which can be reformulated as a saddle-point optimization problem and further solved by a fast iterative algorithm. In contrast to previous NPKL approaches, our empirical results show that the proposed technique, while maintaining the same accuracy, is significantly more efficient and scalable. Finally, we demonstrate that the proposed technique can also be applied to speed up many kernel learning tasks, including colored maximum variance unfolding, minimum volume embedding, and structure-preserving embedding.
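The efficiency gain comes from needing only a few leading eigenpairs rather than a full SDP solve. A sketch of that building block, under the assumption that a symmetric matrix has already been assembled from the pairwise constraints (the matrix here is random, purely for illustration):

```python
import numpy as np
from scipy.sparse.linalg import eigsh

# A stand-in symmetric matrix assembled from pairwise constraints
# (hypothetical). In SimpleNPKL with linear loss, the closed-form kernel
# is recovered from the top of the spectrum of such a matrix, so only a
# few leading eigenpairs are needed -- exactly what Lanczos iteration
# (eigsh) delivers without a full eigendecomposition.
rng = np.random.default_rng(2)
B = rng.standard_normal((200, 200))
A = (B + B.T) / 2.0

k = 10
vals, vecs = eigsh(A, k=k, which="LA")  # k largest (algebraic) eigenvalues
vals = np.maximum(vals, 0.0)            # keep only the nonnegative part
K = vecs @ np.diag(vals) @ vecs.T       # rank-<=k PSD kernel approximation
```

For sparse constraint matrices, each Lanczos iteration costs time proportional to the number of nonzeros, which is what makes the approach scale far past the O(N^6.5) interior-point regime.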
Two-Layer Multiple Kernel Learning
Abstract

Cited by 9 (3 self)
Multiple Kernel Learning (MKL) aims to learn kernel machines for solving real machine learning problems (e.g., classification) by exploring combinations of multiple kernels. The traditional MKL approach is in general “shallow” in the sense that the target kernel is simply a linear (or convex) combination of some base kernels. In this paper, we investigate a framework of Multi-Layer Multiple Kernel Learning (MLMKL) that aims to learn “deep” kernel machines by exploring combinations of multiple kernels in a multi-layer structure, which goes beyond the conventional MKL approach. Through a multiple-layer mapping, the proposed MLMKL framework offers higher flexibility than regular MKL for finding the optimal kernel for a given application. As a first attempt at this new MKL framework, we present a Two-Layer Multiple Kernel Learning (2LMKL) method together with two efficient algorithms for classification tasks. We analyze their generalization performance and conduct an extensive set of experiments over 16 benchmark datasets, in which encouraging results show that our method performs better than conventional MKL methods.
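One way to picture a two-layer kernel is an inner "shallow" combination fed through an outer nonlinearity. The sketch below is one such construction under assumed weights and bandwidths; it illustrates the layering idea rather than the paper's exact parameterization:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((6, 4))

def rbf_gram(X, gamma):
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

# Inner layer: a linear combination of base Gram matrices -- the
# "shallow" MKL target kernel (weights and bandwidths are assumptions).
mus = np.array([0.5, 0.3, 0.2])
inner = sum(mu * rbf_gram(X, g) for mu, g in zip(mus, [0.1, 1.0, 10.0]))

# Outer layer: an entrywise exponential of the inner kernel. exp(K) of a
# PSD matrix is again PSD (Taylor series of Schur products), so the
# composition is a valid, nonlinearly combined "deep" kernel.
K_deep = np.exp(inner)
```

The key point is that the outer map preserves positive semidefiniteness, so the layered kernel can still be used in any standard kernel machine while expressing combinations no linear mixture can.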
Improving Web Image Search by Bag-based Reranking
2011
Abstract

Cited by 5 (3 self)
Given a textual query in traditional text-based image retrieval (TBIR), relevant images are reranked using visual features after the initial text-based search. In this paper, we propose a new bag-based reranking framework for large-scale TBIR. Specifically, we first cluster relevant images using both textual and visual features. By treating each cluster as a “bag” and the images in the bag as “instances”, we formulate this problem as a multi-instance (MI) learning problem. MI learning methods such as mi-SVM can be readily incorporated into our bag-based reranking framework. Observing that at least a certain portion of a positive bag consists of positive instances while a negative bag might also contain positive instances, we further use a more suitable generalized MI (GMI) setting for this application. To address the ambiguities in the instance labels of the positive and negative bags under this GMI setting, we develop a new method, referred to as GMI-SVM, that enhances retrieval performance by propagating labels from the bag level to the instance level. To acquire bag annotations for (G)MI learning, we propose a bag ranking method that ranks all the bags according to a defined bag ranking score. The top-ranked bags are used as pseudo-positive training bags, while pseudo-negative training bags are obtained by randomly sampling a few irrelevant images that are not associated with the textual query. Comprehensive experiments on the challenging real-world data set NUS-WIDE demonstrate that our framework with automatic bag annotation achieves the best performance compared with existing image reranking methods. Our experiments also demonstrate that GMI-SVM can achieve better performance when using manually labeled training bags obtained from relevance feedback.
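The cluster-to-bag pipeline can be sketched end to end. Everything below is a toy stand-in: the features and relevance scores are random, the clustering is a naive nearest-centroid assignment, and the mean initial relevance is a hypothetical substitute for the paper's bag ranking score:

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical initial retrieval: 12 images with fused text+visual
# feature vectors and an initial text-based relevance score each.
feats = rng.standard_normal((12, 4))
rel = rng.uniform(0, 1, 12)

# Step 1: cluster the images (nearest-centroid assignment stands in for
# proper clustering on textual and visual features).
cents = feats[rng.choice(12, 3, replace=False)]
labels = np.argmin(((feats[:, None, :] - cents[None]) ** 2).sum(-1), axis=1)

# Step 2: each cluster is a "bag" of instance images; bags are ranked by
# a bag score -- here the mean initial relevance of a bag's instances, a
# stand-in for the paper's bag ranking score.
bags = {c: np.where(labels == c)[0] for c in np.unique(labels)}
scores = {c: rel[idx].mean() for c, idx in bags.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

# Step 3: the top-ranked bag(s) become pseudo-positive training bags for
# the multi-instance learner; pseudo-negative bags would be sampled from
# images unrelated to the query.
pseudo_pos = bags[ranked[0]]
```

The (G)MI learner trained on these pseudo-labeled bags then produces instance-level scores used for the final reranking.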
Multiple Kernel Learning for Visual Object Recognition: A Review
2013
Abstract

Cited by 4 (0 self)
Multiple kernel learning (MKL) is a principled approach for selecting and combining kernels for a given recognition task. A number of studies have shown that MKL is a useful tool for object recognition, where each image is represented by multiple sets of features and MKL is applied to combine different feature sets. We review the state of the art in MKL, including different formulations and algorithms for solving the related optimization problems, with a focus on their applications to object recognition. One dilemma faced by practitioners interested in using MKL for object recognition is that different studies often provide conflicting results about the effectiveness and efficiency of MKL. To resolve this, we conduct extensive experiments on standard datasets to evaluate various approaches to MKL for object recognition. We argue that the seemingly contradictory conclusions offered by these studies are due to different experimental setups. The conclusions of our study are: (i) given a sufficient number of training examples and feature/kernel types, MKL is more effective for object recognition than simple kernel combination (e.g., choosing the best-performing kernel or averaging the kernels), and (ii) among the various approaches proposed for MKL, those based on sequential minimal optimization, semi-infinite programming, and the level method are computationally most efficient.
Wavelet Kernel Learning
Abstract

Cited by 4 (2 self)
This paper addresses the problem of optimal feature extraction from a wavelet representation. Our work aims at building features by selecting wavelet coefficients resulting from signal or image decomposition on an adapted wavelet basis. For this purpose, we jointly learn, in a kernelized large-margin context, the wavelet shape as well as the appropriate scale and translation of the wavelets, hence the name “wavelet kernel learning”. This problem is posed as a multiple kernel learning problem in which the number of kernels can be very large. For solving such a problem, we introduce a novel multiple kernel learning algorithm based on active constraint methods. We furthermore propose variants of this algorithm that can produce approximate solutions more efficiently. Empirical analyses show that our active constraint MKL algorithm achieves state-of-the-art efficiency. When used for wavelet kernel learning, our experimental results show that the proposed approaches are competitive with the state of the art on Brain-Computer Interface and Brodatz texture datasets.
Maximum Mean Discrepancy for Class Ratio Estimation: Convergence Bounds and Kernel Selection
Abstract

Cited by 3 (0 self)
In recent times, many real-world applications have emerged that require estimates of class ratios in an unlabeled instance collection, as opposed to labels for the individual instances in the collection. In this paper we investigate the use of the maximum mean discrepancy (MMD) in a reproducing kernel Hilbert space (RKHS) for estimating such ratios. First, we theoretically analyze the MMD-based estimates. Our analysis establishes that, under some mild conditions, the estimate is statistically consistent. More importantly, it provides an upper bound on the error in the estimate in terms of intuitive geometric quantities like class separation and data spread. Next, we use the insights obtained from the theoretical analysis to propose a novel convex formulation that automatically learns the kernel to be employed in the MMD-based estimation. We design an efficient cutting plane algorithm for solving this formulation. Finally, we empirically compare our estimator with several existing methods and show significantly improved performance under varying datasets, class ratios, and training sizes.
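The core MMD-based estimate is easy to sketch for a fixed kernel: match the unlabeled set's mean embedding with a mixture of the class-conditional mean embeddings. The paper's contribution additionally learns the kernel via a convex program; the sketch below fixes an arbitrary Gaussian kernel and shows only the closed-form ratio estimate:

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    d2 = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def class_ratio_mmd(Xpos, Xneg, Xu, gamma=0.5):
    """Estimate the positive-class ratio theta of an unlabeled set by
    minimizing ||theta*mu_pos + (1 - theta)*mu_neg - mu_u||^2 between
    kernel mean embeddings; the objective is quadratic in theta, so the
    minimizer is closed-form."""
    m = lambda A, B: rbf(A, B, gamma).mean()
    # With a = mu_pos - mu_neg and b = mu_u - mu_neg, theta* = <a,b>/<a,a>.
    aa = m(Xpos, Xpos) - 2 * m(Xpos, Xneg) + m(Xneg, Xneg)
    ab = m(Xpos, Xu) - m(Xpos, Xneg) - m(Xneg, Xu) + m(Xneg, Xneg)
    return float(np.clip(ab / aa, 0.0, 1.0))

rng = np.random.default_rng(4)
pos = rng.normal(+2.0, 1.0, (500, 2))
neg = rng.normal(-2.0, 1.0, (500, 2))
# Unlabeled mixture with a true positive-class ratio of 0.3.
u = np.vstack([rng.normal(+2.0, 1.0, (300, 2)),
               rng.normal(-2.0, 1.0, (700, 2))])
theta = class_ratio_mmd(pos, neg, u)
```

All inner products between mean embeddings reduce to means over Gram-matrix blocks, which is why the estimate needs no explicit feature map. The error bound in the paper relates the accuracy of `theta` to class separation, visible here through the `aa` term.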