Results 21–30 of 468
Generalized Kernel Approach to Dissimilarity-based Classification
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2001
Abstract

Cited by 87 (2 self)
Usually, objects to be classified are represented by features. In this paper, we discuss an alternative object representation based on dissimilarity values. If such distances separate the classes well, the nearest neighbor method offers a good solution. However, dissimilarities used in practice are usually far from ideal, and the performance of the nearest neighbor rule suffers from its sensitivity to noisy examples. We show that other, more global classification techniques are preferable to the nearest neighbor rule in such cases. For classification purposes, two different ways of using generalized dissimilarity kernels are considered. In the first one, distances are isometrically embedded in a pseudo-Euclidean space and the classification task is performed there. In the second approach, classifiers are built directly on distance kernels. Both approaches are described theoretically and then compared in experiments with different dissimilarity measures and datasets, including degraded data simulating the problem of missing values.
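The pseudo-Euclidean embedding the abstract mentions can be illustrated with classical double centering: eigendecompose the centered Gram matrix built from a symmetric dissimilarity matrix and use the absolute eigenvalues for coordinates, with negative eigenvalues marking the indefinite part. A minimal sketch, not the paper's implementation; `embed_dissimilarities` and the sample matrix are illustrative:

```python
import numpy as np

def embed_dissimilarities(D):
    """Embed a symmetric dissimilarity matrix D into a (pseudo-)Euclidean
    space via double centering, as in classical MDS. Returns coordinates X
    and the eigenvalue signature; negative signs indicate the indefinite
    (pseudo-Euclidean) directions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # Gram-like matrix
    w, V = np.linalg.eigh(B)                 # eigendecomposition
    order = np.argsort(-np.abs(w))           # strongest axes first
    w, V = w[order], V[:, order]
    X = V * np.sqrt(np.abs(w))               # |eigenvalue|-scaled coordinates
    return X, np.sign(w)

# Three collinear points with distances 0-1-2 embed exactly in Euclidean space.
D = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])
X, signature = embed_dissimilarities(D)
```

For a Euclidean-realizable matrix like this one, all signature entries are non-negative and the pairwise norms of the rows of `X` reproduce `D`; an indefinite measure would show up as genuinely negative eigenvalues.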
Duality and Geometry in SVM Classifiers
 In Proc. 17th International Conf. on Machine Learning
, 2000
Abstract

Cited by 80 (3 self)
We develop an intuitive geometric interpretation of the standard support vector machine (SVM) for classification of both linearly separable and inseparable data and provide a rigorous derivation of the concepts behind the geometry. For the separable case, finding the maximum margin between the two sets is equivalent to finding the closest points in the smallest convex sets that contain each class (the convex hulls). We extend this argument to the inseparable case by using a reduced convex hull, reduced away from outliers. We prove that solving the reduced convex hull formulation is exactly equivalent to solving the standard inseparable SVM for appropriate choices of parameters. Additional advantages of the new formulation are that the effect of the choice of parameters becomes geometrically clear and that the formulation may be solved by fast nearest-point algorithms. By changing norms, these arguments hold for both the standard 2-norm and 1-norm SVM.
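In one dimension the reduced convex hull has a closed-form extreme point: cap every convex-combination weight at μ and pile weight greedily on the largest values. A toy sketch (the helper is hypothetical, not from the paper) showing how the hull shrinks away from an outlier as μ decreases:

```python
def reduced_hull_extreme(points, mu):
    """Right-hand extreme of the 1-D reduced convex hull:
    maximize sum_i a_i * x_i subject to sum_i a_i = 1 and 0 <= a_i <= mu.
    The optimum of this linear program puts weight mu on the largest
    points until the weights sum to one."""
    total, value = 0.0, 0.0
    for x in sorted(points, reverse=True):
        a = min(mu, 1.0 - total)   # cap each weight at mu
        value += a * x
        total += a
        if total >= 1.0:
            break
    return value

pts = [0.0, 1.0, 2.0, 10.0]                 # 10.0 plays the role of an outlier
full    = reduced_hull_extreme(pts, 1.0)    # ordinary convex hull: 10.0
reduced = reduced_hull_extreme(pts, 0.5)    # outlier's influence halved: 6.0
```

With μ = 1 the reduced hull is the ordinary hull; as μ shrinks toward 1/n each extreme becomes an average of several points, which is exactly the "reduced away from outliers" effect the abstract describes.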
An Online Kernel Change Detection Algorithm
, 2004
Abstract

Cited by 76 (12 self)
A number of abrupt change detection methods have been proposed in the past, among which are efficient model-based techniques such as the Generalized Likelihood Ratio (GLR) test. We consider the case where no accurate or tractable model can be found, using a model-free approach called Kernel change detection (KCD). KCD compares two sets of descriptors extracted online from the signal at each time instant: the immediate past set and the immediate future set. Based on the soft-margin single-class Support Vector Machine (SVM), we build a dissimilarity measure in feature space between those sets, without estimating densities as an intermediate step. This dissimilarity measure is shown to be asymptotically equivalent to the Fisher ratio in the Gaussian case. Implementation issues are addressed; in particular, the dissimilarity measure can be computed online in input space. Simulation results on both synthetic signals and real music signals show the efficiency of KCD.
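The idea of comparing past and future descriptor sets in feature space can be sketched with a kernel mean discrepancy (an MMD-style statistic; the paper itself builds its dissimilarity from single-class SVMs, so this is a deliberate simplification, and the signal parameters are made up):

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel on scalars."""
    return np.exp(-gamma * (x - y) ** 2)

def window_dissimilarity(past, future, gamma=1.0):
    """Squared distance between the kernel mean embeddings of two windows
    of a 1-D signal: large when the two windows have different statistics,
    zero when they are identical."""
    kpp = np.mean([rbf(a, b, gamma) for a in past for b in past])
    kff = np.mean([rbf(a, b, gamma) for a in future for b in future])
    kpf = np.mean([rbf(a, b, gamma) for a in past for b in future])
    return kpp + kff - 2 * kpf

rng = np.random.default_rng(0)
quiet = rng.normal(0.0, 0.1, 20)
jump  = rng.normal(3.0, 0.1, 20)
d_change    = window_dissimilarity(quiet, jump)    # windows straddle a jump
d_no_change = window_dissimilarity(quiet, quiet)   # identical windows
```

Sliding this statistic along a signal and thresholding it gives a crude online change detector in the spirit of KCD, without any density estimation.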
Learning with idealized kernels
 In Proceedings of the 20th International Conference on Machine Learning
, 2003
Abstract

Cited by 68 (6 self)
The kernel function plays a central role in kernel methods. Existing methods typically fix the functional form of the kernel in advance and then only adapt the associated kernel parameters based on empirical data. In this paper, we consider the problem of adapting the kernel so that it becomes more similar to the so-called ideal kernel. We formulate this as a distance metric learning problem that searches for a suitable linear transform (feature weighting) in the kernel-induced feature space. This formulation is applicable even when the training set can only provide examples of similar and dissimilar pairs, but not explicit class label information. Computationally, this leads to a local-optima-free quadratic programming problem, with the number of variables independent of the number of features. Performance of this method is evaluated on classification and clustering tasks on both toy and real-world data sets.
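The "ideal kernel" the abstract refers to assigns K*(i, j) = 1 to same-class pairs and 0 otherwise; kernel alignment then measures how close a given kernel is to it. A small sketch (the `alignment` helper is the standard empirical alignment score, not the paper's metric-learning objective):

```python
import numpy as np

def ideal_kernel(labels):
    """Ideal kernel: entry (i, j) is 1 if examples i and j share a label,
    and 0 otherwise."""
    y = np.asarray(labels)
    return (y[:, None] == y[None, :]).astype(float)

def alignment(K1, K2):
    """Empirical kernel alignment: <K1, K2>_F / (||K1||_F * ||K2||_F),
    i.e. the cosine between the two kernel matrices."""
    return np.sum(K1 * K2) / (np.linalg.norm(K1) * np.linalg.norm(K2))

y = [0, 0, 1, 1]
K_star = ideal_kernel(y)
a_perfect = alignment(K_star, K_star)             # fully aligned: 1.0
a_uninformative = alignment(np.ones((4, 4)), K_star)  # constant kernel: lower
```

Adapting a kernel "toward the ideal kernel", as the paper does, amounts to pushing this alignment score up while keeping the transform simple.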
Nonparametric quantile estimation
, 2006
Abstract

Cited by 55 (9 self)
In regression, the desired estimate of y|x is not always given by a conditional mean, although this is most common. Sometimes one wants to obtain a good estimate that satisfies the property that a proportion, τ, of y|x will be below the estimate. For τ = 0.5 this is an estimate of the median. What might be called median regression is subsumed under the term quantile regression. We present a nonparametric version of a quantile estimator, which can be obtained by solving a simple quadratic programming problem, and provide uniform convergence statements and bounds on the quantile property of our estimator. Experimental results show the feasibility of the approach and the competitiveness of our method with existing ones. We discuss several types of extensions, including an approach to solve the quantile crossing problem, as well as a method to incorporate prior qualitative knowledge such as monotonicity constraints.
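The τ-quantile is exactly the minimizer of the pinball loss, which is what makes estimators with this "quantile property" trainable by convex optimization. The paper solves a quadratic program over a kernel expansion; the brute-force constant model below is only meant to show the loss itself:

```python
def pinball_loss(y, f, tau):
    """Pinball (quantile) loss: under-estimates cost tau per unit,
    over-estimates cost (1 - tau) per unit; the tau-quantile minimizes
    its expectation."""
    r = y - f
    return tau * r if r >= 0 else (tau - 1) * r

def constant_quantile(ys, tau):
    """Constant estimate minimizing the average pinball loss, found by
    brute force over the sample values (sufficient for a constant model)."""
    return min(ys, key=lambda f: sum(pinball_loss(y, f, tau) for y in ys))

data = [1.0, 2.0, 3.0, 4.0, 100.0]
median_est = constant_quantile(data, 0.5)   # tau = 0.5 recovers the median: 3.0
```

Raising τ slides the estimate upward through the sample, and because the loss ignores the size of residuals on the cheap side, the estimate is robust to the outlier at 100.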
Classification on proximity data with lp–machines
, 1999
Abstract

Cited by 47 (10 self)
We provide a new linear program to deal with classification of data in the case of functions written in terms of pairwise proximities. This makes it possible to avoid the problems inherent in using feature spaces with an indefinite metric in Support Vector Machines, since the notion of a margin is needed purely in input space, where the classification actually occurs. Moreover, in our approach we can enforce sparsity in the proximity representation by sacrificing training error. This turns out to be favorable for proximity data. Similar to ν-SV methods, the only parameter needed in the algorithm is the (asymptotic) number of data points being classified with a margin. Finally, the algorithm is successfully compared with ν-SV learning in proximity space and K-nearest-neighbors on real-world data from neuroscience and molecular biology.
Most likely heteroscedastic Gaussian process regression
 In International Conference on Machine Learning (ICML)
, 2007
Abstract

Cited by 44 (3 self)
This paper presents a novel Gaussian process (GP) approach to regression with input-dependent noise rates. We follow Goldberg et al.’s approach and model the noise variance using a second GP, in addition to the GP governing the noise-free output value. In contrast to Goldberg et al., however, we do not use a Markov chain Monte Carlo method to approximate the posterior noise variance, but a most likely noise approach. The resulting model is easy to implement and can be used directly in combination with various existing extensions of standard GPs, such as sparse approximations. Extensive experiments on both synthetic and real-world data, including a challenging perception problem in robotics, show the effectiveness of most likely heteroscedastic GP regression.
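With the noise variances treated as known, heteroscedastic GP regression changes the standard GP equations only by replacing σ²I with a diagonal of per-point variances. A numpy sketch under that assumption (the paper's contribution is learning those variances with a second GP, which is omitted here; the kernel choice and data are made up):

```python
import numpy as np

def gp_posterior_mean(X, y, X_star, noise_var, gamma=1.0):
    """Posterior mean of GP regression with input-dependent noise: the
    per-point noise variances enter as a diagonal added to the RBF
    kernel matrix, replacing the usual homoscedastic sigma^2 * I."""
    K = np.exp(-gamma * (X[:, None] - X[None, :]) ** 2)
    K_star = np.exp(-gamma * (X_star[:, None] - X[None, :]) ** 2)
    return K_star @ np.linalg.solve(K + np.diag(noise_var), y)

X = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 2.0, 9.0])           # last target is suspect
trusted = np.full(4, 1e-6)                   # near-noiseless everywhere
doubted = np.array([1e-6, 1e-6, 1e-6, 4.0])  # large variance on the outlier

mu_trusted = gp_posterior_mean(X, y, X, trusted)
mu_doubted = gp_posterior_mean(X, y, X, doubted)
```

With uniform tiny noise the posterior mean interpolates the data; giving the suspect point a large variance lets the model discount it, which is the behaviour the per-point noise model buys.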
A novel log-based relevance feedback technique in content-based image retrieval
, 2004
Abstract

Cited by 44 (12 self)
Relevance feedback has been proposed as an important technique to boost retrieval performance in content-based image retrieval (CBIR). However, since there exists a semantic gap between low-level features and high-level semantic concepts in CBIR, typical relevance feedback techniques need many rounds of feedback to achieve satisfactory results. These procedures are time-consuming and may bore users during retrieval tasks. For long-term study purposes in CBIR, we note that users’ feedback logs can be made available and employed to help retrieval tasks in CBIR systems. In this paper, we propose a novel scheme to study the log-based relevance feedback (LRF) technique for improving retrieval performance and reducing the semantic gap in CBIR. In order to effectively incorporate users’ feedback logs, we propose a modified support vector machine (SVM) technique called soft label support vector machine (SLSVM) to construct the LRF algorithm in CBIR. We conduct extensive experiments to evaluate the performance of our proposed algorithm. Compared with the typical approach using the query expansion (QEX) technique, detailed experiments demonstrate that our proposed scheme can significantly improve the retrieval performance of semantic image retrieval.
Kernel Methods: A Survey of Current Techniques
 Neurocomputing
, 2000
Abstract

Cited by 42 (1 self)
Kernel methods have become an increasingly popular tool for machine learning tasks involving classification, regression, or novelty detection. They exhibit good generalisation performance on many real-life datasets, and the approach is properly motivated theoretically. There are relatively few free parameters to adjust, and the architecture of the learning machine does not need to be found by experimentation. In this tutorial we survey this subject with a principal focus on the most well-known models based on kernel substitution, namely Support Vector Machines. Support Vector Machines (SVMs) have been successfully applied to a number of applications ranging from particle identification, face identification and text categorisation to engine knock detection, bioinformatics and database marketing [9]. The approach is systematic and properly motivated by statistical learning theory [42]. Training involves optimisation of a convex cost function: there are no false local minima.
Information, Divergence and Risk for Binary Experiments
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2009
Abstract

Cited by 41 (8 self)
We unify f-divergences, Bregman divergences, surrogate regret bounds, proper scoring rules, cost curves, ROC curves, and statistical information. We do this by systematically studying integral and variational representations of these various objects, and in so doing identify their primitives, all of which are related to cost-sensitive binary classification. As well as developing relationships between generative and discriminative views of learning, the new machinery leads to tight and more general surrogate regret bounds and generalised Pinsker inequalities relating f-divergences to variational divergence. The new viewpoint also illuminates existing algorithms: it provides a new derivation of Support Vector Machines in terms of divergences and relates Maximum Mean Discrepancy to Fisher Linear Discriminants.
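For readers who want the primitives in symbols, the standard textbook forms of two of the objects being unified (not necessarily the paper's exact notation or normalisation) are:

```latex
% f-divergence of P from Q, for convex f with f(1) = 0:
\[
  I_f(P, Q) \;=\; \int q \, f\!\left(\frac{p}{q}\right) \mathrm{d}\mu ,
\]
% the variational divergence, here taken as the L1 distance of densities:
\[
  V(P, Q) \;=\; \int \lvert p - q \rvert \, \mathrm{d}\mu ,
\]
% and the classical Pinsker inequality they generalise (KL in nats):
\[
  \mathrm{KL}(P \,\|\, Q) \;\ge\; \tfrac{1}{2}\, V(P, Q)^{2} .
\]
```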