Bayesian methods for support vector machines: Evidence and predictive class probabilities
- Machine Learning, 2002
- Cited by 40 (3 self)
I describe a framework for interpreting Support Vector Machines (SVMs) as maximum a posteriori (MAP) solutions to inference problems with Gaussian Process priors. This probabilistic interpretation can provide intuitive guidelines for choosing a ‘good’ SVM kernel. Beyond this, it allows Bayesian methods to be used for tackling two of the outstanding challenges in SVM classification: how to tune hyperparameters (the misclassification penalty C, and any parameters specifying the kernel) and how to obtain predictive class probabilities rather than the conventional deterministic class label predictions. Hyperparameters can be set by maximizing the evidence; I explain how the latter can be defined and properly normalized. Both analytical approximations and numerical methods (Monte Carlo chaining) for estimating the evidence are discussed. I also compare different methods of estimating class probabilities, ranging from simple evaluation at the MAP or at the posterior average to full averaging over the posterior. A simple toy application illustrates the various concepts and techniques.
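The MAP correspondence described in this abstract can be written out as a short sketch. The notation is assumed for illustration and does not come from the listing: θ is the vector of latent function values at the training inputs, K is the kernel (covariance) matrix, l is the hinge loss, and κ(C) stands for whatever normalization constant the framework uses.

```latex
% SVM objective in function space, with hinge loss l(z) = \max(0, 1 - z)
\min_{\boldsymbol{\theta}} \;
  \frac{1}{2}\, \boldsymbol{\theta}^{\top} K^{-1} \boldsymbol{\theta}
  + C \sum_{i=1}^{n} l\bigl(y_i\, \theta(x_i)\bigr)

% The same minimizer is the MAP estimate under a Gaussian Process prior
% and a likelihood built from the hinge loss (assumed form):
P(\boldsymbol{\theta}) \propto
  \exp\!\Bigl(-\tfrac{1}{2}\, \boldsymbol{\theta}^{\top} K^{-1} \boldsymbol{\theta}\Bigr),
\qquad
Q(y_i \mid x_i, \theta) = \kappa(C)\,
  \exp\!\bigl[-C\, l\bigl(y_i\, \theta(x_i)\bigr)\bigr]
```

Taking the negative logarithm of the prior times the product of likelihood terms recovers the first expression up to an additive constant, which is why C and the kernel parameters can be treated as hyperparameters of a probabilistic model.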
Probabilistic methods for Support Vector Machines
- Advances in Neural Information Processing Systems 12, 2000
- Cited by 27 (3 self)
I describe a framework for interpreting Support Vector Machines (SVMs) as maximum a posteriori (MAP) solutions to inference problems with Gaussian Process priors. This can provide intuitive guidelines for choosing a ‘good’ SVM kernel. It can also assign (by evidence maximization) optimal values to parameters such as the noise level C which cannot be determined unambiguously from properties of the MAP solution alone (such as cross-validation error). I illustrate this using a simple approximate expression for the SVM evidence. Once C has been determined, error bars on SVM predictions can also be obtained.

1 Support Vector Machines: A probabilistic framework

Support Vector Machines (SVMs) have recently been the subject of intense research activity within the neural networks community; for tutorial introductions and overviews of recent developments see [1, 2, 3]. One of the open questions that remains is how to set the ‘tunable’ parameters of an SVM algorithm: While methods for...
Model Selection for Support Vector Machine Classification
- Neurocomputing, 2002
- Cited by 17 (2 self)
We address the problem of model selection for Support Vector Machine (SVM) classification. For fixed functional form of the kernel, model selection amounts to tuning kernel parameters and the slack penalty coefficient C. We begin by reviewing a recently developed probabilistic framework for SVM classification. An extension to the case of SVMs with quadratic slack penalties is given and a simple approximation for the evidence is derived, which can be used as a criterion for model selection. We also derive the exact gradients of the evidence in terms of posterior averages and describe how they can be estimated numerically using Hybrid Monte Carlo techniques. Though computationally demanding, the resulting gradient ascent algorithm is a useful baseline tool for probabilistic SVM model selection, since it can locate maxima of the exact (unapproximated) evidence. We then perform extensive experiments on several benchmark data sets. The aim of these experiments is to compare the performance of probabilistic model selection criteria with alternatives based on estimates of the test error, namely the so-called "span estimate" and Wahba's Generalized Approximate Cross-Validation (GACV) error. We find that all the "simple" model criteria (Laplace evidence approximations, and the Span and GACV error estimates) exhibit multiple local optima with respect to the hyperparameters. While some of these give performance that is competitive with results from other approaches in the literature, a significant fraction lead to rather higher test errors.
Towards Scalable Support Vector Machines using Squashing
- In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2000
- Cited by 15 (2 self)
Support vector machines (SVMs) provide classification models with strong theoretical foundations as well as excellent empirical performance on a variety of applications. One of the major drawbacks of SVMs is the necessity to solve a large-scale quadratic programming problem. This paper combines likelihood-based squashing with a probabilistic formulation of SVMs, enabling fast training on squashed data sets. We reduce the problem of training the SVMs on the weighted "squashed" data to a quadratic programming problem and show that it can be solved using Platt's sequential minimal optimization (SMO) algorithm. We compare performance of the SMO algorithm on the squashed and the full data, as well as on simple random and boosted samples of the data. Experiments on a number of datasets show that squashing allows one to speed up training, decrease memory requirements, and obtain parameter estimates close to those obtained from the full data. More importantly, squashing produces close to optimal classific...
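Training on weighted "squashed" data can be illustrated with a weighted hinge-loss objective for a linear SVM. This is a minimal sketch under assumptions: the function name, the linear model, and the way the weights enter the loss are illustrative and are not taken from the paper, which solves the resulting quadratic program with SMO.

```python
def weighted_svm_objective(w, b, points, labels, weights, C=1.0):
    """Primal objective 0.5*||w||^2 + C * sum_i beta_i * hinge_i.

    Each pseudo-point i carries a weight beta_i (assumed here to be the
    amount of original data it stands for after squashing).
    """
    regularizer = 0.5 * sum(wi * wi for wi in w)
    loss = 0.0
    for x, y, beta in zip(points, labels, weights):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        loss += beta * max(0.0, 1.0 - margin)  # weighted hinge loss
    return regularizer + C * loss
```

With this form, one point of weight 3 contributes exactly as much as three identical points of weight 1, which is the property that lets a small weighted data set stand in for a larger one.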
A probabilistic framework for SVM regression and error bar estimation
- Machine Learning, 2002
- Cited by 9 (0 self)
In this paper, we elaborate on the well-known relationship between Gaussian Processes (GP) and Support Vector Machines (SVM) under some convex assumptions for the loss functions. This paper concentrates on the derivation of the evidence and error bar approximation for regression problems. An error bar formula is derived based on the ε-insensitive loss function.
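For reference, the ε-insensitive loss named in this abstract has a standard definition that is easy to state in code. The function name and the default value of `eps` are illustrative choices, not taken from the paper.

```python
def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Standard epsilon-insensitive loss: max(0, |y_true - y_pred| - eps).

    Residuals smaller than eps cost nothing; larger residuals are
    penalized linearly in the amount by which they exceed eps.
    """
    return max(0.0, abs(y_true - y_pred) - eps)
```

A prediction that is off by 0.05 with `eps=0.1` incurs no loss, while one that is off by 0.5 incurs a loss of about 0.4.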
Extensions of the informative vector machine
- In
, 2005
- Cited by 8 (1 self)
The informative vector machine (IVM) is a practical method for Gaussian process regression and classification. The IVM produces a sparse approximation to a Gaussian process by combining assumed density filtering with a heuristic for choosing points based on minimizing posterior entropy. This paper extends IVM in several ways. First, we propose a novel noise model that allows the IVM to be applied to a mixture of labeled and unlabeled data. Second, we use IVM on a block-diagonal covariance matrix, for “learning to learn” from related tasks. Third, we modify the IVM to incorporate prior knowledge from known invariances. All of these extensions are tested on artificial and real data.
Approximate analytical bootstrap averages for support vector classifiers
- Advances in Neural Information Processing Systems 16, 2004
- Cited by 2 (0 self)
We compute approximate analytical bootstrap averages for support vector classification using a combination of the replica method of statistical physics and the TAP approach for approximate inference. We test our method on a few datasets and compare it with exact averages obtained by extensive Monte-Carlo sampling.
Active learning with Support Vector Machines
- 2004
- Cited by 2 (0 self)
This thesis examines the use of support vector machines for active learning using linear, polynomial and radial basis function kernels. In our experiments we used named entity recognition which was treated as a binary task and as a multiclass task and we also tackled shallow parsing. We report savings in annotation costs ranging from 80% to 95% depending on the task. We observed that the distribution of labels in the selected instances during active learning could provide us with a stopping criterion in cases where one class can be considered to be the majority class of the dataset. Finally, using the confidence estimation of the SVM classifier, we define a stopping criterion that appears to be efficient in all our active learning experiments.

Acknowledgements

I would like to thank my supervisor, Miles Osborne, who guided me throughout this task. I am obliged to Chih-Jen Lin for his help in tuning LIBSVM. Many thanks to Andrew and Christophoros for their help and insight in maths, as well as proofreading my thesis. Many thanks to Beatrice, Ben, Marcus and Shipra for their help in the parallel experiments we ran.
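One common way to pick the next instance to annotate in SVM active learning is to query the unlabeled point closest to the current decision boundary. The sketch below assumes a linear SVM with known weights. The abstract does not say whether the thesis uses this selection rule, so both the rule and the function names should be read as illustrative.

```python
def decision_value(w, b, x):
    """Signed score w.x + b of a linear SVM; its magnitude acts as confidence."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def select_query(w, b, pool):
    """Return the index of the pool instance nearest the hyperplane.

    The instance with the smallest |w.x + b| is the one the current
    classifier is least confident about.
    """
    return min(range(len(pool)), key=lambda i: abs(decision_value(w, b, pool[i])))
```

For example, with `w = [1.0, -1.0]`, `b = 0.0` and the pool `[[2.0, 0.0], [0.5, 0.4], [-3.0, 1.0]]`, the scores are roughly 2.0, 0.1 and -4.0, so the second instance (index 1) would be sent for annotation.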
A Smoothing Kernel for Spatially Related Features and Its Application to Speaker Verification
- Cited by 1 (0 self)
Most commonly used kernels are invariant to permutations of the feature vector components. This characteristic may make machine learning methods that use such kernels suboptimal in cases where the feature vector has an underlying structure. In this paper we will consider one such case, where the features are spatially related. We show a way to modify the objective function of the support vector machine (SVM) optimization problem to account for this structure. The new optimization problem can be implemented as a standard SVM using a particular smoothing kernel. Results are shown on a speaker verification task using prosodic features that are transformed using a particular implementation of the Fisher score. The proposed method leads to improvements of as much as 15% in equal error rate (EER).
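The idea that a modified SVM objective can be run as a standard SVM with a special kernel can be sketched for one assumed choice of penalty. Suppose the regularizer is changed from ||w||^2 to w^T (I + lam * D^T D) w, where D takes differences between neighbouring weights. This first-difference penalty is an assumption for illustration, not the paper's stated form. Substituting v = A^(1/2) w with A = I + lam * D^T D turns the problem into a standard SVM with kernel k(x, z) = x^T A^(-1) z, computed here with a tridiagonal solve.

```python
def solve_smoothness_system(z, lam):
    """Solve (I + lam * D^T D) u = z with the Thomas algorithm.

    D is the first-difference operator, so the matrix is tridiagonal with
    diagonal 1 + lam*[1, 2, ..., 2, 1] and off-diagonal entries -lam.
    """
    n = len(z)
    if n == 1:
        return [float(z[0])]  # no neighbours, the matrix is the identity
    diag = [1.0 + lam * (1.0 if i in (0, n - 1) else 2.0) for i in range(n)]
    off = -lam
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = off / diag[0]
    d_prime[0] = z[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - off * c_prime[i - 1]
        c_prime[i] = off / denom
        d_prime[i] = (z[i] - off * d_prime[i - 1]) / denom
    u = [0.0] * n
    u[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        u[i] = d_prime[i] - c_prime[i] * u[i + 1]
    return u


def smoothing_kernel(x, z, lam=1.0):
    """Kernel k(x, z) = x^T (I + lam * D^T D)^(-1) z."""
    u = solve_smoothness_system(z, lam)
    return sum(xi * ui for xi, ui in zip(x, u))
```

With `lam = 0` this reduces to the ordinary dot product. For `lam > 0` the kernel value depends on which components are neighbours, so it is no longer invariant to permuting the feature components.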
An Uncertainty Framework for Classification
- 2000
We define a generalized likelihood function based on uncertainty measures and show that maximizing such a likelihood function for different measures induces different types of classifiers. In the probabilistic framework, we obtain classifiers that optimize the cross-entropy function. In the possibilistic framework, we obtain classifiers that maximize the interclass margin. Furthermore, we show that the support vector machine is a sub-class of these maximum-margin classifiers.

