Results 1 - 10 of 152
A tutorial on support vector machines for pattern recognition
Data Mining and Knowledge Discovery, 1998
"... The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and non-separable data, working through a non-trivial example in detail. We describe a mechanical analogy, and discuss when SV ..."
Cited by 3393 (12 self)
Abstract:
The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and non-separable data, working through a non-trivial example in detail. We describe a mechanical analogy, and discuss when SVM solutions are unique and when they are global. We describe how support vector training can be practically implemented, and discuss in detail the kernel mapping technique which is used to construct SVM solutions which are nonlinear in the data. We show how Support Vector machines can have very large (even infinite) VC dimension by computing the VC dimension for homogeneous polynomial and Gaussian radial basis function kernels. While very high VC dimension would normally bode ill for generalization performance, and while at present there exists no theory which shows that good generalization performance is guaranteed for SVMs, there are several arguments which support the observed high accuracy of SVMs, which we review. Results of some experiments which were inspired by these arguments are also presented. We give numerous examples and proofs of most of the key theorems. There is new material, and I hope that the reader will find that even old material is cast in a fresh light.
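For reference, the standard soft-margin SVM dual and the two kernel families named above can be written as follows (generic notation, not necessarily the tutorial's own):

    \max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j}\alpha_i \alpha_j y_i y_j K(x_i, x_j)
    \quad \text{s.t.}\quad 0 \le \alpha_i \le C,\ \ \sum_i \alpha_i y_i = 0,
    \qquad f(x) = \operatorname{sgn}\Big(\sum_i \alpha_i y_i K(x_i, x) + b\Big),

with homogeneous polynomial kernel K(x, x') = (x \cdot x')^p and Gaussian radial basis function kernel K(x, x') = \exp(-\|x - x'\|^2 / 2\sigma^2).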
A tutorial on support vector regression
2004
"... In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing ..."
Cited by 865 (3 self)
Abstract:
In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from an SV perspective.
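The epsilon-insensitive SV regression problem that such tutorials build on is usually stated as follows (standard form, shown here only for orientation):

    \min_{w, b, \xi, \xi^*}\ \tfrac{1}{2}\|w\|^2 + C\sum_i (\xi_i + \xi_i^*)
    \quad \text{s.t.}\quad y_i - \langle w, x_i\rangle - b \le \varepsilon + \xi_i,\ \
    \langle w, x_i\rangle + b - y_i \le \varepsilon + \xi_i^*,\ \ \xi_i, \xi_i^* \ge 0,

whose dual is the quadratic program referred to above, and which is made nonlinear through a kernel K(x_i, x_j) exactly as in the classification case.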
On the algorithmic implementation of multi-class kernel-based vector machines
Journal of Machine Learning Research
"... In this paper we describe the algorithmic implementation of multiclass kernel-based vector machines. Our starting point is a generalized notion of the margin to multiclass problems. Using this notion we cast multiclass categorization problems as a constrained optimization problem with a quadratic ob ..."
Cited by 559 (13 self)
Abstract:
In this paper we describe the algorithmic implementation of multiclass kernel-based vector machines. Our starting point is a generalized notion of the margin for multiclass problems. Using this notion we cast multiclass categorization as a constrained optimization problem with a quadratic objective function. Unlike most previous approaches, which typically decompose a multiclass problem into multiple independent binary classification tasks, our notion of margin yields a direct method for training multiclass predictors. By using the dual of the optimization problem we are able to incorporate kernels with a compact set of constraints and to decompose the dual problem into multiple optimization problems of reduced size. We describe an efficient fixed-point algorithm for solving the reduced optimization problems and prove its convergence. We then discuss technical details that yield significant running-time improvements for large datasets. Finally, we describe various experiments with our approach, comparing it to previously studied kernel-based methods. Our experiments indicate that for multiclass problems we attain state-of-the-art accuracy.
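A direct multiclass margin formulation of this general kind looks like the following (the paper's exact constraint set and slack treatment may differ in detail):

    \min_{\{w_r\}, \xi}\ \tfrac{1}{2}\sum_r \|w_r\|^2 + C\sum_i \xi_i
    \quad \text{s.t.}\quad \langle w_{y_i}, x_i\rangle - \langle w_r, x_i\rangle \ge 1 - \xi_i
    \quad \text{for all } r \ne y_i, \qquad \xi_i \ge 0,

so that a single quadratic program couples all classes through the shared slack \xi_i, rather than training one independent binary machine per class.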
Generalized Discriminant Analysis Using a Kernel Approach
2000
"... We present a new method that we call Generalized Discriminant Analysis (GDA) to deal with nonlinear discriminant analysis using kernel function operator. The underlying theory is close to the Support Vector Machines (SVM) insofar as the GDA method provides a mapping of the input vectors into high di ..."
Cited by 336 (2 self)
Abstract:
We present a new method that we call Generalized Discriminant Analysis (GDA) to deal with nonlinear discriminant analysis using a kernel function operator. The underlying theory is close to that of Support Vector Machines (SVMs) insofar as the GDA method provides a mapping of the input vectors into a high-dimensional feature space. In the transformed space, linear properties make it easy to extend and generalize the classical Linear Discriminant Analysis (LDA) to nonlinear discriminant analysis. The formulation reduces to an eigenvalue problem. By choosing different kernels, one can cover a wide class of nonlinearities. For both simulated data and alternate kernels, we give classification results as well as the shape of the separating function. The results are confirmed on real data for seed classification. 1. Introduction Linear discriminant analysis (LDA) is a traditional statistical method which has proven successful on classification problems [Fukunaga, 1990]. The p...
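The eigenvalue problem alluded to is of the familiar Fisher type, carried out in the kernel-induced feature space (sketched here in generic kernel-LDA notation, not necessarily the paper's own):

    \max_{v}\ \frac{v^{\top} S_B^{\Phi} v}{v^{\top} S_W^{\Phi} v}
    \quad \Longleftrightarrow \quad S_B^{\Phi} v = \lambda\, S_W^{\Phi} v,
    \qquad v = \sum_i \alpha_i \Phi(x_i),

where S_B^\Phi and S_W^\Phi are the between-class and within-class scatter matrices of the mapped data \Phi(x_i); expanding v over the mapped training points turns this into a generalized eigenvalue problem in the coefficients \alpha that involves only the kernel matrix K_{ij} = K(x_i, x_j).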
Soft Margins for AdaBoost
1998
"... Recently ensemble methods like AdaBoost were successfully applied to character recognition tasks, seemingly defying the problems of overfitting. This paper shows that although AdaBoost rarely overfits in the low noise regime it clearly does so for higher noise levels. Central for understanding this ..."
Cited by 333 (24 self)
Abstract:
Recently, ensemble methods like AdaBoost were successfully applied to character recognition tasks, seemingly defying the problem of overfitting. This paper shows that although AdaBoost rarely overfits in the low-noise regime, it clearly does so for higher noise levels. Central to understanding this fact is the margin distribution, and we find that AdaBoost, which performs gradient descent in an error function with respect to the margin, asymptotically achieves a hard margin distribution, i.e. the algorithm concentrates its resources on a few hard-to-learn patterns (here an interesting overlap with Support Vectors emerges). This is clearly a sub-optimal strategy in the noisy case, and regularization, i.e. a mistrust in the data, must be introduced into the algorithm to alleviate the distortions that difficult patterns (e.g. outliers) can cause to the margin distribution. We propose several regularization methods and generalizations of the original AdaBoost algorithm to achieve a soft margin ...
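The margin distribution discussed here is the distribution over training points of the normalized ensemble margin. A minimal sketch of how one might compute it (function and array names are illustrative, not code from the paper):

    import numpy as np

    def boosting_margins(H, alphas, y):
        """Normalized ensemble margins y_i * sum_t alpha_t h_t(x_i) / sum_t alpha_t.

        H      : (T, n) array of weak-learner predictions in {-1, +1}
        alphas : (T,)  nonnegative weak-learner weights from boosting
        y      : (n,)  labels in {-1, +1}
        """
        H = np.asarray(H, dtype=float)
        alphas = np.asarray(alphas, dtype=float)
        f = alphas @ H                      # combined (unnormalized) ensemble output per point
        return np.asarray(y, dtype=float) * f / alphas.sum()

    # A "hard margin" distribution shows up as many points piled up at the same small positive margin:
    # np.sort(boosting_margins(H, alphas, y))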
Leave-One-Out Support Vector Machines
1999
"... We present a new learning algorithm for pattern recognition inspired by a recent upper bound on leave--one--out error [ Jaakkola and Haussler, 1999 ] proved for Support Vector Machines (SVMs) [ Vapnik, 1995; 1998 ] . The new approach directly minimizes the expression given by the bound in an attempt ..."
Cited by 301 (5 self)
Abstract:
We present a new learning algorithm for pattern recognition inspired by a recent upper bound on leave-one-out error [Jaakkola and Haussler, 1999] proved for Support Vector Machines (SVMs) [Vapnik, 1995; 1998]. The new approach directly minimizes the expression given by the bound in an attempt to minimize leave-one-out error. This gives a convex optimization problem which constructs a sparse linear classifier in feature space using the kernel technique. As such, the algorithm possesses many of the same properties as SVMs. The main novelty of the algorithm is that, apart from the choice of kernel, it is parameterless: the selection of the number of training errors is inherent in the algorithm and not chosen by an extra free parameter as in SVMs. First experiments using the method on benchmark datasets from the UCI repository show results similar to SVMs which have been tuned to the best choice of parameter. 1 Introduction Support Vector Machines (SVMs), motivated by minim...
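For orientation, the quantity being bounded, the leave-one-out error, can always be estimated directly by retraining n times; the point of such a bound is to avoid that cost. A brute-force sketch with scikit-learn (purely illustrative, not the paper's algorithm):

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import LeaveOneOut
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)        # a small UCI-style benchmark dataset
    errors = 0
    for train_idx, test_idx in LeaveOneOut().split(X):
        clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X[train_idx], y[train_idx])
        errors += int(clf.predict(X[test_idx])[0] != y[test_idx][0])
    print("leave-one-out error:", errors / len(y))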
Training Invariant Support Vector Machines
2002
"... Practical experience has shown that in order to obtain the best possible performance, prior knowledge about invariances of a classification problem at hand ought to be incorporated into the training procedure. We describe and review all known methods for doing so in support vector machines, provide ..."
Cited by 186 (16 self)
Abstract:
Practical experience has shown that, in order to obtain the best possible performance, prior knowledge about invariances of the classification problem at hand ought to be incorporated into the training procedure. We describe and review all known methods for doing so in support vector machines, provide experimental results, and discuss their respective merits. One of the significant new results reported in this work is our recent achievement of the lowest test error reported on the well-known MNIST digit recognition benchmark task, with SVM training times that are also significantly faster than those of previous SVM methods.
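One well-known family of methods for encoding such prior knowledge is training on "virtual examples", e.g. one-pixel translations of each digit image. This is only one of the approaches of this kind; the helper below is a generic illustration, not code from the paper:

    import numpy as np

    def shift_image(img, dx, dy):
        """Translate a 2-D image by (dx, dy) pixels, padding the exposed border with zeros."""
        out = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
        if dy > 0:   out[:dy, :] = 0
        elif dy < 0: out[dy:, :] = 0
        if dx > 0:   out[:, :dx] = 0
        elif dx < 0: out[:, dx:] = 0
        return out

    def add_virtual_samples(images, labels, shifts=((1, 0), (-1, 0), (0, 1), (0, -1))):
        """Augment (images, labels) with translated copies to encode translation invariance."""
        X_aug, y_aug = [images], [labels]
        for dx, dy in shifts:
            X_aug.append(np.stack([shift_image(im, dx, dy) for im in images]))
            y_aug.append(labels)
        return np.concatenate(X_aug), np.concatenate(y_aug)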
The connection between regularization operators and support vector kernels
1998
"... In this paper a correspondence is derived between regularization operators used in regularization networks and support vector kernels. We prove that the Green’s Functions associated with regularization operators are suitable support vector kernels with equivalent regularization properties. Moreover, ..."
Cited by 175 (40 self)
Abstract:
In this paper a correspondence is derived between regularization operators used in regularization networks and support vector kernels. We prove that the Green's functions associated with regularization operators are suitable support vector kernels with equivalent regularization properties. Moreover, the paper provides an analysis of currently used support vector kernels from the viewpoint of regularization theory, together with the corresponding operators associated with the classes of both polynomial kernels and translation-invariant kernels. The latter are also analyzed on periodic domains. As a by-product we show that a large number of radial basis functions, namely conditionally positive definite ...
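The correspondence can be summarized, in generic notation and glossing over the precise function-space conditions, as:

    \tfrac{1}{2}\|w\|^2 \;=\; \tfrac{1}{2}\|P f\|^2
    \qquad \text{with} \qquad
    \langle P\, k(x, \cdot),\, P\, k(x', \cdot)\rangle \;=\; k(x, x'),

i.e. an admissible SV kernel k acts as the Green's function of P*P for the regularization operator P; for translation-invariant kernels k(x, x') = k(x - x'), admissibility essentially amounts to a nonnegative Fourier transform, with the associated regularizer penalizing frequency components \omega in proportion to 1 / \tilde{k}(\omega).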
Face recognition using kernel direct discriminant analysis algorithms
IEEE Trans. Neural Networks
"... Abstract—Techniques that can introduce low-dimensional feature representation with enhanced discriminatory power is of paramount importance in face recognition (FR) systems. It is well known that the distribution of face images, under a perceivable variation in viewpoint, illumination or facial expr ..."
Cited by 143 (12 self)
Abstract:
Techniques that can introduce a low-dimensional feature representation with enhanced discriminatory power are of paramount importance in face recognition (FR) systems. It is well known that the distribution of face images, under a perceivable variation in viewpoint, illumination or facial expression, is highly nonlinear and complex. It is, therefore, not surprising that linear techniques, such as those based on principal component analysis (PCA) or linear discriminant analysis (LDA), cannot provide reliable and robust solutions to those FR problems with complex face variations. In this paper, we propose a kernel machine-based discriminant analysis method which deals with the nonlinearity of the face patterns' distribution. The proposed method also effectively solves the so-called "small sample size" (SSS) problem, which exists in most FR tasks. The new algorithm has been tested, in terms of classification error rate performance, on the multiview UMIST face database. Results indicate that the proposed methodology is able to achieve excellent performance with only a very small set of features being used, and its error rate is approximately 34% and 48% of those of two other commonly used kernel FR approaches, the kernel-PCA (KPCA) and the generalized discriminant analysis (GDA), respectively. Index Terms: Face recognition (FR), kernel direct discriminant analysis (KDDA), linear discriminant analysis (LDA), principal component analysis (PCA), small sample size problem (SSS), kernel methods.