Abstract. In Support Vector Machines (SVMs), the solution of the classification problem is characterized by a (convex) quadratic programming (QP) problem. In a modified version of SVMs, called Least Squares SVM classifiers (LS-SVMs), a least squares cost function is proposed so as to obtain a linear set of equations in the dual space. While the SVM classifier has a large margin interpretation, the LS-SVM formulation is related in this paper to a ridge regression approach for classification with binary targets and to Fisher's linear discriminant analysis in the feature space. Multiclass categorization problems are represented by a set of binary classifiers using different output coding schemes. While regularization is used to control the effective number of parameters of the LS-SVM classifier, the sparseness property of SVMs is lost due to the choice of the 2-norm. Sparseness can be imposed in a second stage by gradually pruning the support value spectrum and optimizing the hyperparameters during the sparse approximation procedure. In this paper, twenty public domain benchmark datasets are used to evaluate the test set performance of LS-SVM classifiers with linear, polynomial and radial basis function (RBF) kernels.
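
To illustrate the "linear set of equations in the dual space" mentioned in the abstract, the following is a minimal sketch of a binary LS-SVM classifier in Python with NumPy. It assumes the standard LS-SVM dual system, [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1] with Omega_ij = y_i y_j K(x_i, x_j), and an RBF kernel. The function names (`rbf_kernel`, `lssvm_train`, `lssvm_predict`) and the default values of `gamma` and `sigma` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def rbf_kernel(A, B, sigma=1.0):
    """RBF kernel matrix K[i, j] = exp(-||A[i] - B[j]||^2 / (2 sigma^2))."""
    sq_dist = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))


def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    """Solve the LS-SVM dual linear system for the bias b and support values alpha.

    X: (n, d) inputs; y: (n,) labels in {-1, +1};
    gamma: regularization constant (illustrative default);
    sigma: RBF kernel width (illustrative default).
    """
    n = len(y)
    omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y                           # constraint row: sum_i alpha_i y_i = 0
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / gamma  # ridge-like term from the 2-norm error cost
    rhs = np.concatenate(([0.0], np.ones(n)))
    solution = np.linalg.solve(A, rhs)     # one linear solve instead of a QP
    return solution[0], solution[1:]       # b, alpha


def lssvm_predict(X_train, y_train, alpha, b, X_new, sigma=1.0):
    """Evaluate sign(sum_i alpha_i y_i K(x, x_i) + b) for each row of X_new."""
    return np.sign(rbf_kernel(X_new, X_train, sigma) @ (alpha * y_train) + b)
```

Note that every training point generally receives a nonzero support value `alpha[i]`, which is the loss of sparseness the abstract refers to; a pruning stage, not shown here, would remove the points with the smallest `|alpha[i]|` and retrain.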