Results 1 - 10 of 476
In defense of one-vs-all classification
- Journal of Machine Learning Research, 2004
"... Editor: John Shawe-Taylor We consider the problem of multiclass classification. Our main thesis is that a simple “one-vs-all ” scheme is as accurate as any other approach, assuming that the underlying binary classifiers are well-tuned regularized classifiers such as support vector machines. This the ..."
Cited by 318 (0 self)
Abstract:
We consider the problem of multiclass classification. Our main thesis is that a simple “one-vs-all” scheme is as accurate as any other approach, assuming that the underlying binary classifiers are well-tuned regularized classifiers such as support vector machines. This thesis is interesting in that it disagrees with a large body of recent published work on multiclass classification. We support our position by means of a critical review of the existing literature, a substantial collection of carefully controlled experimental work, and theoretical arguments.
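A minimal sketch of the one-vs-all scheme the abstract argues for, assuming NumPy. As a stand-in for the well-tuned regularized base classifiers (the paper uses SVMs), ridge-regularized least-squares binary classifiers are used here; the function names and toy data are illustrative assumptions, not from the paper.

import numpy as np

def train_one_vs_all(X, y, lam=1.0):
    """Fit one ridge-regularized least-squares classifier per class.

    X: (n, d) data matrix, y: (n,) integer class labels.
    Returns the class labels and a (d, K) matrix of weights, one column per class.
    """
    n, d = X.shape
    classes = np.unique(y)
    W = np.zeros((d, len(classes)))
    A = X.T @ X + lam * np.eye(d)          # shared regularized Gram matrix
    for k, c in enumerate(classes):
        t = np.where(y == c, 1.0, -1.0)    # +1 for class c, -1 for all other classes
        W[:, k] = np.linalg.solve(A, X.T @ t)
    return classes, W

def predict_one_vs_all(X, classes, W):
    # assign each point to the class whose binary classifier scores highest
    return classes[np.argmax(X @ W, axis=1)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 2))
    y = (X[:, 0] > 0).astype(int) + 2 * (X[:, 1] > 0).astype(int)  # 4 synthetic classes
    classes, W = train_one_vs_all(X, y, lam=0.1)
    print("train accuracy:", np.mean(predict_one_vs_all(X, classes, W) == y))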
Proximal support vector machine classifiers
- Proceedings KDD-2001: Knowledge Discovery and Data Mining, 2001
"... Abstract—A new approach to support vector machine (SVM) classification is proposed wherein each of two data sets are proximal to one of two distinct planes that are not parallel to each other. Each plane is generated such that it is closest to one of the two data sets and as far as possible from the ..."
Cited by 160 (16 self)
Abstract:
A new approach to support vector machine (SVM) classification is proposed wherein each of two data sets is proximal to one of two distinct planes that are not parallel to each other. Each plane is generated such that it is closest to one of the two data sets and as far as possible from the other data set. Each of the two nonparallel proximal planes is obtained by a single MATLAB command as the eigenvector corresponding to the smallest eigenvalue of a generalized eigenvalue problem. Classification by proximity to two distinct nonlinear surfaces generated by a nonlinear kernel also leads to two simple generalized eigenvalue problems. The effectiveness of the proposed method is demonstrated by tests on simple examples as well as on a number of public data sets. These examples show the advantages of the proposed approach in both computation time and test set correctness. Index Terms: Support vector machines, proximal classification, generalized eigenvalues.
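A rough sketch of the generalized-eigenvalue idea described above, assuming NumPy/SciPy: each proximal plane w·x = gamma is taken as the eigenvector with smallest generalized eigenvalue of a pair of matrices built from the two augmented data matrices. The Tikhonov term delta and the toy data below are illustrative assumptions, not the paper's exact formulation.

import numpy as np
from scipy.linalg import eigh

def proximal_plane(A, B, delta=1e-3):
    """Plane close to the rows of A and far from the rows of B.

    Returns (w, gamma) for the plane w @ x = gamma, obtained as the generalized
    eigenvector with smallest eigenvalue of (G, H), where G and H come from the
    augmented matrices [A -e] and [B -e].
    """
    Ga = np.hstack([A, -np.ones((A.shape[0], 1))])
    Gb = np.hstack([B, -np.ones((B.shape[0], 1))])
    G = Ga.T @ Ga + delta * np.eye(Ga.shape[1])   # closeness to A (plus regularization)
    H = Gb.T @ Gb + delta * np.eye(Gb.shape[1])   # distance from B
    vals, vecs = eigh(G, H)                       # generalized symmetric eigenproblem
    z = vecs[:, 0]                                # eigenvector of the smallest eigenvalue
    return z[:-1], z[-1]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.normal(loc=[0, 0], scale=0.3, size=(50, 2))
    B = rng.normal(loc=[2, 2], scale=0.3, size=(50, 2))
    w1, g1 = proximal_plane(A, B)   # plane proximal to class A
    w2, g2 = proximal_plane(B, A)   # plane proximal to class B
    # classify a point by its distance to each plane
    x = np.array([0.1, -0.2])
    d1 = abs(w1 @ x - g1) / np.linalg.norm(w1)
    d2 = abs(w2 @ x - g2) / np.linalg.norm(w2)
    print("assigned to class", "A" if d1 < d2 else "B")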
A modified finite Newton method for fast solution of large scale linear SVMs
- Journal of Machine Learning Research, 2005
"... This paper develops a fast method for solving linear SVMs with L2 loss function that is suited for large scale data mining tasks such as text classification. This is done by modifying the finite Newton method of Mangasarian in several ways. Experiments indicate that the method is much faster than de ..."
Cited by 109 (8 self)
Abstract:
This paper develops a fast method for solving linear SVMs with L2 loss function that is suited for large scale data mining tasks such as text classification. This is done by modifying the finite Newton method of Mangasarian in several ways. Experiments indicate that the method is much faster than decomposition methods such as SVMlight, SMO and BSVM (e.g., 4-100 fold), especially when the number of examples is large. The paper also suggests ways of extending the method to other loss functions such as the modified Huber’s loss function and the L1 loss function, and also for solving ordinal regression.
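A hedged sketch of the primal Newton idea for the L2-loss linear SVM that this line of work builds on (not the authors' exact algorithm): at each step only the examples violating the margin contribute, the generalized Hessian is I + 2C * Xa^T Xa over that active set, and a damped Newton step is taken. The variable names, bias-free formulation, and synthetic data are assumptions.

import numpy as np

def l2svm_objective(w, X, y, C):
    # 0.5 * ||w||^2 + C * sum over margin violators of (1 - y_i x_i.w)^2
    m = 1.0 - y * (X @ w)
    return 0.5 * w @ w + C * np.sum(np.maximum(m, 0.0) ** 2)

def l2svm_newton(X, y, C=1.0, tol=1e-6, max_iter=50):
    """Newton-type primal method for the L2-loss linear SVM (no bias term)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(max_iter):
        m = 1.0 - y * (X @ w)
        active = m > 0                        # examples currently violating the margin
        Xa, ya = X[active], y[active]
        grad = w - 2.0 * C * Xa.T @ (ya * m[active])
        if np.linalg.norm(grad) < tol:
            break
        H = np.eye(d) + 2.0 * C * Xa.T @ Xa   # generalized Hessian on the active set
        step = np.linalg.solve(H, grad)
        # simple backtracking line search to guarantee descent
        t, f0 = 1.0, l2svm_objective(w, X, y, C)
        while l2svm_objective(w - t * step, X, y, C) > f0 and t > 1e-8:
            t *= 0.5
        w = w - t * step
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X = rng.normal(size=(500, 20))
    w_true = rng.normal(size=20)
    y = np.sign(X @ w_true + 0.1 * rng.normal(size=500))
    w = l2svm_newton(X, y, C=1.0)
    print("train accuracy:", np.mean(np.sign(X @ w) == y))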
Everything Old Is New Again: A Fresh Look at Historical Approaches in Machine Learning
, 2002
"... ..."
(Show Context)
A robust minimax approach to classification
- Journal of Machine Learning Research, 2002
"... When constructing a classifier, the probability of correct classification of future data points should be maximized. We consider a binary classification problem where the mean and covariance matrix of each class are assumed to be known. No further assumptions are made with respect to the class-condi ..."
Cited by 104 (7 self)
Abstract:
When constructing a classifier, the probability of correct classification of future data points should be maximized. We consider a binary classification problem where the mean and covariance matrix of each class are assumed to be known. No further assumptions are made with respect to the class-conditional distributions. Misclassification probabilities are then controlled in a worst-case setting: that is, under all possible choices of class-conditional densities with given mean and covariance matrix, we minimize the worst-case (maximum) probability of misclassification of future data points. For a linear decision boundary, this desideratum is translated in a very direct way into a (convex) second-order cone optimization problem, with complexity similar to a support vector machine problem. The minimax problem can be interpreted geometrically as minimizing the maximum of the Mahalanobis distances to the two classes. We address the issue of robustness with respect to estimation errors (in the means and covariances of the ...
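In symbols, a sketch of the worst-case formulation described above (notation assumed here, with class means \mu_\pm and covariances \Sigma_\pm): under the standard Chebyshev-type reduction, the minimax linear classifier a^T x = b can be obtained from the second-order cone problem

\min_{a \neq 0}\; \left\| \Sigma_+^{1/2} a \right\|_2 + \left\| \Sigma_-^{1/2} a \right\|_2
\quad \text{subject to} \quad a^{\top}(\mu_+ - \mu_-) = 1 .

If m_* denotes the optimal value, the resulting worst-case misclassification probability is bounded by m_*^2 / (1 + m_*^2), and the offset b is recovered from tightness of the worst-case constraints.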
Regularized Least-Squares Classification
"... We consider the solution of binary classification problems via Tikhonov regularization in a Reproducing Kernel Hilbert Space using the square loss, and denote the resulting algorithm Regularized Least-Squares Classification (RLSC). We sketch ..."
Cited by 103 (1 self)
Abstract:
We consider the solution of binary classification problems via Tikhonov regularization in a Reproducing Kernel Hilbert Space using the square loss, and denote the resulting algorithm Regularized Least-Squares Classification (RLSC). We sketch ...
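A minimal sketch of RLSC as described, assuming NumPy and an RBF kernel: with the square loss, the Tikhonov problem in the RKHS reduces to the single linear system (K + lambda * n * I) c = y. The kernel width, regularization value, and toy data are assumptions.

import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    # K[i, j] = exp(-gamma * ||x_i - z_j||^2)
    sq = np.sum(X**2, 1)[:, None] + np.sum(Z**2, 1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * sq)

def rlsc_fit(X, y, lam=1e-2, gamma=1.0):
    """Train RLSC: solve (K + lam * n * I) c = y with labels y in {-1, +1}."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * n * np.eye(n), y.astype(float))

def rlsc_predict(X_train, c, X_test, gamma=1.0):
    # f(x) = sum_i c_i k(x_i, x); classify by the sign of f
    return np.sign(rbf_kernel(X_test, X_train, gamma) @ c)

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    X = rng.normal(size=(200, 2))
    y = np.where(np.linalg.norm(X, axis=1) < 1.0, 1.0, -1.0)   # a nonlinear target
    c = rlsc_fit(X, y, lam=1e-3, gamma=2.0)
    print("train accuracy:", np.mean(rlsc_predict(X, c, X, gamma=2.0) == y))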
Weighted Least Squares Support Vector Machines: robustness and sparse approximation
- Neurocomputing
"... Least Squares Support Vector Machines (LS-SVM) is an SVM version which involves equality instead of inequality constraints and works with a least squares cost function. In this way the solution follows from a linear Karush-Kuhn-Tucker system instead of a quadratic programming problem. However, sp ..."
Cited by 97 (19 self)
Abstract:
Least Squares Support Vector Machines (LS-SVM) is an SVM version which involves equality instead of inequality constraints and works with a least squares cost function. In this way the solution follows from a linear Karush-Kuhn-Tucker system instead of a quadratic programming problem. However, sparseness is lost in the LS-SVM case and the estimation of the support values is only optimal in the case of a Gaussian distribution of the error variables. In this paper we discuss a method which can overcome these two drawbacks. We show how to obtain robust estimates for regression by applying a weighted version of LS-SVM. We also discuss a sparse approximation procedure for weighted and unweighted LS-SVM. It is basically a pruning method that prunes based upon the physical meaning of the sorted support values, while pruning procedures for classical multilayer perceptrons require the computation of a Hessian matrix or its inverse. The methods of this paper are illustrated for RBF kernels and demonstrate how to obtain robust estimates with selection of an appropriate number of hidden units, in the case of outliers or non-Gaussian error distributions with heavy tails.
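A small sketch of the LS-SVM linear system and its weighted variant, assuming NumPy and an RBF kernel: the KKT system has the form [[0, 1^T], [1, Omega + V/gamma]] [b; alpha] = [0; y], where V is the identity in the unweighted case and diag(1/v_i) once per-sample weights v_i are introduced for robustness. The residual-based weighting rule below is a simplified stand-in, not the paper's exact scheme.

import numpy as np

def rbf_kernel(X, Z, gamma_k=1.0):
    sq = np.sum(X**2, 1)[:, None] + np.sum(Z**2, 1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma_k * sq)

def lssvm_fit(X, y, gam=10.0, gamma_k=1.0, v=None):
    """Solve the LS-SVM KKT linear system; v holds optional per-sample weights."""
    n = X.shape[0]
    if v is None:
        v = np.ones(n)                               # unweighted LS-SVM
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, gamma_k) + np.diag(1.0 / (gam * v))
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[0], sol[1:]                           # bias b, coefficients alpha

def lssvm_predict(X_train, b, alpha, X_test, gamma_k=1.0):
    return rbf_kernel(X_test, X_train, gamma_k) @ alpha + b

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    X = np.linspace(-3, 3, 120)[:, None]
    y = np.sinc(X[:, 0]) + 0.05 * rng.normal(size=120)
    y[::15] += 2.0                                    # a few gross outliers
    b, alpha = lssvm_fit(X, y)                        # first, unweighted pass
    resid = y - lssvm_predict(X, b, alpha, X)
    s = 1.4826 * np.median(np.abs(resid - np.median(resid)))    # robust scale (MAD)
    v = np.clip(2.5 / (np.abs(resid) / s + 1e-12), None, 1.0)   # down-weight large residuals
    b_w, alpha_w = lssvm_fit(X, y, v=v)               # weighted, more robust refit
    print("max |alpha| unweighted vs weighted:", np.abs(alpha).max(), np.abs(alpha_w).max())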
Predictive low-rank decomposition for kernel methods
- ICML, 2005
"... Low-rank matrix decompositions are essential tools in the application of kernel methods to large-scale learning problems. These decompositions have generally been treated as black boxes—the decomposition of the kernel matrix that they deliver is independent of the specific learning task at hand— and ..."
Cited by 88 (7 self)
Abstract:
Low-rank matrix decompositions are essential tools in the application of kernel methods to large-scale learning problems. These decompositions have generally been treated as black boxes—the decomposition of the kernel matrix that they deliver is independent of the specific learning task at hand—and this is a potentially significant source of inefficiency. In this paper, we present an algorithm that can exploit side information (e.g., classification labels, regression responses) in the computation of low-rank decompositions for kernel matrices. Our algorithm has the same favorable scaling as state-of-the-art methods such as incomplete Cholesky decomposition—it is linear in the number of data points and quadratic in the rank of the approximation. We present simulation results that show that our algorithm yields decompositions of significantly smaller rank than those found by incomplete Cholesky decomposition.
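For reference, a sketch of the label-independent baseline the abstract contrasts against: incomplete Cholesky with greedy diagonal pivoting, which builds a rank-m factor G with K roughly equal to G @ G.T while only ever computing one kernel column per pivot. This is the standard black-box decomposition, not the side-information algorithm the paper proposes; NumPy and the RBF kernel are assumptions.

import numpy as np

def rbf_kernel_column(X, j, gamma=1.0):
    # j-th column of the RBF kernel matrix, computed on demand
    return np.exp(-gamma * np.sum((X - X[j]) ** 2, axis=1))

def incomplete_cholesky(X, rank, gamma=1.0, tol=1e-10):
    """Pivoted incomplete Cholesky of the kernel matrix: K ~= G @ G.T."""
    n = X.shape[0]
    G = np.zeros((n, rank))
    diag = np.ones(n)               # the RBF kernel has unit diagonal
    pivots = []
    for m in range(rank):
        j = int(np.argmax(diag))    # greedy pivot: largest remaining residual
        if diag[j] < tol:
            return G[:, :m], pivots
        pivots.append(j)
        k_j = rbf_kernel_column(X, j, gamma)
        G[:, m] = (k_j - G[:, :m] @ G[j, :m]) / np.sqrt(diag[j])
        diag = np.maximum(diag - G[:, m] ** 2, 0.0)
    return G, pivots

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    X = rng.normal(size=(400, 5))
    G, piv = incomplete_cholesky(X, rank=40, gamma=0.5)
    # check the approximation against the full kernel matrix
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2.0 * X @ X.T
    K = np.exp(-0.5 * sq)
    print("relative error:", np.linalg.norm(K - G @ G.T) / np.linalg.norm(K))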
Multiclass Least Squares Support Vector Machines
- In Proceedings of the International Joint Conference on Neural Networks (IJCNN’99), 1999
"... In this paper we present an extension of least squares support vector machines (LS-SVM's) to the multiclass case. While standard SVM solutions involve solving quadratic or linear programming problems, the least squares version of SVM's corresponds to solving a set of linear equations, due ..."
Cited by 83 (11 self)
Abstract:
In this paper we present an extension of least squares support vector machines (LS-SVMs) to the multiclass case. While standard SVM solutions involve solving quadratic or linear programming problems, the least squares version of SVMs corresponds to solving a set of linear equations, due to equality instead of inequality constraints in the problem formulation. In LS-SVMs Mercer's condition is still applicable, hence several types of kernels such as polynomial, RBF and MLP kernels can be used. The multiclass case that we discuss here is related to classical neural network approaches to classification, where multiple classes are encoded by considering multiple outputs for the network. Efficient methods for solving large scale LS-SVMs are available.
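A compact sketch of the multiclass encoding described above, assuming NumPy: classes are coded as multiple +1/-1 output columns, the LS-SVM linear system is solved once for all right-hand sides over the same kernel matrix, and a test point is assigned to the class with the largest output. This is an illustrative reduction, not the authors' exact formulation.

import numpy as np

def rbf_kernel(X, Z, gamma_k=1.0):
    sq = np.sum(X**2, 1)[:, None] + np.sum(Z**2, 1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma_k * sq)

def multiclass_lssvm_fit(X, y, gam=10.0, gamma_k=1.0):
    """One LS-SVM output per class with a +1/-1 target encoding."""
    n = X.shape[0]
    classes = np.unique(y)
    Y = np.where(y[:, None] == classes[None, :], 1.0, -1.0)   # (n, K) target matrix
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, gamma_k) + np.eye(n) / gam
    rhs = np.vstack([np.zeros((1, len(classes))), Y])
    sol = np.linalg.solve(A, rhs)            # all K right-hand sides in one solve
    return classes, sol[0, :], sol[1:, :]    # biases b_k, coefficients alpha_k

def multiclass_lssvm_predict(X_train, classes, b, alpha, X_test, gamma_k=1.0):
    scores = rbf_kernel(X_test, X_train, gamma_k) @ alpha + b
    return classes[np.argmax(scores, axis=1)]

if __name__ == "__main__":
    rng = np.random.default_rng(6)
    X = rng.normal(size=(300, 2))
    y = (X[:, 0] > 0).astype(int) + 2 * (X[:, 1] > 0).astype(int)   # 4 synthetic classes
    classes, b, alpha = multiclass_lssvm_fit(X, y, gam=20.0, gamma_k=1.0)
    pred = multiclass_lssvm_predict(X, classes, b, alpha, X)
    print("train accuracy:", np.mean(pred == y))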