Results 1-10 of 30
Large Margin Classification Using the Perceptron Algorithm
 Machine Learning
, 1998
Abstract

Cited by 518 (2 self)
We introduce and analyze a new algorithm for linear classification which combines Rosenblatt's perceptron algorithm with Helmbold and Warmuth's leave-one-out method. Like Vapnik's maximal-margin classifier, our algorithm takes advantage of data that are linearly separable with large margins. Compared to Vapnik's algorithm, however, ours is much simpler to implement and much more efficient in terms of computation time. We also show that our algorithm can be used efficiently in very high-dimensional spaces using kernel functions. We performed some experiments using our algorithm, and some variants of it, for classifying images of handwritten digits. The performance of our algorithm is close to, but not as good as, the performance of maximal-margin classifiers on the same problem, while saving significantly on computation time and programming effort. 1 Introduction One of the most influential developments in the theory of machine learning in the last few years is Vapnik's work on supp...
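The dual (kernelized) perceptron the abstract builds on admits a compact sketch. The RBF kernel and the toy XOR data below are illustrative choices, not from the paper; the point is that each dual coefficient simply counts mistakes on its example:

```python
import numpy as np

def rbf(x, z, gamma=1.0):
    # Gaussian kernel; lets the perceptron operate in a high-dimensional feature space
    return np.exp(-gamma * np.sum((x - z) ** 2))

def kernel_perceptron(X, y, epochs=10, gamma=1.0):
    """Dual perceptron: alpha[i] counts the mistakes made on example i."""
    n = len(X)
    alpha = np.zeros(n)
    K = np.array([[rbf(X[i], X[j], gamma) for j in range(n)] for i in range(n)])
    for _ in range(epochs):
        for i in range(n):
            # the prediction is a kernel expansion over past mistakes
            if y[i] * np.sum(alpha * y * K[:, i]) <= 0:
                alpha[i] += 1.0  # mistake-driven update
    return alpha

def predict(alpha, X_train, y_train, x, gamma=1.0):
    s = sum(a * yi * rbf(xi, x, gamma) for a, yi, xi in zip(alpha, y_train, X_train))
    return 1 if s > 0 else -1
```

With an RBF kernel this sketch separates XOR, which no linear perceptron in the input space can do.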
Ultraconservative Online Algorithms for Multiclass Problems
 Journal of Machine Learning Research
, 2001
Abstract

Cited by 313 (21 self)
In this paper we study online classification algorithms for multiclass problems in the mistake bound model. The hypotheses we use maintain one prototype vector per class. Given an input instance, a multiclass hypothesis computes a similarity score between each prototype and the input instance and then sets the predicted label to be the index of the prototype achieving the highest similarity. To design and analyze the learning algorithms in this paper we introduce the notion of ultraconservativeness. Ultraconservative algorithms are algorithms that update only the prototypes attaining similarity scores higher than the score of the correct label's prototype. We start by describing a family of additive ultraconservative algorithms where each algorithm in the family updates its prototypes by finding a feasible solution for a set of linear constraints that depend on the instantaneous similarity scores. We then discuss a specific online algorithm that seeks a set of prototypes which have a small norm. The resulting algorithm, which we term MIRA (for Margin Infused Relaxed Algorithm), is ultraconservative as well. We derive mistake bounds for all the algorithms and provide further analysis of MIRA using a generalized notion of the margin for multiclass problems.
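The simplest member of this ultraconservative family is the multiclass perceptron update, which on a mistake demotes only the single top-scoring erring prototype. A minimal sketch on illustrative data (MIRA itself additionally optimizes the update coefficients, which this sketch omits):

```python
import numpy as np

def multiclass_perceptron(X, y, n_classes, epochs=20):
    """Simplest ultraconservative update: on a mistake, promote the correct
    class prototype and demote only the top-scoring wrong prototype."""
    W = np.zeros((n_classes, X.shape[1]))  # one prototype vector per class
    for _ in range(epochs):
        for x, yi in zip(X, y):
            scores = W @ x                  # similarity score per class
            pred = int(np.argmax(scores))
            if pred != yi:
                W[yi] += x                  # promote the correct prototype
                W[pred] -= x                # demote the (single) erring prototype
    return W
```

Only prototypes scoring at least as high as the correct label's are touched, which is exactly the ultraconservative property the abstract defines.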
Controlling the Sensitivity of Support Vector Machines
 Proceedings of the International Joint Conference on AI
, 1999
Abstract

Cited by 102 (4 self)
For many applications it is important to accurately distinguish false negative results from false positives. This is particularly important for medical diagnosis, where the correct balance between sensitivity and specificity plays an important role in evaluating the performance of a classifier. In this paper we discuss two schemes for adjusting the sensitivity and specificity of Support Vector Machines and the description of their performance using receiver operating characteristic (ROC) curves. We then illustrate their use on real-life medical diagnostic tasks. 1 Introduction. Since their introduction by Vapnik and co-workers [Vapnik, 1995; Cortes and Vapnik, 1995], Support Vector Machines (SVMs) have been successfully applied to a number of real-world problems such as handwritten character and digit recognition [Schölkopf, 1997; Cortes, 1995; LeCun et al., 1995; Vapnik, 1995], face detection [Osuna et al., 1997] and speaker identification [Schmidt, 1996]. They exhibit a r...
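One common scheme of this kind is shifting the bias of an already-trained decision function f(x) and reading sensitivity and specificity off the resulting ROC operating points. A minimal sketch; the scores below stand in for SVM outputs and are purely illustrative:

```python
import numpy as np

def sens_spec(scores, y, b):
    """Sensitivity and specificity of the shifted rule sign(f(x) + b).
    Sweeping b traces the ROC curve of the underlying classifier."""
    pred = np.where(scores + b > 0, 1, -1)
    tp = np.sum((pred == 1) & (y == 1))
    fn = np.sum((pred == -1) & (y == 1))
    tn = np.sum((pred == -1) & (y == -1))
    fp = np.sum((pred == 1) & (y == -1))
    return tp / (tp + fn), tn / (tn + fp)
```

Increasing b makes the classifier more willing to predict the positive class, raising sensitivity at the cost of specificity; decreasing it does the opposite.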
Efficient SVM Regression Training with SMO
, 2001
Abstract

Cited by 26 (2 self)
The sequential minimal optimization algorithm (SMO) has been shown to be an effective method for training support vector machines
From Margin To Sparsity
 IN ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 13
, 2001
Abstract

Cited by 24 (3 self)
We present an improvement of Novikoff's perceptron convergence theorem. Reinterpreting this mistake bound as a margin-dependent sparsity guarantee allows us to give a PAC-style generalisation error bound for the classifier learned by the dual perceptron learning algorithm. The bound value crucially depends on the margin a support vector machine would achieve on the same data set using the same kernel. Ironically, the bound yields better guarantees than are currently available for the support vector solution itself.
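Novikoff's theorem bounds the perceptron's mistakes by (R/γ)², where R is the radius of the data and γ the margin achieved by some separating unit vector. A small numeric sketch with illustrative data:

```python
import numpy as np

def novikoff_bound(X, y, w):
    """Novikoff-style mistake bound (R / gamma)^2, given a separator w
    achieving margin gamma > 0 on the sample (X, y)."""
    w = np.asarray(w, dtype=float)
    w = w / np.linalg.norm(w)                           # normalize the separator
    R = max(np.linalg.norm(x) for x in X)               # radius of the data
    gamma = min(yi * (w @ xi) for xi, yi in zip(X, y))  # margin achieved by w
    assert gamma > 0, "w must separate the data with positive margin"
    return (R / gamma) ** 2
```

Read as a sparsity guarantee, this is also an upper bound on how many distinct examples can carry nonzero dual coefficients in the dual perceptron's final classifier.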
Learning classifiers from distributed, semantically heterogeneous, autonomous data sources
, 2004
Abstract

Cited by 11 (4 self)
Recent advances in computing, communications, and digital storage technologies, together with the development of high-throughput data acquisition technologies, have made it possible to gather and store large volumes of data in digital form. These developments have resulted in unprecedented opportunities for large-scale data-driven knowledge acquisition, with the potential for fundamental gains in scientific understanding (e.g., characterization of macromolecular structure-function relationships in biology) in many data-rich domains. In such applications, the data sources of interest are typically physically distributed, semantically heterogeneous, and autonomously owned and operated, which makes it impossible to use traditional machine learning algorithms for knowledge acquisition.
However, we observe that most learning algorithms use only certain statistics computed from the data in the process of generating their output hypothesis, and we use this observation to design a general strategy for transforming traditional algorithms for learning from data into algorithms for learning from distributed data. The resulting algorithms are provably exact in that the classifiers they produce are identical to those obtained by the corresponding algorithms in the centralized setting (i.e., when all of the data is available in a central location), and they compare favorably to their centralized counterparts in terms of time and communication complexity.
To deal with the semantic heterogeneity problem, we introduce ontology-extended data sources and define a user perspective consisting of an ontology and a set of interoperation constraints between data source ontologies and the user ontology. We show how these constraints can be used to define the mappings and conversion functions needed to answer statistical queries over semantically heterogeneous data viewed from a certain user perspective. This is further used to extend our approach for learning from distributed data into a theoretically sound approach to learning from semantically heterogeneous data.
The work described above contributed to the design and implementation of AirlDM, a collection of data-source-independent machine learning algorithms built by means of sufficient statistics and data source wrappers, and to the design of INDUS, a federated, query-centric system for knowledge acquisition from distributed, semantically heterogeneous, autonomous data sources.
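The "statistics computed from data" observation can be made concrete with the simplest case, class counts for estimating priors: summing per-source histograms reproduces the centralized histogram exactly, so a learner that only needs counts is provably exact in the distributed setting. The source names below are hypothetical:

```python
from collections import Counter

def class_counts(labels):
    # the sufficient statistic for class priors is just a label histogram
    return Counter(labels)

def merge(counts_list):
    # merging per-source statistics reproduces the centralized statistic exactly
    total = Counter()
    for c in counts_list:
        total.update(c)
    return total
```

Any learner whose output depends on the data only through such additively decomposable statistics (counts, sums, co-occurrence tables) admits the same transformation.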
Feature Selection using PSO-SVM
Abstract

Cited by 10 (0 self)
The feature selection process can be considered a problem of global combinatorial optimization in machine learning, which reduces the number of features, removes irrelevant, noisy and redundant data, and results in an acceptable classification accuracy. Feature selection is of great importance in pattern classification, medical data processing, machine learning, and data mining applications. Therefore, a good feature selection method based on the number of features investigated for sample classification is needed in order to speed up processing, improve predictive accuracy, and avoid incomprehensibility. In this paper, particle swarm optimization (PSO) is used to implement feature selection, and support vector machines (SVMs) with the one-versus-rest method serve as a fitness function of PSO for the classification problem. The proposed method is applied to five classification problems from the literature. Experimental results show that our method simplifies features effectively and obtains a higher classification accuracy compared to the other feature selection methods.
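A minimal binary-PSO skeleton conveys the search component of such a method. The SVM fitness of the paper is replaced here by an arbitrary user-supplied fitness over 0/1 feature masks, and the constants (inertia 0.7, acceleration 1.5) are conventional defaults rather than the paper's settings:

```python
import math
import random

def binary_pso(n_features, fitness, n_particles=10, iters=30, seed=0):
    """Minimal binary PSO: a particle is a 0/1 feature mask; velocities are
    squashed through a sigmoid into bit-set probabilities.
    `fitness(mask) -> float` is user-supplied (an SVM score in the paper)."""
    rng = random.Random(seed)
    sig = lambda v: 1.0 / (1.0 + math.exp(-v))
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # each particle's best mask so far
    pbest_f = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]    # swarm-wide best mask
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_features):
                # inertia 0.7 and acceleration 1.5 are conventional choices
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * rng.random() * (pbest[i][d] - pos[i][d])
                             + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] = 1 if rng.random() < sig(vel[i][d]) else 0
            f = fitness(pos[i])
            if f > pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f > gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f
```

In the paper's setup the fitness would train and evaluate a one-versus-rest SVM on the features selected by the mask; any black-box scorer can be plugged in the same way.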
On the Equality of Kernel AdaTron and Sequential Minimal Optimization in Classification and Regression Tasks and Alike Algorithms for Kernel Machines
 Proc. of ESANN 2003, 11th European Symposium on Artificial Neural Networks
Abstract

Cited by 9 (4 self)
The paper presents the equality of the kernel AdaTron (KA) method (originating from a gradient ascent learning approach) and the sequential minimal optimization (SMO) learning algorithm (based on an analytic quadratic programming step) in designing support vector machines (SVMs) with positive definite kernels. The conditions for the equality of the two methods are established. The equality is valid for both nonlinear classification and nonlinear regression tasks, and it sheds new light on these seemingly different learning approaches. The paper also introduces other learning techniques related to the two mentioned approaches, such as the non-negative conjugate gradient, the classic Gauss-Seidel (GS) coordinate ascent procedure, and its derivative known as the successive over-relaxation (SOR) algorithm, as viable and usually faster training algorithms for performing nonlinear classification and regression tasks. The convergence theorem for these related iterative algorithms is proven.
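The KA iteration referred to here is a clipped coordinate ascent on the SVM dual. A sketch for the no-bias classification case; the learning rate 1/K[i,i] (exact coordinate maximization) and the data are illustrative:

```python
import numpy as np

def kernel_adatron(K, y, C=10.0, iters=200):
    """Kernel AdaTron sketch: coordinate gradient ascent on the SVM dual
    (model without bias term), clipping each alpha into [0, C].
    K is the positive definite kernel Gram matrix."""
    n = len(y)
    alpha = np.zeros(n)
    for _ in range(iters):
        for i in range(n):
            # margin of example i under the current dual solution
            z = y[i] * np.sum(alpha * y * K[i])
            # step 1/K[i,i] maximizes the dual exactly along this coordinate
            alpha[i] = np.clip(alpha[i] + (1.0 - z) / K[i, i], 0.0, C)
    return alpha
```

The resulting decision function is f(x) = Σ_j alpha_j y_j k(x_j, x); the equality result says that for positive definite kernels this iteration and SMO (without bias) reach the same solution.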
Iterative Single Data Algorithm for Training Kernel Machines from Huge Data Sets: Theory and Performance
 Support Vector Machines: Theory and Applications, Springer-Verlag, Studies in Fuzziness and Soft Computing
, 2005
Abstract

Cited by 8 (0 self)
The chapter introduces the latest developments and results of the Iterative Single Data Algorithm (ISDA) for solving large-scale support vector machine (SVM) problems. First, the equality of a Kernel AdaTron (KA) method (originating from a gradient ascent learning approach) and the Sequential Minimal Optimization (SMO) learning algorithm (based on an analytic quadratic programming step for a model without bias term b) in designing SVMs with positive definite kernels is shown for both the nonlinear classification and the nonlinear regression tasks. The chapter also introduces the classic Gauss-Seidel (GS) procedure and its derivative known as the successive over-relaxation (SOR) algorithm as viable (and usually faster) training algorithms. The convergence theorem for these related iterative algorithms is proven. The second part of the chapter presents the effects and the methods of incorporating an explicit bias term b into the ISDA. The algorithms shown here implement a single-training-data-based iteration routine (a.k.a. per-pattern learning), which makes the proposed ISDAs remarkably quick. The final solution in the dual domain is not an approximate one; it is the optimal set of dual variables which would have been obtained by using any of the existing, proven QP problem solvers, if only they could deal with huge data sets.
Simple Learning Algorithms for Training Support Vector Machines
, 1998
Abstract

Cited by 8 (0 self)
Support Vector Machines (SVMs) have proven to be highly effective for learning many real-world datasets but have failed to establish themselves as common machine learning tools. This is partly due to the fact that they are not easy to implement, and their standard implementation requires the use of optimization packages. In this paper we present simple iterative algorithms for training support vector machines which are easy to implement and guaranteed to converge to the optimal solution. Furthermore, we provide a technique for automatically finding the kernel parameter and the best learning rate. Extensive experiments with real datasets are provided, showing that these algorithms compare well with standard implementations of SVMs in terms of generalisation accuracy and computational cost, while being significantly simpler to implement. 1 Introduction Since their introduction by Vapnik and co-workers [38, 7], Support Vector Machines (SVMs) have been successfully applied to a number of real ...