Results 1–10 of 12
Noise tolerant variants of the perceptron algorithm
 Journal of Machine Learning Research, 2005
Abstract

Cited by 33 (2 self)
A large number of variants of the Perceptron algorithm have been proposed and partially evaluated in recent work. One type of algorithm aims for noise tolerance by replacing the last hypothesis of the perceptron with another hypothesis or a vote among hypotheses. Another type simply adds a margin term to the perceptron in order to increase robustness and accuracy, as done in support vector machines. A third type borrows further from support vector machines and constrains the update function of the perceptron in ways that mimic soft-margin techniques. The performance of these algorithms, and the potential for combining different techniques, has not been studied in depth. This paper provides such an experimental study and reveals some interesting facts about the algorithms. In particular, the perceptron with margin is an effective method for tolerating noise and stabilizing the algorithm. This is surprising since the margin in itself is not designed or used for noise tolerance, and there are no known guarantees for such performance. In most cases, similar performance is obtained by the voted perceptron, which has the advantage that it does not require parameter selection. Techniques using soft-margin ideas are runtime intensive and do not give additional performance benefits. The results also highlight the difficulty of the automatic parameter selection required by some of these variants.
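The perceptron with margin that the study singles out differs from the plain perceptron only in its update condition: it updates whenever an example's signed score falls below a fixed margin, not just on mistakes. A minimal pure-Python sketch (the margin, learning rate, and epoch count are illustrative choices, not the paper's settings):

```python
def perceptron_with_margin(data, margin=0.5, lr=0.1, epochs=50):
    """Train a linear classifier with a margin-based perceptron update.

    `data` is a list of (features, label) pairs with label in {-1, +1}.
    Unlike the plain perceptron, an update fires whenever the signed
    score y * (w . x + b) falls below `margin`, not only on mistakes.
    """
    dim = len(data[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= margin:  # inside the margin: update
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Predict the label of x with the trained linear model."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1
```

Setting `margin=0` recovers the classic perceptron; a positive margin forces updates even on correctly classified points that lie too close to the boundary, which is the stabilizing effect the experiments observe.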
A kernel method for the optimization of the margin distribution
 In International Conference on Artificial Neural Networks (ICANN), 2008
Abstract

Cited by 6 (4 self)
Abstract. Recent results in theoretical machine learning suggest that good properties of the margin distribution over a training set translate into good classifier performance. The same principle has already been used in SVMs and other kernel-based methods, whose associated optimization problems try to maximize the minimum of these margins. In this paper, we propose a kernel-based method for the direct optimization of the margin distribution (KMOMD). The method is motivated and analyzed from a game-theoretical perspective. An efficient optimization algorithm is then proposed. Experimental results over a standard benchmark of 13 datasets clearly show state-of-the-art performance.
Sharp Generalization Error Bounds for Randomly-projected Classifiers
 30th International Conference on Machine Learning (ICML 2013), JMLR W&CP, 2013
Abstract

Cited by 5 (3 self)
We derive sharp bounds on the generalization error of a generic linear classifier trained by empirical risk minimization on randomly-projected data. We make no restrictive assumptions (such as sparsity or separability) on the data: instead we use the fact that, in a classification setting, the question of interest is really ‘what is the effect of random projection on the predicted class labels?’ We therefore derive the exact probability of ‘label flipping’ under Gaussian random projection in order to quantify this effect precisely in our bounds.
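The paper derives the label-flipping probability in closed form; purely to illustrate the quantity being bounded, the sketch below estimates it by Monte Carlo instead. The Gaussian entry scaling and trial count are assumptions for the sketch, not the paper's construction:

```python
import math
import random

def label_flip_rate(w, xs, k, trials=100, seed=0):
    """Monte-Carlo estimate of how often a Gaussian random projection
    flips the predicted label sign(w . x) for the points in `xs`.

    Both the weight vector `w` and each point are projected into k
    dimensions with the same random matrix R (entries N(0, 1/k)), and
    sign disagreements between w . x and (Rw) . (Rx) are counted.
    """
    rng = random.Random(seed)
    d = len(w)
    flips, total = 0, 0
    for _ in range(trials):
        R = [[rng.gauss(0.0, 1.0 / math.sqrt(k)) for _ in range(d)]
             for _ in range(k)]
        Rw = [sum(R[i][j] * w[j] for j in range(d)) for i in range(k)]
        for x in xs:
            Rx = [sum(R[i][j] * x[j] for j in range(d)) for i in range(k)]
            before = sum(wj * xj for wj, xj in zip(w, x))
            after = sum(a * c for a, c in zip(Rw, Rx))
            flips += (before >= 0) != (after >= 0)
            total += 1
    return flips / total
```

Note that a point exactly aligned with w can never flip, since its projected score is the squared norm of Rw; points near the decision boundary flip most often, which is the geometric effect the bounds quantify.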
A Risk Minimization Principle for a Class of Parzen Estimators
Abstract

Cited by 5 (0 self)
This paper explores the use of a Maximal Average Margin (MAM) optimality principle for the design of learning algorithms. It is shown that the application of this risk minimization principle results in a class of (computationally) simple learning machines similar to the classical Parzen window classifier. A direct relation with Rademacher complexities is established, thereby facilitating analysis and providing a notion of certainty of prediction. This analysis is related to Support Vector Machines by means of a margin transformation. The power of the MAM principle is illustrated further by application to ordinal regression tasks, resulting in an O(n) algorithm able to process large datasets in reasonable time.
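A Parzen-window-style classifier of the kind this abstract describes can be sketched as the sign of the difference between average kernel similarities to the two classes. The RBF kernel and its width below are illustrative assumptions, not the paper's setup:

```python
import math

def parzen_classify(x, pos, neg, width=1.0):
    """Classify x by the sign of its average kernel similarity to the
    positive examples minus its average similarity to the negative ones.

    There is no training step beyond storing the examples, which is why
    this family of machines is computationally simple: each prediction
    is a pass over the stored data.
    """
    def k(a, b):
        # RBF kernel (an assumed choice for this sketch)
        d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return math.exp(-d2 / (2.0 * width ** 2))
    score = (sum(k(x, p) for p in pos) / len(pos)
             - sum(k(x, n) for n in neg) / len(neg))
    return 1 if score >= 0 else -1
```

The magnitude of `score` also gives a rough notion of certainty of prediction, in the spirit of the Rademacher-based analysis mentioned above.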
Reinforcement Learning Without Rewards
 2010
Abstract

Cited by 4 (0 self)
Machine learning can be broadly defined as the study and design of algorithms that improve with experience. Reinforcement learning is a variety of machine learning that makes minimal assumptions about the information available for learning, and, in a sense, defines the problem of learning in the broadest possible terms. Reinforcement learning algorithms are usually applied to “interactive” problems, such as learning to drive a car, operate a robotic arm, or play a game. In reinforcement learning, an autonomous agent must learn how to behave in an unknown, uncertain, and possibly hostile environment, using only the sensory feedback that it receives from the environment. As the agent moves from one state of the environment to another, it receives only a reward signal — there is no human “in the loop” to tell the algorithm exactly what to do. The goal in reinforcement learning is to learn an optimal behavior that maximizes the total reward that the agent collects. Despite its generality, the reinforcement learning framework does make one strong assumption: that the reward signal can always be directly and unambiguously observed. In other words, the feedback a reinforcement learning algorithm receives is …
Kernel Methods and Their Application to Structured Data
 2009
Abstract

Cited by 3 (0 self)
Supervised machine learning is concerned with the study of algorithms that take examples and their corresponding labels, and learn a general classification function that can predict the label of future examples. For example, an algorithm may take as input a set of molecules, each labeled “toxic” or “non-toxic”, and try to predict the toxicity of new molecules based on the function learned from the input. In the astronomy domain, one might try to predict the type of a star given a series of measurements of the star’s brightness, based on a set of known stars and measurements of their brightness. The thesis investigates three aspects of machine learning algorithms that use linear classification functions and work implicitly in feature spaces by using similarity functions known as kernels. The first aspect is robustness to noise, that is, learning when some of the labels in the known examples are not reliable. An extensive experimental evaluation reveals a surprising result: the Perceptron algorithm with margin is an excellent algorithm in such contexts, and it is competitive with or better than more sophisticated alternatives. The second aspect is producing estimates of the confidence of predictions from such classifiers, especially …
Large Margin Distribution Machine
Abstract

Cited by 1 (1 self)
The support vector machine (SVM) has been one of the most popular learning algorithms, with the central idea of maximizing the minimum margin, i.e., the smallest distance from the instances to the classification boundary. Recent theoretical results, however, disclosed that maximizing the minimum margin does not necessarily lead to better generalization performance; instead, the margin distribution has been proven to be more crucial. In this paper, we propose the Large margin Distribution Machine (LDM), which tries to achieve better generalization performance by optimizing the margin distribution. We characterize the margin distribution by its first- and second-order statistics, i.e., the margin mean and variance. The LDM is a general learning approach which can be used wherever SVM can be applied, and its superiority is verified both theoretically and empirically in this paper.
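The first- and second-order margin statistics that the LDM works with are straightforward to compute for a fixed linear model. A small sketch of the statistics themselves (not the LDM training procedure, which optimizes them jointly with the classifier):

```python
def margin_stats(w, b, data):
    """Margin mean and variance of a linear model over a dataset.

    The margin of an example (x, y) with y in {-1, +1} is the signed
    quantity y * (w . x + b); the LDM idea is to make the mean of these
    margins large and their variance small, rather than maximizing only
    the minimum margin as an SVM does.
    """
    margins = [y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
               for x, y in data]
    mean = sum(margins) / len(margins)
    var = sum((m - mean) ** 2 for m in margins) / len(margins)
    return mean, var
```

A high mean with low variance indicates that most examples, not just the closest ones, sit comfortably on the correct side of the boundary.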
Laplacian Margin Distribution Boosting for Learning from Sparsely Labeled Data
Abstract
Abstract. Boosting algorithms attract much attention in computer vision and image processing because of their strong performance in a variety of applications. Recent progress in the theory of boosting algorithms suggests a close link between good generalization and the margin distribution of the classifier w.r.t. a dataset. In this paper, we propose a novel data-dependent margin distribution learning criterion for boosting, termed Laplacian MDBoost, which utilizes the intrinsic geometric structure of the dataset. One key aspect of our method is that it can seamlessly incorporate unlabeled data by including a graph Laplacian regularizer. We derive a dual formulation of the learning problem that can be efficiently solved by column generation. Experiments on various datasets validate the effectiveness of the new graph-Laplacian-based learning criterion in both supervised and unsupervised learning settings. We also show that our algorithm outperforms state-of-the-art semi-supervised learning algorithms on a variety of inductive inference tasks, including real-world video segmentation.
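The graph Laplacian regularizer mentioned above is built from a pairwise similarity matrix over labeled and unlabeled points together. A minimal sketch of the unnormalized Laplacian L = D - W (how MDBoost folds it into the dual via column generation is not reproduced here):

```python
def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W for a symmetric weight
    matrix W (list of lists), where D is the diagonal degree matrix.

    The quadratic form f^T L f equals half the weighted sum of squared
    differences of f over edges, so it penalizes assigning different
    scores to strongly connected nodes; this is how unlabeled points
    can influence the learned classifier.
    """
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]
```

Every row of L sums to zero, so constant score vectors incur no penalty, as expected of a smoothness regularizer.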
SPEED UP SVM-RFE PROCEDURE USING MARGIN DISTRIBUTION
Abstract
In this paper, a new method is introduced to speed up the recursive feature ranking procedure by using the margin distribution of a trained SVM. The method, MRFE, continuously eliminates features without retraining the SVM as long as the margin distribution of the SVM does not change significantly. Synthetic datasets and two benchmark microarray datasets were used to test MRFE. Comparison with the original SVM-RFE shows that our method speeds up the feature ranking procedure considerably with little or no performance degradation. Compared to a similar speed-up technique, ERFE, MRFE provides similar classification performance with reduced complexity.
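The core check, whether the margin distribution moves when a feature is dropped without retraining, can be illustrated as follows. The mean-absolute-shift statistic here is an assumed stand-in for the paper's actual significance test:

```python
def margin_shift(w, b, data, drop):
    """Mean absolute change in the margin distribution of a fixed linear
    model when feature `drop` is zeroed out, with no retraining.

    `data` is a list of (features, label) pairs with label in {-1, +1}.
    A small shift suggests the feature can be eliminated cheaply; a
    large shift signals that the SVM should be retrained before further
    elimination, which is the trade-off the speed-up exploits.
    """
    def margin(x, y, masked):
        s = b + sum(0.0 if (masked and i == drop) else w[i] * x[i]
                    for i in range(len(w)))
        return y * s
    before = [margin(x, y, False) for x, y in data]
    after = [margin(x, y, True) for x, y in data]
    return sum(abs(a - m) for a, m in zip(after, before)) / len(data)
```

Dropping a zero-weight feature leaves every margin unchanged, while dropping a heavily weighted one shifts the whole distribution, matching the intuition behind eliminating low-impact features without retraining.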