Estimating the Support of a High-Dimensional Distribution
, 1999
Cited by 766 (29 self)
Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S is bounded by some a priori specified value between 0 and 1. We propose a method to approach this problem by trying to estimate a function f which is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we do by carrying out sequential optimization over pairs of input patterns. We also provide a preliminary theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabelled d...
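The kernel-expansion decision function described in the abstract above can be illustrated with a minimal sketch. It only evaluates f at a point and does not solve the quadratic program. The Gaussian RBF kernel, the offset `rho`, the parameter `gamma`, and all numeric values are assumptions chosen for illustration; none of them come from the listing.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian RBF kernel k(x, z) = exp(-gamma * ||x - z||^2) (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def decision_value(x, support_points, alphas, rho, gamma=0.5):
    """Kernel expansion f(x) = sum_i alpha_i * k(x_i, x) - rho."""
    expansion = sum(
        alpha * rbf_kernel(sp, x, gamma)
        for alpha, sp in zip(alphas, support_points)
    )
    return expansion - rho


def in_estimated_support(x, support_points, alphas, rho, gamma=0.5):
    """A point is placed inside the estimated region S when f(x) >= 0."""
    return decision_value(x, support_points, alphas, rho, gamma) >= 0.0


# Hypothetical, hand-picked values; a real model would obtain the
# coefficients and the offset from the quadratic program.
SUPPORT_POINTS = [(0.0, 0.0), (1.0, 0.0)]
ALPHAS = [0.5, 0.5]
RHO = 0.3
```

With these made-up values, a point lying between the two expansion points gets a positive decision value, while a point far from both gets a negative one.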
Linear programming boosting via column generation
 Machine Learning
, 2002
Cited by 157 (3 self)
Recent papers [20] have shown that boosting, arcing, and related ensemble methods (hereafter summarized as boosting) can be viewed as margin maximization in function space. By changing the cost function, different
Kernel Logistic Regression and the Import Vector Machine
 Journal of Computational and Graphical Statistics
, 2001
Cited by 117 (4 self)
The support vector machine (SVM) is known for its good performance in binary classification, but its extension to multiclass classification is still an ongoing research issue. In this paper, we propose a new approach for classification, called the import vector machine (IVM), which is built on kernel logistic regression (KLR). We show that the IVM not only performs as well as the SVM in binary classification, but also can naturally be generalized to the multiclass case. Furthermore, the IVM provides an estimate of the underlying probability. Similar to the "support points" of the SVM, the IVM model uses only a fraction of the training data to index kernel basis functions, typically a much smaller fraction than the SVM. This gives the IVM a computational advantage over the SVM, especially when the size of the training data set is large.
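To illustrate how a kernel logistic regression model turns a kernel expansion over a small set of points into a probability estimate, here is a minimal sketch. It covers prediction only, not the selection of import points or the fitting procedure. The RBF kernel, the function names, and the coefficient values are assumptions made for this example and are not taken from the paper.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian RBF kernel k(x, z) = exp(-gamma * ||x - z||^2) (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def klr_probability(x, import_points, coeffs, bias=0.0, gamma=0.5):
    """Estimate P(y = 1 | x) for a binary kernel logistic model.

    f(x) = bias + sum_i coeffs[i] * k(x, import_points[i])
    p(x) = 1 / (1 + exp(-f(x)))
    """
    f = bias + sum(
        c * rbf_kernel(x, p, gamma) for c, p in zip(coeffs, import_points)
    )
    return 1.0 / (1.0 + math.exp(-f))


# Hypothetical model with two one-dimensional import points.
IMPORT_POINTS = [(0.0,), (2.0,)]
COEFFS = [1.0, -1.0]
```

In this toy model the midpoint between the two import points receives probability 0.5, and points nearer the positively weighted import point receive a higher probability.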
SV Estimation of a Distribution's Support
, 1999
Cited by 37 (2 self)
Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a subset S of input space such that the probability that a test point drawn from P lies outside of S is bounded by some a priori specified 0 < ν ≤ 1. We propose an algorithm which approaches this problem by trying to estimate a function f which is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The algorithm is a natural extension of the support vector algorithm to the case of unlabelled data.
A Column Generation Algorithm For Boosting
, 2000
Cited by 34 (7 self)
We examine linear program (LP) approaches to boosting and demonstrate their efficient solution using LPBoost, a column generation simplex method. We prove that minimizing the soft margin error function (equivalent to solving an LP) directly optimizes a generalization error bound. LPBoost can be used to solve any boosting LP by iteratively optimizing the dual classification costs in a restricted LP and dynamically generating weak learners to make new LP columns. Unlike gradient boosting algorithms, LPBoost converges finitely to a global solution using well-defined stopping criteria. Computationally, LPBoost finds very sparse solutions as good as or better than those found by AdaBoost using comparable computation.
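The column generation step mentioned in the abstract above can be sketched as a pricing check: given the current dual misclassification costs, find the weak learner with the largest weighted edge and add it as a new column only if it violates the current dual bound. This is a minimal sketch under stated assumptions. The names `edge` and `price_out`, the bound `beta`, the tolerance, and the toy data are hypothetical, and solving the restricted LP itself is left out.

```python
def edge(u, y, h_out):
    """Weighted edge sum_i u_i * y_i * h(x_i) of one weak learner.

    u     : dual misclassification costs, one per training point
    y     : labels in {-1, +1}
    h_out : the weak learner's outputs in {-1, +1} on the training points
    """
    return sum(ui * yi * hi for ui, yi, hi in zip(u, y, h_out))


def price_out(u, y, candidates, beta, tol=1e-9):
    """Return the index of the best candidate column, or None to stop.

    A candidate is worth adding only if its edge exceeds the current
    dual bound beta; otherwise no column can improve the restricted LP.
    """
    best_j = max(range(len(candidates)), key=lambda j: edge(u, y, candidates[j]))
    if edge(u, y, candidates[best_j]) > beta + tol:
        return best_j
    return None


# Hypothetical toy problem: four points, uniform costs, two weak learners.
U = [0.25, 0.25, 0.25, 0.25]
Y = [1, 1, -1, -1]
CANDIDATES = [
    [1, 1, 1, 1],    # always predicts +1, edge 0.0
    [1, 1, -1, 1],   # one mistake, edge 0.5
]
```

In a full loop, the selected column would be appended to the restricted LP, the LP would be re-solved to obtain new costs and a new bound, and the check would repeat until `price_out` returns `None`.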
Adaptive Margin Support Vector Machines for Classification
 Advances in Large Margin Classifiers
, 2000
Cited by 16 (1 self)
In this paper we propose a new learning algorithm for classification learning based on the Support Vector Machine (SVM) approach. Existing approaches for constructing SVMs [12] are based on minimization of a regularized margin loss where the margin is treated equivalently for each training pattern. We propose a reformulation of the minimization problem such that adaptive margins for each training pattern are utilized, which we call the Adaptive Margin (AM) SVM. We give bounds on the generalization error of AM-SVMs which justify their robustness against outliers, and show experimentally that the generalization error of AM-SVMs is comparable to classical SVMs on benchmark datasets from the UCI repository.
Tree Decomposition for Large-Scale SVM Problems: Experimental and Theoretical Results
, 2009
Cited by 13 (2 self)
To handle problems created by large data sets, we propose a method that uses a decision tree to decompose a data space and trains SVMs on the decomposed regions. Although there are other means of decomposing a data space, we show that the decision tree has several merits for large-scale SVM training. First, it can classify some data points by its own means, thereby reducing the cost of SVM training applied to the remaining data points. Second, it is efficient for seeking the parameter values that maximize the validation accuracy, which helps maintain good test accuracy. Third, we can provide a generalization error bound for the classifier derived by the tree decomposition method. For experiment data sets whose size can be handled by current nonlinear, or kernel-based, SVM training techniques, the proposed method can speed up the training by a factor of thousands, and still achieve comparable test accuracy.
Some New Bounds on the Generalization Error of Combined Classifiers
, 2001
Cited by 7 (2 self)
In this paper we develop the method of bounding the generalization error of a classifier in terms of its margin distribution which was introduced in the recent papers of Bartlett and Schapire, Freund, Bartlett and Lee. The theory of Gaussian and empirical processes allows us to prove the margin-type inequalities for the most general functional classes, the complexity of the class being measured via the so-called Gaussian complexity functions. As a simple application of our results, we obtain the bounds of Schapire, Freund, Bartlett and Lee for the generalization error of boosting. We also substantially improve the results of Bartlett on bounding the generalization error of neural networks in terms of ℓ1 norms of the weights of neurons. Furthermore, under additional assumptions on the complexity of the class of hypotheses we provide some tighter bounds, which in the case of boosting improve the results of Schapire, Freund, Bartlett and Lee.
Large Margin Trees for Induction and Transduction
, 1999
Cited by 5 (0 self)
The problem of controlling the capacity of decision trees is considered for the case where the decision nodes implement linear threshold functions. In addition to the standard early stopping and pruning procedures, we implement a strategy based on the margins of the decision boundaries at the nodes. The approach is motivated by bounds on generalization error obtained in terms of the margins of the individual classifiers. Experimental results are given which demonstrate that considerable advantage can be derived from using the margin information. The same strategy is applied to the problem of transduction, where the positions of the testing points are revealed to the training algorithm. This information is used to generate an alternative training criterion motivated by transductive theory. In the transductive case, the results are not as encouraging, suggesting that little, if any, consistent advantage is culled from using the unlabelled data in the proposed fashion...
Comment
Cited by 1 (0 self)
We congratulate the authors for a well-written and thoughtful survey of some of the literature in this area. They are mainly concerned with the geometry and the computational learning aspects of the support vector machine (SVM). We will therefore complement their review by discussing the SVM from the statistical function estimation perspective. In particular, we will elaborate on the following points:
• Kernel regularization is essentially a generalized ridge penalty in a certain feature space.
• In practice, the effective dimension of the data kernel matrix is not always equal to n, even when the implicit dimension of the feature space is infinite; hence, the training data are not always perfectly separable.
• Appropriate regularization plays an important role in the success of the SVM.
• The SVM is not fundamentally different from many statistical tools that our statisticians are familiar with, for example, penalized logistic regression.
We acknowledge that many of the comments are based