
## Training Invariant Support Vector Machines using Selective Sampling

Citations: 24 (0 self)

### Citations

6495 | LIBSVM: A Library for Support Vector Machines - Chang, Lin - 2001

3703 | Support-vector networks
- Cortes, Vapnik
- 1995
Citation Context: ...algorithm amenable to selective sampling (Bordes et al., 2005). Consider a binary classification problem with training patterns x1 ... xn and associated classes y1 ... yn ∈ {+1, −1}. A soft margin SVM (Cortes and Vapnik, 1995) classifies a pattern x according to the sign of a decision function f(x) = Σi αi 〈x, xi〉 + b (1), where the notation 〈x, x′〉 represents the dot-product of feature vectors associated with the pattern...
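The decision function quoted in this excerpt can be sketched numerically. This is a generic illustration, not code from the paper; the linear kernel choice and the toy coefficients are assumptions made here for demonstration.

```python
import numpy as np

def svm_decision(x, support_vectors, alphas, b, kernel):
    """Soft-margin SVM decision value f(x) = sum_i alpha_i * K(x, x_i) + b.

    As in the quoted formulation, each alpha_i already carries the sign of
    its label y_i; the predicted class is the sign of f(x)."""
    return sum(a * kernel(x, xi) for a, xi in zip(alphas, support_vectors)) + b

# Linear kernel: plain dot-product of the feature vectors.
linear = lambda u, v: float(np.dot(u, v))

# Hypothetical support vectors and coefficients, for illustration only.
svs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
alphas = [0.5, -0.5]
f = svm_decision(np.array([2.0, 0.0]), svs, alphas, b=0.0, kernel=linear)
label = 1 if f >= 0 else -1
```

With a nonlinear kernel (e.g. RBF), only the `kernel` callable changes; the expansion over support vectors stays the same.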

2830 | Learning with Kernels
- Schölkopf, Smola
- 2002
Citation Context: ...very expensive. To address those limitations, there has been a lot of clever studies on solving quadratic optimization problems (Chang and Lin, 2001; Joachims, 1999), on online learning (Bottou, 1998; Kivinen et al., 2002; Crammer et al., 2004), on sparse solutions (Vincent and Bengio, 2002), and on active learning (Cohn et al., 1995; Campbell et al., 2000). The notion of computational complexity discussed in this pap...

2801 | Matrix Computations - Golub, Van Loan - 1996

1861 | Making large-scale SVM learning practical
- Joachims
- 1999
Citation Context: ...g and recognition phases. Third, labeling the training examples becomes very expensive. To address those limitations, there has been a lot of clever studies on solving quadratic optimization problems [7, 18], on online learning [4, 19, 11], on sparse solutions [39], and on active learning [9, 6]. The notion of computational complexity discussed in this paper is tied to the empirical performance of algori...

1532 | Gradient-based learning applied to document recognition - LeCun, Bottou, et al. - 1998

1510 | Fast training of support vector machines using sequential minimal optimization. Advances in kernel methods: support vector learning
- Platt
- 1999
Citation Context: ...ge scale invariant problem. 2 Online algorithm with selective sampling. This section first discusses the geometry of the quadratic optimization problem and its suitability to algorithms, such as SMO [31], that iterate feasible direction searches. Then we describe how to organize feasible direction searches into an online learning algorithm amenable to selective sampling [3]. Consider a binary classif...

679 | Active learning with statistical models
- Cohn, Ghahramani, et al.
- 1996
Citation Context: ...address those limitations, there has been a lot of clever studies on solving quadratic optimization problems [7, 18], on online learning [4, 19, 11], on sparse solutions [39], and on active learning [9, 6]. The notion of computational complexity discussed in this paper is tied to the empirical performance of algorithms. Three common strategies can be distinguished to reduce this practical complexity (o...

521 | Large margin classification using the perceptron algorithm
- Freund, Schapire
- 1999
Citation Context: ...ong direction xt, that is wt = wt−1 + ytxt. Compared to maximum margin classifiers such as SVM, the Perceptron runs much faster but does not deliver as good a generalization performance. Many authors [12, 13, 15, 23, 11] have modified the Perceptron algorithm to ensure a margin. Older variants of the Perceptron, such as minover and adatron in [27], are also very close to SVMs. 1.4 Active learning. Active learning addr...
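The Perceptron update quoted in this excerpt (wt = wt−1 + yt xt on each mistake) can be sketched as follows; the toy data set is an assumption made here for illustration.

```python
import numpy as np

def perceptron(X, y, epochs=10):
    """Classic Perceptron: on each mistake, move the weight vector
    along the misclassified pattern's direction: w <- w + y_t * x_t."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for xt, yt in zip(X, y):
            if yt * np.dot(w, xt) <= 0:   # misclassified (or on the boundary)
                w = w + yt * xt            # the quoted update rule
                mistakes += 1
        if mistakes == 0:                  # data separated: stop early
            break
    return w

# Linearly separable toy problem (hypothetical values).
X = np.array([[2.0, 1.0], [1.0, 2.0], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1, 1, -1, -1])
w = perceptron(X, y)
preds = np.sign(X @ w)
```

Margin-enforcing variants mentioned in the excerpt typically replace the mistake test `yt * <w, xt> <= 0` with a stricter condition such as `yt * <w, xt> <= margin`.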

278 | Less is more: Active learning with support vector machines
- Schohn, Cohn
- 2000
Citation Context: ...labels. Therefore it must carefully choose which examples deserve the labeling expense. Even when all labels are available beforehand, active learning is useful because it leads to sparser solutions (Schohn and Cohn, 2000; Bordes et al., 2005). Moreover, the criteria for asking or not a label may be cheaper to compute than trying a labeled point. 1.5 Outline of the paper. Section 2 briefly presents the SVMs and describ...

272 | Efficient pattern recognition using a new transformation distance - Simard, LeCun, et al. - 1994

210 | A Time-Delay Neural Network Architecture for Isolated Word Recognition
- Lang, Waibel, et al.
- 1990
Citation Context: ...quality of a pattern recognition system can be improved by taking into account invariance. Very different ways to handle invariance in machine learning algorithms have been proposed (Fukushima, 1988; Lang and Hinton, 1988; Simard et al., 2000; Leen, 1995; Schölkopf et al., 1996). In the case of kernel machines, three general approaches have been proposed. The first approach consists of learning orbits instead of point...

201 | Best practices for convolutional neural networks applied to visual document analysis
- Simard, Steinkraus, et al.
- 2003
Citation Context: ...racy matches the results obtained using virtual support vectors (Schölkopf and Smola, 2001) on the original MNIST test set. Slightly better performances have been reported using convolution networks (Simard et al., 2003), or using a deskewing algorithm to make the test set easier (Schölkopf and Smola, 2001). ...

161 | Transformation invariance in pattern recognition—Tangent distance and tangent propagation - Simard, LeCun, et al. - 1998

157 | Query learning with large margin classifiers
- Campbell, Cristianini, et al.
- 2000
Citation Context: ...d Lin, 2001; Joachims, 1999), on online learning (Bottou, 1998; Kivinen et al., 2002; Crammer et al., 2004), on sparse solutions (Vincent and Bengio, 2002), and on active learning (Cohn et al., 1995; Campbell et al., 2000). The notion of computational complexity discussed in this paper is tied to the empirical performance of algorithms. Three common strategies can be distinguished to reduce this practical complexity (...

157 | On convergence proofs on perceptrons
- Novikoff
- 1962
Citation Context: ...overlapping frameworks have been used to study online learning algorithms, by leveraging the mathematics of stochastic approximations (Bottou, 1998), or by refining the mathematics of the Perceptron (Novikoff, 1962). The Perceptron seems a natural starting point for online SVM. Algorithms derived from the Perceptron share common characteristics. Each iteration consists of two steps. First one decides if the new...

153 | Fast kernel classifiers with online and active learning
- Bordes, Ertekin, et al.
- 2005
Citation Context: ...first order gradient methods. We are convinced that other approximate techniques can exploit the stochastic regularities of learning problems. For instance the LASVM method (Bordes and Bottou, 2005; Bordes et al., 2005) seems considerably more efficient than the first order online SVMs discussed in (Kivinen et al., 2002). 1.3 Online learning. Online learning algorithms are usually associated with problems where the ...

119 | Support vector machines: hype or hallelujah? - Bennett, Campbell - 2000

117 | Neocognitron: a hierarchical neural network capable of visual pattern recognition
- Fukushima
- 1988
Citation Context: ...dmitted that the quality of a pattern recognition system can be improved by taking into account invariance. Very different ways to handle invariance in machine learning algorithms have been proposed (Fukushima, 1988; Lang and Hinton, 1988; Simard et al., 2000; Leen, 1995; Schölkopf et al., 1996). In the case of kernel machines, three general approaches have been proposed. The first approach consists of learning ...

106 | The kernel-adatron algorithm: a fast and simple learning procedure for support vector machines
- Frieß, Cristianini, et al.
- 1998
Citation Context: ... = wt−1 + ytxt. Compared to maximum margin classifiers such as SVM, the Perceptron runs much faster but does not deliver as good a generalization performance. Many authors (Freund and Schapire, 1998; Frieß et al., 1998; Gentile, 2001; Li and Long, 2002; Crammer et al., 2004) have modified the Perceptron algorithm to ensure a margin. Older variants of the Perceptron, such as minover and adatron in (Nadal, 1993), a...

103 | A new approximate maximal margin classification algorithm - Gentile - 2001

87 | Incorporating invariances in support vector learning machines
- Schölkopf, Burges, et al.
- 1996
Citation Context: ...by taking into account invariance. Very different ways to handle invariance in machine learning algorithms have been proposed (Fukushima, 1988; Lang and Hinton, 1988; Simard et al., 2000; Leen, 1995; Schölkopf et al., 1996). In the case of kernel machines, three general approaches have been proposed. The first approach consists of learning orbits instead of points. It requires costly semi-definite programming algorithm...

85 | The relaxed online maximum margin algorithm
- Li, Long
- 2002
Citation Context: ... margin classifiers such as SVM, the Perceptron runs much faster but does not deliver as good a generalization performance. Many authors (Freund and Schapire, 1998; Frieß et al., 1998; Gentile, 2001; Li and Long, 2002; Crammer et al., 2004) have modified the Perceptron algorithm to ensure a margin. Older variants of the Perceptron, such as minover and adatron in (Nadal, 1993), are also very close to SVMs. 1.4 Ac...

84 | Kernel matching pursuit
- Vincent, Bengio
- 2002
Citation Context: ...of clever studies on solving quadratic optimization problems (Chang and Lin, 2001; Joachims, 1999), on online learning (Bottou, 1998; Kivinen et al., 2002; Crammer et al., 2004), on sparse solutions (Vincent and Bengio, 2002), and on active learning (Cohn et al., 1995; Campbell et al., 2000). The notion of computational complexity discussed in this paper is tied to the empirical performance of algorithms. Three common st...

82 | Online algorithms and stochastic approximations - Bottou - 1998

71 | Large scale online learning
- Bottou, LeCun
- 2004
Citation Context: ...ies are very useful for large scale learning. A well designed online algorithm needs less computation to reach the same test set accuracy as the corresponding batch algorithm (Murata and Amari, 1999; Bottou and LeCun, 2004). Two overlapping frameworks have been used to study online learning algorithms, by leveraging the mathematics of stochastic approximations (Bottou, 1998), or by refining the mathematics of the Perce...

66 | Methods of Feasible Directions
- Zoutendijk
- 1960
Citation Context: ...for the feasible point αt+1 that maximizes the cost function (note that αi is positive when yi = +1 and negative when yi = −1). The optimum is reached when no further improvement is possible [41]. The quadratic cost function restricted to the half-line search might reach its maximum inside or outside the polytope (see figure 2). The new feasible point αt+1 is easily derived from the different...
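The half-line search described in this excerpt (maximize the quadratic cost along a feasible direction, clipping the step where it would leave the polytope) can be sketched in one dimension. This is a generic sketch of the scheme under stated assumptions (a concave quadratic along the search direction, a box bound `lambda_max`), not code from the paper.

```python
def feasible_step(grad, curvature, lambda_max):
    """One feasible-direction half-line search.

    Along alpha + lambda * u, the cost is the concave quadratic
    g(lambda) = g0 + grad * lambda - 0.5 * curvature * lambda**2,
    maximized over lambda in [0, lambda_max]. The unconstrained maximum
    grad / curvature may fall outside the polytope; then the step is
    clipped at the box boundary."""
    if curvature <= 0:            # degenerate direction: go to the boundary
        return lambda_max
    return min(lambda_max, grad / curvature)

# Maximum inside the polytope: the full step is taken.
inside = feasible_step(grad=2.0, curvature=4.0, lambda_max=1.0)   # 0.5
# Maximum outside the polytope: the step is clipped at the boundary.
clipped = feasible_step(grad=8.0, curvature=4.0, lambda_max=1.0)  # 1.0
```

SMO-style solvers repeat exactly this kind of clipped one-dimensional maximization over successively chosen feasible directions.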

53 | Online classification on a budget
- Crammer, Kandola, et al.
- 2003
Citation Context: ...ess those limitations, there has been a lot of clever studies on solving quadratic optimization problems (Chang and Lin, 2001; Joachims, 1999), on online learning (Bottou, 1998; Kivinen et al., 2002; Crammer et al., 2004), on sparse solutions (Vincent and Bengio, 2002), and on active learning (Cohn et al., 1995; Campbell et al., 2000). The notion of computational complexity discussed in this paper is tied to the empi...

45 | Invariant pattern recognition - a review
- Wood
- 1996
Citation Context: ...ers do not. In machine learning, the a priori knowledge of such invariance properties can be used to improve the pattern recognition accuracy. Many approaches have been proposed (Simard et al., 1993; Wood, 1996; Schölkopf et al., 1996; Leen, 1995). We first propose an illustration of the influence of invariance. Then we describe our selective sampling approach to invariance, and discuss practical implementa...

25 | Invariant pattern recognition by semidefinite programming machines
- Graepel, Herbrich
- 2004
Citation Context: ...In the case of kernel machines, three general approaches have been proposed. The first approach consists of learning orbits instead of points. It requires costly semi-definite programming algorithms (Graepel and Herbrich, 2004). The second approach involves specialized kernels. This turns out to be equivalent to mapping the patterns into a space of invariant features (Chapelle and Schölkopf, 2002). Such features are often ...

23 | From data distributions to regularization in invariant learning
- Leen
- 1995
Citation Context: ...be improved by taking into account invariance. Very different ways to handle invariance in machine learning algorithms have been proposed (Fukushima, 1988; Lang and Hinton, 1988; Simard et al., 2000; Leen, 1995; Schölkopf et al., 1996). In the case of kernel machines, three general approaches have been proposed. The first approach consists of learning orbits instead of points. It requires costly semi-defini...

17 | The Huller: a simple and efficient online SVM
- Bordes, Bottou
- 2005
Citation Context: ... is only one option left, first order gradient methods. We are convinced that other approximate techniques can exploit the stochastic regularities of learning problems. For instance the LASVM method (Bordes and Bottou, 2005; Bordes et al., 2005) seems considerably more efficient than the first order online SVMs discussed in (Kivinen et al., 2002). 1.3 Online learning. Online learning algorithms are usually associated wit...

14 | Statistical analysis of learning dynamics
- Murata, Amari
- 1999
Citation Context: ...ir computational properties are very useful for large scale learning. A well designed online algorithm needs less computation to reach the same test set accuracy as the corresponding batch algorithm (Murata and Amari, 1999; Bottou and LeCun, 2004). Two overlapping frameworks have been used to study online learning algorithms, by leveraging the mathematics of stochastic approximations (Bottou, 1998), or by refining the ...

14 | Sparseness of support vector machines—some asymptotically sharp bounds
- Steinwart
- 2004
Citation Context: ...0 ≤ yiαi ≤ C. A pattern xi is called "support vector" when the corresponding coefficient αi is nonzero. The number s of support vectors asymptotically grows linearly with the number n of examples [38]. 2.1 Feasible direction algorithms. The geometry of the SVM QP problem (2) is summarized in figure 1. The box constraints 0 ≤ yiαi ≤ C restrict the solutions to a n–dimensional hypercube. The equality...
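The support-vector definition in this excerpt (a nonzero coefficient within the box 0 ≤ yiαi ≤ C) can be checked mechanically. The tolerance and the toy solution below are assumptions made for illustration; the helper name is hypothetical.

```python
def support_vector_indices(alphas, ys, C, tol=1e-12):
    """Return indices of support vectors: patterns whose coefficient is
    nonzero, under the box constraint 0 <= y_i * alpha_i <= C
    (alpha_i carries the sign of its label y_i)."""
    idx = []
    for i, (a, y) in enumerate(zip(alphas, ys)):
        assert -tol <= y * a <= C + tol, "box constraint violated"
        if abs(a) > tol:
            idx.append(i)
    return idx

# Hypothetical QP solution, for illustration only.
alphas = [0.7, 0.0, -1.0, 0.0, -0.2]
ys     = [+1,  +1,  -1,  -1,  -1]
sv = support_vector_indices(alphas, ys, C=1.0)   # -> [0, 2, 4]
```

Coefficients at the upper bound (|αi| = C, like the third pattern here) correspond to margin violators, while interior nonzero coefficients sit exactly on the margin.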

13 | Incorporating invariances in nonlinear SVMs
- Chapelle, Schölkopf
- 2002
Citation Context: ...requires costly semi-definite programming algorithms [17]. The second approach involves specialized kernels. This turns out to be equivalent to mapping the patterns into a space of invariant features [8]. Such features are often difficult to construct. The third approach is the most general. It consists of artificially enlarging the training set by new examples obtained by deforming the available inp...

7 | Invariances in classification: an efficient SVM implementation
- Loosli, Canu, et al.
- 2005
Citation Context: ...fficient to describe the invariant decision boundaries. We therefore hope to solve problems with multiple invariance with milder size and complexity constraints. Our first approach was inspired by (Loosli et al., 2005). Each iteration randomly picks the next training example, generates a number of transformed examples describing the orbit of the original example, selects the best transformed example (see section 2...

4 | Une boîte à outils rapide et simple pour les SVM [A fast and simple toolbox for SVMs] - Loosli, Canu, et al.

2 | Réseaux de neurones: de la physique à la psychologie [Neural networks: from physics to psychology]
- Nadal
- 1993
Citation Context: ...eß et al., 1998; Gentile, 2001; Li and Long, 2002; Crammer et al., 2004) have modified the Perceptron algorithm to ensure a margin. Older variants of the Perceptron, such as minover and adatron in (Nadal, 1993), are also very close to SVMs. 1.4 Active learning. Active learning addresses problems where obtaining labels is expensive (Cohn et al., 1995). The learner has access to a vast pool of unlabeled examp...

2 | An improved training algorithm for support vector machines
- Osuna, Freund, et al.
- 1997
Citation Context: ...exity (or observed training time). The first strategy consists in working on subsets of the training data, solving several smaller problems instead of a large one, as in the SVM decomposition method (Osuna et al., 1997). The second strategy consists in parallelizing the learning algorithm. The third strategy tries to design less complex algorithms that give an approximate solution with equivalent or superior perfor...

1 | Introduction to convex programming, interior point methods, and semidefinite programming - Nemirovski - 2005 |