Results 1  10
of
262
Training a support vector machine in the primal
 Neural Computation
, 2007
"... Most literature on Support Vector Machines (SVMs) concentrate on the dual optimization problem. In this paper, we would like to point out that the primal problem can also be solved efficiently, both for linear and nonlinear SVMs, and that there is no reason for ignoring this possibilty. On the cont ..."
Abstract

Cited by 154 (5 self)
 Add to MetaCart
Most literature on Support Vector Machines (SVMs) concentrate on the dual optimization problem. In this paper, we would like to point out that the primal problem can also be solved efficiently, both for linear and nonlinear SVMs, and that there is no reason for ignoring this possibilty. On the contrary, from the primal point of view new families of algorithms for large scale SVM training can be investigated.
Variable Kernel Density Estimation
 Annals of Statistics
, 1992
"... In this paper, we propose a method for robust kernel density estimation. We interpret a KDE with Gaussian kernel as the inner product between a mapped test point and the centroid of mapped training points in kernel feature space. Our robust KDE replaces the centroid with a robust estimate based on M ..."
Abstract

Cited by 108 (4 self)
 Add to MetaCart
In this paper, we propose a method for robust kernel density estimation. We interpret a KDE with Gaussian kernel as the inner product between a mapped test point and the centroid of mapped training points in kernel feature space. Our robust KDE replaces the centroid with a robust estimate based on Mestimation [1]. The iteratively reweighted least squares (IRWLS) algorithm for Mestimation depends only on inner products, and can therefore be implemented using the kernel trick. We prove the IRWLS method monotonically decreases its objective value at every iteration for a broad class of robust loss functions. Our proposed method is applied to synthetic data and network traffic volumes, and the results compare favorably to the standard KDE. Index Terms — kernel density estimation, Mestimator, outlier, kernel feature space, kernel trick 1.
Theory of classification: A survey of some recent advances
, 2005
"... The last few years have witnessed important new developments in the theory and practice of pattern classification. We intend to survey some of the main new ideas that have led to these recent results. ..."
Abstract

Cited by 93 (3 self)
 Add to MetaCart
The last few years have witnessed important new developments in the theory and practice of pattern classification. We intend to survey some of the main new ideas that have led to these recent results.
Trading convexity for scalability
 ICML06, 23rd International Conference on Machine Learning
, 2006
"... Convex learning algorithms, such as Support Vector Machines (SVMs), are often seen as highly desirable because they offer strong practical properties and are amenable to theoretical analysis. However, in this work we show how nonconvexity can provide scalability advantages over convexity. We show h ..."
Abstract

Cited by 88 (3 self)
 Add to MetaCart
(Show Context)
Convex learning algorithms, such as Support Vector Machines (SVMs), are often seen as highly desirable because they offer strong practical properties and are amenable to theoretical analysis. However, in this work we show how nonconvexity can provide scalability advantages over convexity. We show how concaveconvex programming can be applied to produce (i) faster SVMs where training errors are no longer support vectors, and (ii) much faster Transductive SVMs. 1.
Consistency of support vector machines and other regularized kernel classifiers
, 2002
"... ..."
(Show Context)
Statistical analysis of some multicategory large margin classification methods
 Journal of Machine Learning Research
, 2004
"... The purpose of this paper is to investigate statistical properties of risk minimization based multicategory classification methods. These methods can be considered as natural extensions of binary large margin classification. We establish conditions that guarantee the consistency of classifiers obtai ..."
Abstract

Cited by 72 (2 self)
 Add to MetaCart
(Show Context)
The purpose of this paper is to investigate statistical properties of risk minimization based multicategory classification methods. These methods can be considered as natural extensions of binary large margin classification. We establish conditions that guarantee the consistency of classifiers obtained in the risk minimization framework with respect to the classification error. Examples are provided for four specific forms of the general formulation, which extend a number of known methods. Using these examples, we show that some risk minimization formulations can also be used to obtain conditional probability estimates for the underlying problem. Such conditional probability information can be useful for statistical inferencing tasks beyond classification. 1.
Fast rates for support vector machines using gaussian kernels
 Ann. Statist
, 2004
"... We establish learning rates up to the order of n −1 for support vector machines with hinge loss (L1SVMs) and nontrivial distributions. For the stochastic analysis of these algorithms we use recently developed concepts such as Tsybakov’s noise assumption and local Rademacher averages. Furthermore we ..."
Abstract

Cited by 69 (9 self)
 Add to MetaCart
(Show Context)
We establish learning rates up to the order of n −1 for support vector machines with hinge loss (L1SVMs) and nontrivial distributions. For the stochastic analysis of these algorithms we use recently developed concepts such as Tsybakov’s noise assumption and local Rademacher averages. Furthermore we introduce a new geometric noise condition for distributions that is used to bound the approximation error of Gaussian kernels in terms of their widths. 1
Hilbert Space Embeddings and Metrics on Probability Measures
"... A Hilbert space embedding for probability measures has recently been proposed, with applications including dimensionality reduction, homogeneity testing, and independence testing. This embedding represents any probability measure as a mean element in a reproducing kernel Hilbert space (RKHS). A pseu ..."
Abstract

Cited by 65 (34 self)
 Add to MetaCart
(Show Context)
A Hilbert space embedding for probability measures has recently been proposed, with applications including dimensionality reduction, homogeneity testing, and independence testing. This embedding represents any probability measure as a mean element in a reproducing kernel Hilbert space (RKHS). A pseudometric on the space of probability measures can be defined as the distance between distribution embeddings: we denote this as γk, indexed by the kernel function k that defines the inner product in the RKHS. We present three theoretical properties of γk. First, we consider the question of determining the conditions on the kernel k for which γk is a metric: such k are denoted characteristic kernels. Unlike pseudometrics, a metric is zero only when two distributions coincide, thus ensuring the RKHS embedding maps all distributions uniquely (i.e., the embedding is injective). While previously published conditions may apply only in restricted circumstances (e.g., on compact domains), and are difficult to check, our conditions are straightforward and intuitive: integrally strictly positive definite kernels are characteristic. Alternatively, if a bounded continuous kernel is translationinvariant on R d, then it is characteristic if and only if the support of its Fourier transform is the entire R d.
Vox populi: Collecting highquality labels from a crowd
 In Proceedings of the 22nd Annual Conference on Learning Theory
, 2009
"... With the emergence of search engines and crowdsourcing websites, machine learning practitioners are faced with datasets that are labeled by a large heterogeneous set of teachers. These datasets test the limits of our existing learning theory, which largely assumes that data is sampled i.i.d. from a ..."
Abstract

Cited by 56 (1 self)
 Add to MetaCart
With the emergence of search engines and crowdsourcing websites, machine learning practitioners are faced with datasets that are labeled by a large heterogeneous set of teachers. These datasets test the limits of our existing learning theory, which largely assumes that data is sampled i.i.d. from a fixed distribution. In many cases, the number of teachers actually scales with the number of examples, with each teacher providing just a handful of labels, precluding any statistically reliable assessment of an individual teacher’s quality. In this paper, we study the problem of pruning lowquality teachers in a crowd, in order to improve the label quality of our training set. Despite the hurdles mentioned above, we show that this is in fact achievable with a simple and efficient algorithm, which does not require that each example be repeatedly labeled by multiple teachers. We provide a theoretical analysis of our algorithm and back our findings with empirical evidence. 1
Learning on the border: Active learning in imbalanced data classification
 In Proc. ACM Conf. on Information and Knowledge Management (CIKM ’07
, 2007
"... This paper is concerned with the class imbalance problem which has been known to hinder the learning performance of classification algorithms. The problem occurs when there are significantly less number of observations of the target concept. Various realworld classification tasks, such as medical d ..."
Abstract

Cited by 49 (3 self)
 Add to MetaCart
This paper is concerned with the class imbalance problem which has been known to hinder the learning performance of classification algorithms. The problem occurs when there are significantly less number of observations of the target concept. Various realworld classification tasks, such as medical diagnosis, text categorization and fraud detection suffer from this phenomenon. The standard machine learning algorithms yield better prediction performance with balanced datasets. In this paper, we demonstrate that active learning is capable of solving the class imbalance problem by providing the learner more balanced classes. We also propose an efficient way of selecting informative instances from a smaller pool of samples for active learning which does not necessitate a search through the entire dataset. The proposed method yields an efficient querying system and allows active learning to be applied to very large datasets. Our experimental results show that with an early stopping criteria, active learning achieves a fast solution with competitive prediction performance in imbalanced data classification.