Results 11 - 20 of 89
Robust truncated hinge loss support vector machines
- in: Journal of the American Statistical Association 102, 974–983
, 2007
"... The support vector machine (SVM) has been widely applied for classification problems in both machine learning and statistics. Despite its popularity, however, SVM has some drawbacks in certain situations. In particular, the SVM classifier can be very sensitive to outliers in the training sample. Mor ..."
Cited by 20 (1 self)
The support vector machine (SVM) has been widely applied for classification problems in both machine learning and statistics. Despite its popularity, however, SVM has some drawbacks in certain situations. In particular, the SVM classifier can be very sensitive to outliers in the training sample. Moreover, the number of support vectors (SVs) can be very large in many applications. To circumvent these drawbacks, we propose the robust truncated hinge loss SVM (RSVM), which uses a truncated hinge loss. The RSVM is shown to be more robust to outliers and to deliver more accurate classifiers using a smaller set of SVs than the standard SVM. Our theoretical results show that the RSVM is Fisher-consistent, even when there is no dominating class, a scenario that is particularly challenging for multicategory classification. Similar results are obtained for a class of margin-based classifiers.
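A minimal sketch of the truncated hinge loss itself (my own illustration, not the authors' code; the truncation parameter s below is an assumption): it equals the usual hinge loss for margins above s and is capped below s, so badly misclassified points stop pulling on the classifier.

    import numpy as np

    def hinge(u, a=1.0):
        # standard hinge H_a(u) = max(0, a - u)
        return np.maximum(0.0, a - u)

    def truncated_hinge(u, s=-1.0):
        # truncated hinge written as a difference of two hinges, H_1(u) - H_s(u);
        # it is flat (value 1 - s) for u < s, so extreme outliers add only a bounded penalty
        return hinge(u, 1.0) - hinge(u, s)

    u = np.array([2.0, 0.5, -0.5, -5.0])   # functional margins y * f(x)
    print(truncated_hinge(u))              # [0.  0.5  1.5  2.]  (the outlier at -5 is capped at 2)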
Fast low-rank semidefinite programming for embedding and clustering
- in: Eleventh International Conference on Artificial Intelligence and Statistics, AISTATS
, 2007
"... Abstract Many non-convex problems in machine learning such as embedding and clustering have been solved using convex semidefinite relaxations. These semidefinite programs (SDPs) are expensive to solve and are hence limited to run on very small data sets. In this paper we show how we can improve the ..."
Cited by 16 (2 self)
Many non-convex problems in machine learning, such as embedding and clustering, have been solved using convex semidefinite relaxations. These semidefinite programs (SDPs) are expensive to solve and are hence limited to very small data sets. In this paper we show how we can improve the quality and speed of solving a number of these problems by casting them as low-rank SDPs and then directly solving them using a non-convex optimization algorithm. In particular, we show that problems such as k-means clustering and maximum variance unfolding (MVU) may be expressed exactly as low-rank SDPs and solved using our approach. We demonstrate that on the above problems our approach is significantly faster, far more scalable, and often produces better results than traditional SDP relaxation techniques.
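The low-rank reparameterization at the core of this approach can be sketched in a few lines (an illustrative toy, not the paper's solver; the trace constraint and step size are assumptions): the PSD matrix variable X of the SDP is replaced by Y Y^T with Y of small rank, and Y is optimized directly with a local method.

    import numpy as np

    rng = np.random.default_rng(0)
    n, r = 50, 3
    C = rng.standard_normal((n, n)); C = (C + C.T) / 2    # symmetric objective matrix
    Y = 0.1 * rng.standard_normal((n, r))                  # low-rank factor, X = Y @ Y.T

    for _ in range(500):
        Y += 1e-2 * (2 * C @ Y)                            # gradient of trace(C Y Y^T) w.r.t. Y
        Y /= max(1.0, np.linalg.norm(Y))                   # keep trace(X) = ||Y||_F^2 <= 1

    X = Y @ Y.T                                            # feasible rank-r PSD matrix
    print(np.trace(C @ X))

In a real instance (k-means or MVU) the equality constraints of the SDP would be enforced with penalties or an augmented Lagrangian rather than the simple projection used here.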
A Weakly-supervised Approach to Argumentative Zoning of Scientific Documents
"... Argumentative Zoning (AZ) – analysis of the argumentative structure of a scientific paper – has proved useful for a number of information access tasks. Current approaches to AZ rely on supervised machine learning (ML). Requiring large amounts of annotated data, these approaches are expensive to dev ..."
Cited by 16 (5 self)
Argumentative Zoning (AZ) – analysis of the argumentative structure of a scientific paper – has proved useful for a number of information access tasks. Current approaches to AZ rely on supervised machine learning (ML). Requiring large amounts of annotated data, these approaches are expensive to develop and to port to different domains and tasks. A potential solution to this problem is to use weakly-supervised ML instead. We investigate the performance of four weakly-supervised classifiers on scientific abstract data annotated for multiple AZ classes. Our best classifier, based on the combination of active learning and self-training, outperforms our best supervised classifier, yielding a high accuracy of 81% when using just 10% of the labeled data. This result suggests that weakly-supervised learning could be employed to improve the practical applicability and portability of AZ across different information access tasks.
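A generic self-training loop of the kind referred to above (a sketch under assumed details, not the authors' system; the classifier, batch size, and number of rounds are arbitrary choices):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def self_train(X_lab, y_lab, X_pool, rounds=5, per_round=20):
        clf = LogisticRegression(max_iter=1000)
        for _ in range(rounds):
            clf.fit(X_lab, y_lab)
            if len(X_pool) == 0:
                break
            proba = clf.predict_proba(X_pool)
            take = np.argsort(-proba.max(axis=1))[:per_round]   # most confident unlabeled items
            pseudo = clf.classes_[proba[take].argmax(axis=1)]   # their predicted (pseudo) labels
            X_lab = np.vstack([X_lab, X_pool[take]])
            y_lab = np.concatenate([y_lab, pseudo])
            X_pool = np.delete(X_pool, take, axis=0)
        return clf

An active-learning variant would instead (or additionally) route the least confident items to a human annotator before retraining.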
Learning with recursive perceptual representations
- in: Proc. Adv. Neural Inf. Process. Syst.
, 2012
"... Linear Support Vector Machines (SVMs) have become very popular in vision as part of state-of-the-art object recognition and other classification tasks but require high dimensional feature spaces for good performance. Deep learning methods can find more compact representations but current methods emp ..."
Cited by 15 (4 self)
Linear Support Vector Machines (SVMs) have become very popular in vision as part of state-of-the-art object recognition and other classification tasks, but require high-dimensional feature spaces for good performance. Deep learning methods can find more compact representations, but current methods employ multilayer perceptrons that require solving a difficult, non-convex optimization problem. We propose a deep non-linear classifier whose layers are SVMs and which incorporates random projection as its core stacking element. Our method learns layers of linear SVMs that recursively transform the original data manifold through a random projection of the weak prediction computed from each layer. Our method scales like linear SVMs, does not rely on any kernel computations or non-convex optimization, and exhibits better generalization ability than kernel-based SVMs. This is especially true when the number of training samples is smaller than the dimensionality of the data, a common scenario in many real-world applications. The use of random projections is key to our method, as we show in the experiments section, in which we observe a consistent improvement over previous, often more complicated, methods on several vision and speech benchmarks.
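The stacking rule can be sketched as follows (assumed details such as the scale beta and the use of LinearSVC are mine, not the authors'): each layer fits a linear SVM, and the input to the next layer is the original data shifted by a fixed random projection of the current layer's decision values.

    import numpy as np
    from sklearn.svm import LinearSVC

    def recursive_svm(X, y, n_layers=3, beta=0.5, seed=0):
        rng = np.random.default_rng(seed)
        stack, X_cur = [], X
        for _ in range(n_layers):
            clf = LinearSVC(C=1.0).fit(X_cur, y)
            scores = clf.decision_function(X_cur).reshape(len(X), -1)
            W = rng.standard_normal((scores.shape[1], X.shape[1]))  # fixed random projection
            stack.append((clf, W))
            X_cur = X + beta * scores @ W    # recursively perturb the original data manifold
        return stack                          # predict with the last classifier, replaying the same projections

Because every layer is a convex linear SVM and the projections are never learned, no non-convex optimization appears anywhere in the stack.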
Modeling Disease Progression via Fused Sparse Group Lasso
"... Alzheimer’s Disease (AD) is the most common neurodegenerative disorder associated with aging. Understanding how the disease progresses and identifying related pathological biomarkers for the progression is of primary importance in the clinical diagnosis and prognosis of Alzheimer’s disease. In this ..."
Cited by 11 (4 self)
Alzheimer’s Disease (AD) is the most common neurodegenerative disorder associated with aging. Understanding how the disease progresses and identifying related pathological biomarkers for the progression is of primary importance in the clinical diagnosis and prognosis of Alzheimer’s disease. In this paper, we develop novel multi-task learning techniques to predict the disease progression measured by cognitive scores and select biomarkers predictive of the progression. In multi-task learning, the prediction of cognitive scores at each time point is considered as a task, and multiple prediction tasks at different time points are performed simultaneously to capture the temporal smoothness of the prediction models across different time points. Specifically, we propose a novel convex fused sparse group Lasso (cFSGL) formulation.
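Schematically, a fused sparse group Lasso objective of the kind described combines a squared loss with three penalties (the notation below is assumed, not copied from the paper): element-wise sparsity, a fused term that ties the models of adjacent time points together, and a group term that selects biomarkers across all time points at once.

    \min_{W}\; \lVert XW - Y \rVert_F^2
      + \lambda_1 \lVert W \rVert_1                            % element-wise sparsity
      + \lambda_2 \lVert R W^{\top} \rVert_1                   % fused penalty: temporal smoothness
      + \lambda_3 \textstyle\sum_j \lVert w_{j\cdot} \rVert_2  % group penalty: biomarker selection

Here each column of W holds the model for one time point, and R takes differences between the models of adjacent time points.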
Adaptive Local Dissimilarity Measures for Discriminative Dimension Reduction of Labeled Data
, 2009
"... Due to the tremendous increase of electronic information with respect to the size of data sets as well as their dimension, dimension reduction and visualization of high-dimensional data has become one of the key problems of data mining. Since embedding in lower dimensions necessarily includes a loss ..."
Cited by 11 (7 self)
Due to the tremendous increase of electronic information, both in the size of data sets and in their dimensionality, dimension reduction and visualization of high-dimensional data have become key problems of data mining. Since embedding in lower dimensions necessarily involves a loss of information, methods to explicitly control the information kept by a specific dimension reduction technique are highly desirable. The incorporation of supervised class information constitutes an important specific case. The aim is to preserve and potentially enhance the discrimination of classes in lower dimensions. In this contribution we use an extension of prototype-based local distance learning, which results in a nonlinear discriminative dissimilarity measure for a given labeled data manifold. The learned local distance measure can be used as a basis for other unsupervised dimension reduction techniques that take neighborhood information into account. We show the combination of different dimension reduction techniques with a discriminative similarity measure learned by an extension of Learning Vector Quantization (LVQ), and their behavior with different parameter settings. The methods are introduced and discussed on artificial and real-world data sets.
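A relevance-weighted dissimilarity of the kind an LVQ extension learns can be sketched as follows (illustrative only; the relevance values below are made up, and the measure is shown global rather than local per prototype for brevity):

    import numpy as np

    def relevance_distance(x, w, lam):
        # d_lambda(x, w) = sum_i lambda_i * (x_i - w_i)^2, with lambda_i >= 0 and sum_i lambda_i = 1
        return np.sum(lam * (x - w) ** 2)

    lam = np.array([0.7, 0.2, 0.1])       # learned relevances: discriminative features get more weight
    x, w = np.array([1.0, 0.0, 5.0]), np.zeros(3)
    print(relevance_distance(x, w, lam))  # 0.7*1 + 0.2*0 + 0.1*25 = 3.2

The learned dissimilarities then replace the Euclidean distances used by neighborhood-based embedding techniques.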
Margin and radius based multiple kernel learning
- in: Proc. of the European Conf. on Machine Learning and Knowledge Discovery in Databases: Part I
, 2009
"... Abstract. A serious drawback of kernel methods, and Support Vector Machines (SVM) in particular, is the difficulty in choosing a suitable kernel function for a given dataset. One of the approaches proposed to address this problem is Multiple Kernel Learning (MKL) in which several kernels are combine ..."
Cited by 7 (1 self)
A serious drawback of kernel methods, and Support Vector Machines (SVM) in particular, is the difficulty in choosing a suitable kernel function for a given dataset. One of the approaches proposed to address this problem is Multiple Kernel Learning (MKL), in which several kernels are combined adaptively for a given dataset. Many of the existing MKL methods use the SVM objective function and try to find a linear combination of basic kernels such that the separating margin between the classes is maximized. However, these methods ignore the fact that the theoretical error bound depends not only on the margin, but also on the radius of the smallest sphere that contains all the training instances. We present a novel MKL algorithm that optimizes the error bound, taking into account both the margin and the radius. The empirical results show that the proposed method compares favorably with other state-of-the-art MKL methods.
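In the notation usually used for such bounds (assumed here, not taken from the paper), the combined kernel and the quantity being optimized look roughly as follows; margin-only MKL minimizes \lVert w(\mu) \rVert^2 alone and ignores the radius factor.

    k_\mu(x, x') = \sum_{m=1}^{M} \mu_m\, k_m(x, x'), \qquad \mu_m \ge 0,\ \sum_m \mu_m = 1
    \min_{\mu}\; R^2(\mu)\, \lVert w(\mu) \rVert^2
    % R(\mu): radius of the smallest sphere enclosing the data in the feature space of k_\mu
    % 1 / \lVert w(\mu) \rVert: the separating margin achieved with that kernel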
Nonconvex Online Support Vector Machines
"... Abstract—In this paper, we propose a nonconvex online Support Vector Machine (SVM) algorithm (LASVM-NC) based on the Ramp Loss, which has the strong ability of suppressing the influence of outliers. Then, again in the online learning setting, we propose an outlier filtering mechanism (LASVM-I) based ..."
Cited by 7 (0 self)
In this paper, we propose a nonconvex online Support Vector Machine (SVM) algorithm (LASVM-NC) based on the ramp loss, which strongly suppresses the influence of outliers. Then, again in the online learning setting, we propose an outlier filtering mechanism (LASVM-I) based on approximating nonconvex behavior in convex optimization. These two algorithms are built upon another novel SVM algorithm (LASVM-G) that is capable of generating accurate intermediate models in its iterative steps by leveraging the duality gap. We present experimental results that demonstrate the merit of our frameworks in achieving significant robustness to outliers in noisy data classification where mislabeled training instances are in abundance. Experimental evaluation shows that the proposed approaches yield a more scalable online SVM algorithm with sparser models and less computational running time, in both the training and recognition phases, without sacrificing generalization performance. We also point out the relation between nonconvex optimization and min-margin active learning. Index Terms: online learning, nonconvex optimization, support vector machines, active learning.
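The ramp loss referred to above can be written (with assumed notation) as a difference of two convex hinge functions, which is exactly what makes concave-convex, CCCP-style optimization applicable: the convex part is kept and the concave part is linearized at each step.

    R_s(u) = \min\bigl(1 - s,\ \max(0,\, 1 - u)\bigr) = H_1(u) - H_s(u),
    \qquad H_a(u) = \max(0,\, a - u), \quad s < 1
    % for u < s the loss is constant (1 - s), so points with very negative margins
    % contribute only a bounded penalty and tend not to become support vectors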
Learning Optimally Sparse Support Vector Machines
"... We show how to train SVMs with an optimal guarantee on the number of support vectors (up to constants), and with sample complexity and training runtime bounds matching the best known for kernel SVM optimization (i.e. without any additional asymptotic cost beyond standard SVM training). Our method is ..."
Cited by 6 (1 self)
We show how to train SVMs with an optimal guarantee on the number of support vectors (up to constants), and with sample complexity and training runtime bounds matching the best known for kernel SVM optimization (i.e. without any additional asymptotic cost beyond standard SVM training). Our method is simple to implement and works well in practice.
Learning Structured Classifiers with Dual Coordinate Ascent
, 2010
"... M. F. and P. A. were supported by the FET programme (EU ..."
Cited by 6 (2 self)
M. F. and P. A. were supported by the FET programme (EU