BudgetedSVM: A toolbox for scalable SVM approximations
Journal of Machine Learning Research 14:3813–3817, 2014
Cited by 7 (0 self)
We present BudgetedSVM, an open-source C++ toolbox comprising highly-optimized implementations of recently proposed algorithms for scalable training of Support Vector Machine (SVM) approximators: Adaptive Multi-hyperplane Machines, Low-rank Linearization SVM, and Budgeted Stochastic Gradient Descent. BudgetedSVM trains models with accuracy comparable to LibSVM in time comparable to LibLinear, solving non-linear problems with millions of high-dimensional examples within minutes on a regular computer. We provide command-line and Matlab interfaces to BudgetedSVM, an efficient API for handling large-scale, high-dimensional data sets, as well as detailed documentation to help developers use and further extend the toolbox.
A Divide-and-Conquer Solver for Kernel Support Vector Machines
Cited by 2 (2 self)
The kernel support vector machine (SVM) is one of the most widely used classification methods; however, the amount of computation required becomes the bottleneck when facing millions of samples. In this paper, we propose and analyze a novel divide-and-conquer solver for kernel SVMs (DC-SVM). In the division step, we partition the kernel SVM problem into smaller subproblems by clustering the data, so that each subproblem can be solved independently and efficiently. We show theoretically that the support vectors identified by the subproblem solution are likely to be support vectors of the entire kernel SVM problem, provided that the problem
Non-linear Label Ranking for Large-scale Prediction of Long-Term User Interests
2014
Cited by 1 (1 self)
We consider the problem of personalization of online services from the viewpoint of ad targeting, where we seek to find the best ad categories to be shown to each user, resulting in improved user experience and increased advertisers’ revenue. We propose to address this problem as a task of ranking the ad categories depending on a user’s preference, and introduce a novel label ranking approach capable of efficiently learning non-linear, highly accurate models in large-scale settings. Experiments on a real-world advertising data set with more than 3.2 million users show that the proposed algorithm outperforms the existing solutions in terms of both rank loss and top-K retrieval performance, strongly suggesting the benefit of using the proposed model on large-scale ranking problems.
Large-Margin Convex Polytope Machine
We present the Convex Polytope Machine (CPM), a novel non-linear learning algorithm for large-scale binary classification tasks. The CPM finds a large-margin convex polytope separator which encloses one class. We develop a stochastic gradient descent based algorithm that is amenable to massive datasets, and augment it with a heuristic procedure to avoid sub-optimal local minima. Our experimental evaluations of the CPM on large-scale datasets from distinct domains (MNIST handwritten digit recognition, text topic, and web security) demonstrate that the CPM trains models faster, sometimes by several orders of magnitude, than state-of-the-art similar approaches and kernel-SVM methods while achieving comparable or better classification performance. Our empirical results suggest that, unlike prior similar approaches, we do not need to control the number of sub-classifiers (sides of the polytope) to avoid overfitting.
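The decision rule implied by a convex polytope separator can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `cpm_score` and `cpm_predict`, the explicit bias terms, and the convention that the enclosed class is labeled -1 are assumptions made for the example, and the stochastic gradient descent training step is omitted.

```python
# Hypothetical sketch of prediction with a convex polytope separator.
# Each row of W (with its bias in b) is one face of the polytope, i.e. one sub-classifier.
# Assumption: a point is inside the polytope when every face score is <= 0.

def cpm_score(W, b, x):
    """Return the largest face score max_k (w_k . x + b_k)."""
    return max(
        sum(wi * xi for wi, xi in zip(w, x)) + bk
        for w, bk in zip(W, b)
    )

def cpm_predict(W, b, x):
    """Label +1 if any face is violated (outside the polytope), else -1 (enclosed class)."""
    return 1 if cpm_score(W, b, x) > 0 else -1

# Example polytope: the square |x1| <= 1, |x2| <= 1 written as four half-spaces.
W = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
b = [-1.0, -1.0, -1.0, -1.0]

inside = cpm_predict(W, b, (0.0, 0.0))   # all face scores are -1
outside = cpm_predict(W, b, (2.0, 0.0))  # first face score is +1
```

Taking the maximum over linear scores makes the enclosed region an intersection of half-spaces, which is why it is convex regardless of how many faces are used.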
Big data algorithms for visualization and supervised learning
2014
Explosive growth in data size, data complexity, and data rates, triggered by the emergence of high-throughput technologies such as remote sensing, crowd-sourcing, social networks, or computational advertising, has in recent years led to an increasing availability of data sets of unprecedented scales, with billions of high-dimensional data examples stored on hundreds of terabytes of memory. In order to make use of this large-scale data and extract useful knowledge, researchers in the machine learning and data mining communities are faced with numerous challenges, since the data mining and machine learning tools designed for standard desktop computers are not capable of addressing these problems due to memory and time constraints. As a result, there exists an evident need for the development of novel, scalable algorithms for big data. In this thesis we address these important problems, and propose both supervised and unsupervised tools for handling large-scale data. First, we consider an unsupervised approach to big data analysis, and explore a scalable, efficient visualization method that allows fast knowledge extraction. Next, we consider supervised learning setting
Online Sequential Projection Vector Machine with Adaptive Data Mean Update
We propose a simple online learning algorithm especially suited to high-dimensional data. The algorithm is referred to as online sequential projection vector machine (OSPVM), which derives from the projection vector machine and can learn from data in one-by-one or chunk-by-chunk mode. In OSPVM, data centering, dimension reduction, and neural network training are integrated seamlessly. In particular, the model parameters including (1) the projection vectors for dimension reduction, (2) the input weights, biases, and output weights, and (3) the number of hidden nodes can be updated simultaneously. Moreover, only one parameter, the number of hidden nodes, needs to be determined manually, which makes the algorithm easy to use in real applications. Performance comparison was made on various high-dimensional classification problems for OSPVM against other fast online algorithms including the budgeted stochastic gradient descent (BSGD) approach, adaptive multi-hyperplane machine (AMM), primal estimated subgradient solver (Pegasos), online sequential extreme learning machine (OSELM), and SVD + OSELM (feature selection based on SVD is performed before OSELM). The results obtained demonstrated the superior generalization performance and efficiency of the OSPVM.
The
Online algorithms that process one example at a time are advantageous when dealing with very large data or with data streams. Stochastic Gradient Descent (SGD) is such an algorithm and it is an attractive choice for online Support Vector Machine (SVM) training due to its simplicity and effectiveness. When equipped with kernel functions, similarly to other SVM learning algorithms, SGD is susceptible to the curse of kernelization that causes unbounded linear growth in model size and update time with data size. This may render SGD inapplicable to large data sets. We address this issue by presenting a class of Budgeted SGD (BSGD) algorithms for large-scale kernel SVM training which have constant space and constant time complexity per update. Specifically, BSGD keeps the number of support vectors bounded during training through several budget maintenance strategies. We treat the budget maintenance as a source of the gradient error, and show that the gap between the BSGD and the optimal SVM solutions depends on the model degradation due to budget maintenance. To minimize the gap, we study greedy budget maintenance methods based on removal, projection, and merging of support vectors. We propose budgeted versions of several popular online SVM algorithms that belong to the SGD family. We further derive BSGD algorithms for multi-class SVM training. Comprehensive empirical results show that BSGD achieves higher accuracy than the state-of-the-art budgeted online algorithms and comparable to non-budget algorithms, while achieving impressive computational efficiency both in time and space during training and prediction.
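The idea of keeping the number of support vectors bounded can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: it uses a Pegasos-style step size, a Gaussian kernel, and only the simplest of the maintenance strategies named in the abstract (removal, here of the support vector with the smallest coefficient magnitude). The names `bsgd_train` and `bsgd_predict` and all parameter defaults are hypothetical.

```python
import math
import random

def rbf(x, z, gamma=1.0):
    """Gaussian kernel between two feature tuples."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def bsgd_train(data, budget, lam=0.01, gamma=1.0, epochs=5, seed=0):
    """Kernel SGD for the hinge loss with at most `budget` support vectors."""
    rng = random.Random(seed)
    sv, alpha = [], []  # support vectors and their coefficients
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            t += 1
            eta = 1.0 / (lam * t)  # assumed Pegasos-style step size
            score = sum(a * rbf(s, x, gamma) for s, a in zip(sv, alpha))
            # Regularization step: shrink all existing coefficients.
            alpha = [a * (1.0 - eta * lam) for a in alpha]
            if y * score < 1.0:  # margin violation: add x as a support vector
                sv.append(x)
                alpha.append(eta * y)
                if len(sv) > budget:
                    # Budget maintenance by removal: drop the support vector
                    # with the smallest |alpha| (ties go to the earliest index).
                    j = min(range(len(alpha)), key=lambda k: abs(alpha[k]))
                    sv.pop(j)
                    alpha.pop(j)
    return sv, alpha

def bsgd_predict(sv, alpha, x, gamma=1.0):
    """Sign of the kernel expansion; zero is mapped to -1."""
    score = sum(a * rbf(s, x, gamma) for s, a in zip(sv, alpha))
    return 1 if score > 0 else -1

# Toy data: two small clusters with labels -1 and +1.
data = [((0.0, 0.0), -1), ((0.1, 0.1), -1), ((0.0, 0.2), -1),
        ((1.0, 1.0), 1), ((0.9, 1.1), 1), ((1.1, 0.9), 1)]
sv, alpha = bsgd_train(data, budget=3)
```

Because the model never holds more than `budget` support vectors, each update and each prediction costs a bounded number of kernel evaluations, which is the constant space and time per update that the abstract refers to.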
Hash-SVM: Scalable Kernel Machines for Large-Scale Visual Classification
This paper presents a novel algorithm which uses hash bits for efficiently optimizing non-linear kernel SVM in very large scale visual classification problems. Our key idea is to represent each sample with compact hash bits and define an inner product over these bits, which serves as the surrogate of the original nonlinear kernels. Then the problem of solving the nonlinear SVM can be transformed into solving a linear SVM over the hash bits. The proposed Hash-SVM enjoys both greatly reduced data storage owing to the compact binary representation, as well as the (sub-)linear training complexity via linear SVM. As a crucial component of Hash-SVM, we propose a novel hashing scheme for arbitrary non-linear kernels via random subspace projection in reproducing kernel Hilbert space. Our comprehensive analysis reveals a well-behaved theoretical bound on the deviation between the proposed hashing-based kernel approximation and the original kernel function. We also derived moderate requirements on the hash bits for achieving a satisfactory accuracy level. Several experiments on large-scale visual classification benchmarks are conducted, including one with over 1 million images. The results demonstrate the superiority of our algorithm when compared with other alternatives.
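The notion of an inner product over hash bits standing in for a similarity function can be illustrated with a simpler, swapped-in technique: sign random projections in the input space. This is not the paper's hashing scheme, which projects in a reproducing kernel Hilbert space; the function names, the Gaussian hyperplanes, and the ±1 bit encoding below are all assumptions made for the sketch.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def hash_bits(planes, x):
    """Encode x as one +1/-1 bit per hyperplane (sign of the projection)."""
    return [
        1 if sum(p * v for p, v in zip(plane, x)) >= 0 else -1
        for plane in planes
    ]

def bit_similarity(a, b):
    """Normalized inner product of two bit codes, in [-1, 1]."""
    return sum(u * v for u, v in zip(a, b)) / len(a)

planes = make_hyperplanes(dim=3, n_bits=64)
code_a = hash_bits(planes, [1.0, 2.0, 3.0])
code_b = hash_bits(planes, [1.1, 2.0, 2.9])     # close to the first vector
code_c = hash_bits(planes, [-1.0, -2.0, -3.0])  # exactly opposite direction

sim_near = bit_similarity(code_a, code_b)
sim_opposite = bit_similarity(code_a, code_c)
```

Once every sample is reduced to such a code, a linear classifier can be trained directly on the bit vectors, since its dot products with the codes play the role that kernel evaluations would otherwise play.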
Non-linear Label Ranking for Large-scale Prediction of Long-Term User Interests
We consider the problem of personalization of online services from the viewpoint of ad targeting, where we seek to find the best ad categories to be shown to each user, resulting in improved user experience and increased advertisers’ revenue. We propose to address this problem as a task of ranking the ad categories depending on a user’s preference, and introduce a novel label ranking approach capable of efficiently learning non-linear, highly accurate models in large-scale settings. Experiments on a real-world advertising data set with more than 3.2 million users show that the proposed algorithm outperforms the existing solutions in terms of both rank loss and top-K retrieval performance, strongly suggesting the benefit of using the proposed model on large-scale ranking problems.
Hash-SVM: Scalable Kernel Machines for Large-Scale Visual Classification
This paper presents a novel algorithm which uses compact hash bits to greatly improve the efficiency of non-linear kernel SVM in very large scale visual classification problems. Our key idea is to represent each sample with compact hash bits, over which an inner product is defined to serve as the surrogate of the original nonlinear kernels. Then the problem of solving the nonlinear SVM can be transformed into solving a linear SVM over the hash bits. The proposed Hash-SVM enjoys dramatic storage cost reduction owing to the compact binary representation, as well as a (sub-)linear training complexity via linear SVM. As a critical component of Hash-SVM, we propose a novel hashing scheme for arbitrary non-linear kernels via random subspace projection in reproducing kernel Hilbert space. Our comprehensive analysis reveals a well-behaved theoretical bound on the deviation between the proposed hashing-based kernel approximation and the original kernel function. We also derive requirements on the hash bits for achieving a satisfactory accuracy level. Several experiments on large-scale visual classification benchmarks are conducted, including one with over 1 million images. The results show that Hash-SVM greatly reduces the computational complexity (more than ten times faster in many cases) while keeping comparable accuracies.