Results 1–10 of 32
Power Mean SVM for large scale visual classification
 in Proc. IEEE Int’l Conf. on Computer Vision and Pattern Recognition
, 2012
"... PmSVM (Power Mean SVM), a classifier that trains significantly faster than stateoftheart linear and nonlinear SVM solvers in large scale visual classification tasks, is presented. PmSVM also achieves higher accuracies. A scalable learning method for large vision problems, e.g., with millions ..."
Abstract

Cited by 10 (6 self)
PmSVM (Power Mean SVM), a classifier that trains significantly faster than state-of-the-art linear and nonlinear SVM solvers in large-scale visual classification tasks, is presented. PmSVM also achieves higher accuracies. A scalable learning method for large vision problems, e.g., with millions of examples or dimensions, is a key component in many current vision systems. Recent progress has enabled linear classifiers to efficiently process such large-scale problems. Linear classifiers, however, usually have inferior accuracies in vision tasks. Nonlinear classifiers, on the other hand, may take weeks or even years to train. We propose a power mean kernel and present an efficient learning algorithm through gradient approximation. The power mean kernel family includes as special cases many popular additive kernels. Empirically, PmSVM is up to 5 times faster than LIBLINEAR, and two times faster than state-of-the-art additive kernel classifiers. In terms of accuracy, it outperforms state-of-the-art additive kernel implementations, and has major advantages over linear SVM.
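The power mean kernel family the abstract refers to is easy to state concretely. The sketch below (pure Python, not the paper's implementation; all names are ours) shows how a single power mean formula recovers several popular additive kernels as special cases:

```python
import math

def power_mean(a, b, p):
    """Power mean M_p(a, b) = ((a^p + b^p) / 2)^(1/p) for a, b > 0.
    p = -1 gives the chi-square kernel term 2ab/(a+b);
    p -> 0 gives the Hellinger kernel sqrt(ab);
    p -> -inf gives the histogram intersection kernel min(a, b)."""
    if p == 0:  # limit p -> 0: geometric mean
        return math.sqrt(a * b)
    return ((a ** p + b ** p) / 2.0) ** (1.0 / p)

def power_mean_kernel(x, y, p):
    """Additive kernel: sum of coordinate-wise power means."""
    return sum(power_mean(xi, yi, p) for xi, yi in zip(x, y))

# Two small histogram-like feature vectors:
x, y = [0.2, 0.5, 0.3], [0.4, 0.4, 0.2]
k_chi2 = power_mean_kernel(x, y, -1)   # chi-square-style similarity
k_hell = power_mean_kernel(x, y, 0)    # Hellinger-style similarity
```

PmSVM's speed comes from approximating gradients of such kernels, which this sketch does not attempt; it only illustrates the kernel family itself.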
Large-scale logistic regression and linear support vector machines using Spark
 in Proceedings of the IEEE International Conference on Big Data
, 2014
"... AbstractLogistic regression and linear SVM are useful methods for largescale classification. However, their distributed implementations have not been well studied. Recently, because of the inefficiency of the MapReduce framework on iterative algorithms, Spark, an inmemory clustercomputing platf ..."
Abstract

Cited by 7 (4 self)
Logistic regression and linear SVM are useful methods for large-scale classification. However, their distributed implementations have not been well studied. Recently, because of the inefficiency of the MapReduce framework on iterative algorithms, Spark, an in-memory cluster-computing platform, has been proposed. It has emerged as a popular framework for large-scale data processing and analytics. In this work, we consider a distributed Newton method for solving logistic regression as well as linear SVM and implement it on Spark. We carefully examine many implementation issues significantly affecting the running time and propose our solutions. After conducting thorough empirical investigations, we release an efficient and easy-to-use tool for the Spark community.
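The Newton iteration being distributed here can be sketched on one machine for the one-feature case (a simplification of ours, not the paper's code; the distributed version shards the gradient and Hessian sums over Spark partitions):

```python
import math

def newton_logreg_1d(xs, ys, lam=1.0, iters=10):
    """Newton's method for one-feature L2-regularized logistic regression:
    minimize f(w) = (lam/2) w^2 + sum_i log(1 + exp(-y_i * w * x_i)),
    with labels y_i in {-1, +1}."""
    w = 0.0
    for _ in range(iters):
        g = lam * w          # gradient, starting with the regularizer
        h = lam              # Hessian (a scalar in 1-D)
        for x, y in zip(xs, ys):
            s = 1.0 / (1.0 + math.exp(y * w * x))  # sigma(-y w x)
            g += -y * x * s                        # loss gradient term
            h += x * x * s * (1.0 - s)             # loss curvature term
        w -= g / h                                 # Newton step
    return w
```

In the distributed setting, only the partial sums for `g` and `h` travel over the network each iteration, which is what makes Newton-type methods attractive on Spark.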
Large-scale Linear RankSVM
"... Linear rankSVM is one of the widely used methods for learning to rank. Although its performance may be inferior to nonlinear methods such as kernel rankSVM and gradient boosting decision trees, linear rankSVM is useful to quickly produce a baseline model. Furthermore, following the recent developmen ..."
Abstract

Cited by 3 (1 self)
Linear rankSVM is one of the widely used methods for learning to rank. Although its performance may be inferior to nonlinear methods such as kernel rankSVM and gradient boosting decision trees, linear rankSVM is useful for quickly producing a baseline model. Furthermore, following the recent development of linear SVM for classification, linear rankSVM may give competitive performance for large and sparse data. Many existing works have studied linear rankSVM, focusing on computational efficiency when the number of preference pairs is large. In this paper, we systematically study past works, discuss their advantages and disadvantages, and propose an efficient algorithm. Different implementation issues and extensions are discussed with detailed experiments. Finally, we develop a robust linear rankSVM tool for public use.
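The "preference pairs" in the abstract refer to the pairwise hinge objective below (a naive sketch with names of our choosing; the surveyed algorithms avoid this explicit O(k²) pair enumeration):

```python
def ranksvm_objective(w, X, y, qid, C=1.0):
    """Pairwise hinge objective of linear rankSVM (a sketch).

    For each pair (i, j) within the same query where y[i] > y[j],
    the model should score X[i] above X[j] by a margin of 1;
    violations are penalized linearly."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    loss = 0.0
    n = len(y)
    for i in range(n):
        for j in range(n):
            if qid[i] == qid[j] and y[i] > y[j]:
                margin = dot(w, X[i]) - dot(w, X[j])
                loss += max(0.0, 1.0 - margin)
    return 0.5 * dot(w, w) + C * loss
```

The computational focus of the surveyed works is exactly this sum: with k documents per query there are O(k²) pairs, and efficient solvers evaluate the pairwise loss and its gradient without materializing them.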
A novel Frank-Wolfe algorithm: analysis and applications to large-scale SVM training
 in Information Sciences (in press)
, 2014
"... Recently, there has been a renewed interest in the machine learning community for variants of a sparse greedy approximation procedure for concave optimization known as the FrankWolfe (FW) method. In particular, this procedure has been successfully applied to train largescale instances of nonline ..."
Abstract

Cited by 3 (1 self)
Recently, there has been renewed interest in the machine learning community in variants of a sparse greedy approximation procedure for concave optimization known as the Frank-Wolfe (FW) method. In particular, this procedure has been successfully applied to train large-scale instances of nonlinear Support Vector Machines (SVMs). Specializing FW to SVM training has yielded efficient algorithms as well as important theoretical results, including convergence analyses of training algorithms and new characterizations of model sparsity. In this paper, we present and analyze a novel variant of the FW method based on a new way to perform away steps, a classic strategy used to accelerate the convergence of the basic FW procedure. Our formulation and analysis focus on a general concave maximization problem on the simplex. However, the specialization of our algorithm to quadratic forms is strongly related to some classic methods in computational geometry, namely the Gilbert and MDM algorithms. On the theoretical side, we demonstrate that the method matches the guarantees in terms of convergence rate and number of iterations obtained by using classic away steps. In particular, the method enjoys a linear rate of convergence, a result that has recently been proved for MDM on quadratic forms. On the practical side, we provide experiments on several classification datasets, and evaluate the results using statistical tests. Experiments show that our method is faster than the FW method with classic away steps, and works well even in cases where classic away steps slow down the algorithm. Furthermore, these improvements are obtained without sacrificing the predictive accuracy of the obtained SVM model.
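The classic away-step mechanism that the paper modifies can be sketched on a toy simplex-constrained quadratic (minimization form of the paper's concave maximization; this is the textbook variant with exact line search, not the paper's new away step):

```python
def fw_away_steps(c, iters=300):
    """Frank-Wolfe with classic away steps (a sketch), minimizing the
    convex quadratic f(x) = 0.5 * sum_i (x_i - c_i)^2 over the unit
    simplex {x : x_i >= 0, sum_i x_i = 1}."""
    n = len(c)
    x = [1.0 / n] * n                                # start at barycenter
    for _ in range(iters):
        g = [x[i] - c[i] for i in range(n)]          # gradient of f
        s = min(range(n), key=lambda i: g[i])        # FW vertex e_s
        active = [i for i in range(n) if x[i] > 1e-12]
        v = max(active, key=lambda i: g[i])          # away vertex e_v
        gx = sum(g[i] * x[i] for i in range(n))
        fw_gap, away_gap = gx - g[s], g[v] - gx
        if fw_gap >= away_gap:                       # classic FW step
            d = [-x[i] for i in range(n)]; d[s] += 1.0
            gamma_max = 1.0
        else:                                        # away step
            d = [x[i] for i in range(n)]; d[v] -= 1.0
            gamma_max = x[v] / (1.0 - x[v]) if x[v] < 1.0 else 1e9
        dd = sum(di * di for di in d)
        if dd == 0.0:
            break
        # exact line search (Hessian is the identity here):
        gamma = min(gamma_max, -sum(g[i] * d[i] for i in range(n)) / dd)
        if gamma <= 0.0:
            break
        x = [x[i] + gamma * d[i] for i in range(n)]
    return x
```

Away steps remove weight from bad active vertices instead of only adding new ones, which is what restores linear convergence on problems like this.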
Distributed box-constrained quadratic optimization for dual linear SVM
 in ICML
, 2015
"... Training machine learning models sometimes needs to be done on large amounts of data that exceed the capacity of a single machine, motivating recent works on developing algorithms that train in a distributed fashion. This paper proposes an efficient boxconstrained quadratic optimization algorith ..."
Abstract

Cited by 3 (0 self)
Training machine learning models sometimes needs to be done on amounts of data that exceed the capacity of a single machine, motivating recent work on algorithms that train in a distributed fashion. This paper proposes an efficient box-constrained quadratic optimization algorithm for distributedly training linear support vector machines (SVMs) on large data. Our key technical contribution is an analytical solution to the problem of computing the optimal step size at each iteration, using an efficient method that requires only O(1) communication cost to ensure fast convergence. With this optimal step size, our approach is superior to other methods by possessing global linear convergence, or, equivalently, O(log(1/ε)) iteration complexity for an ε-accurate solution, for distributedly solving the non-strongly-convex linear SVM dual problem. Experiments also show that our method is significantly faster than state-of-the-art distributed linear SVM algorithms including DSVM-AVE, DisDCA and TRON.
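The analytically optimal step size for a quadratic along a fixed direction is a one-line formula; the sketch below illustrates it on a single machine (our own simplified projected-gradient setting, not the paper's distributed algorithm, whose contribution is computing this quantity with O(1) communication):

```python
def box_qp_step(alpha, Q, p, C):
    """One projected-gradient step with an exact step size for
    min 0.5 a'Qa + p'a  subject to  0 <= a_i <= C  (a sketch)."""
    n = len(alpha)
    # Gradient g = Q alpha + p:
    g = [sum(Q[i][j] * alpha[j] for j in range(n)) + p[i] for i in range(n)]
    # Direction: from alpha toward the box projection of (alpha - g).
    proj = [min(C, max(0.0, alpha[i] - g[i])) for i in range(n)]
    d = [proj[i] - alpha[i] for i in range(n)]
    # Exact line search: phi(t) = 0.5 (a+td)'Q(a+td) + p'(a+td)
    # gives t* = -g'd / d'Qd, clipped to [0, 1] to stay feasible.
    dQd = sum(d[i] * Q[i][j] * d[j] for i in range(n) for j in range(n))
    if dQd <= 0.0:
        return proj
    t = min(1.0, max(0.0, -sum(g[i] * d[i] for i in range(n)) / dQd))
    return [alpha[i] + t * d[i] for i in range(n)]
```

In the distributed case the scalars g'd and d'Qd are the only quantities that must be aggregated across machines, which is the source of the O(1) communication cost.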
Product title classification versus text classification
"... In most ecommerce platforms, product title classification is a crucial task. It can assist sellers listing an item in an appropriate category. At first glance, product title classification is merely an instance of text classification problems, which are wellstudied in literature. However, product ..."
Abstract

Cited by 2 (1 self)
In most e-commerce platforms, product title classification is a crucial task. It can assist sellers in listing an item in an appropriate category. At first glance, product title classification is merely an instance of text classification, a problem well studied in the literature. However, product titles possess some properties very different from general documents. A title is usually a very short description and an incomplete sentence. A product title classifier may therefore need to be designed differently from a text classifier, although this issue has not been thoroughly studied. In this work, using a large-scale real-world data set, we examine conventional text-classification procedures on product title data. These procedures include word stemming, stop-word removal, feature representation and multi-class classification. Our major findings include that stemming and stop-word removal are harmful, and that bigrams or degree-2 polynomial mappings are very effective. Further, if linear classifiers such as SVMs are applied, instance normalization does not degrade performance, and binary and TF-IDF representations perform similarly. These results lead to a concrete guideline for practitioners on product title classification.
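The recommended feature pipeline (bigrams, no stemming or stop-word removal, TF-IDF with instance normalization) can be sketched in a few lines of pure Python; names and details here are ours, not the paper's:

```python
import math
from collections import Counter

def bigram_features(title):
    """Unigram + bigram counts for a product title.
    Per the paper's findings, no stemming and no stop-word removal."""
    words = title.lower().split()
    grams = words + [" ".join(b) for b in zip(words, words[1:])]
    return Counter(grams)

def tfidf(docs_features):
    """TF-IDF weighting with L2 instance normalization (a sketch)."""
    n = len(docs_features)
    df = Counter(g for f in docs_features for g in f)  # document frequency
    out = []
    for f in docs_features:
        w = {g: c * math.log(n / df[g]) for g, c in f.items()}
        norm = math.sqrt(sum(v * v for v in w.values())) or 1.0
        out.append({g: v / norm for g, v in w.items()})
    return out
```

The resulting sparse vectors would then feed a linear classifier such as an SVM, as in the paper's experiments.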
Fast Flux Discriminant for Large-Scale Sparse Nonlinear Classification
"... In this paper, we propose a novel supervised learning method, Fast Flux Discriminant (FFD), for largescale nonlinear classification. Compared with other existing methods, FFD has unmatched advantages, as it attains the efficiency and interpretability of linear models as well as the accuracy of no ..."
Abstract

Cited by 2 (0 self)
In this paper, we propose a novel supervised learning method, Fast Flux Discriminant (FFD), for large-scale nonlinear classification. Compared with other existing methods, FFD has unmatched advantages, as it attains the efficiency and interpretability of linear models as well as the accuracy of nonlinear models. It is also sparse and naturally handles mixed data types. It works by decomposing kernel density estimation in the entire feature space into selected low-dimensional subspaces. Since there are many possible subspaces, we propose a submodular optimization framework for subspace selection. The selected subspace predictions are then transformed into new features on which a linear model can be learned. Moreover, since the transformed features naturally expect nonnegative weights, we require only smooth optimization even with ℓ1 regularization. Unlike other nonlinear models such as kernel methods, the FFD model is interpretable, as it gives importance weights on the original features. Its training and testing are also much faster than for traditional kernel models. We carry out extensive empirical studies on real-world datasets and show that the proposed model achieves state-of-the-art classification results with sparsity, interpretability, and exceptional scalability. Our model can be learned in minutes on datasets with millions of samples, for which most existing nonlinear methods would be prohibitively expensive in space and time.
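The core idea of turning a low-dimensional density estimate into a new feature for a linear model can be illustrated with a much-simplified one-dimensional histogram sketch (our toy construction, not FFD's actual estimator or its submodular selection step):

```python
from collections import defaultdict

def density_feature(xs, ys, bins=4):
    """Map one input feature to a class-density score via a histogram:
    each sample is replaced by the smoothed P(y = 1 | its bin).
    The result is a nonnegative feature a linear model can reweight."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins or 1.0
    pos, tot = defaultdict(int), defaultdict(int)
    def b(x):  # bin index, clamping the right edge
        return min(bins - 1, int((x - lo) / width))
    for x, y in zip(xs, ys):
        tot[b(x)] += 1
        pos[b(x)] += (y == 1)
    # Laplace-smoothed class probability per bin:
    return [(pos[b(x)] + 1) / (tot[b(x)] + 2) for x in xs]
```

FFD does this in selected multi-dimensional subspaces chosen by submodular optimization, then learns a linear model over the transformed features.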
Biomedical Semantic Indexing using Dense Word Vectors in BioASQ
"... available at the end of the article Background: Biomedical curators are often required to semantically index large numbers of biomedical articles, using hierarchically related labels (e.g., MeSH headings). Large scale hierarchical classification, a branch of machine learning, can facilitate this pro ..."
Abstract

Cited by 1 (1 self)
Background: Biomedical curators are often required to semantically index large numbers of biomedical articles, using hierarchically related labels (e.g., MeSH headings). Large-scale hierarchical classification, a branch of machine learning, can facilitate this procedure, but the resulting automatic classifiers are often inefficient because of the very large dimensionality of the dominant bag-of-words representation of texts. Feature selection quickly harms the accuracy of the classifiers in this particular task, and dimensionality reduction transformations (e.g., PCA-based) usually cannot be efficiently applied to very large corpora. Methods: We examine the use of dense word vectors, also known as word embeddings, as an efficient method of dimensionality reduction that makes hierarchical text classification algorithms more scalable in biomedical semantic indexing, without
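A common way to obtain the dense document representation the abstract describes is to average the word vectors of a document's tokens (a standard baseline; the paper may use a more elaborate scheme):

```python
def doc_vector(tokens, embeddings, dim):
    """Dense document representation: the average of the word vectors
    of the tokens found in the embedding table. Tokens without a
    vector are skipped; an all-OOV document maps to the zero vector."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
```

This replaces a bag-of-words vector with hundreds of thousands of dimensions by a vector of a few hundred, which is what makes the hierarchical classifiers scalable.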
ptRNApred: computational identification and classification of post-transcriptional RNA
, 2013
"... Noncoding RNAs (ncRNAs) are known to play important functional roles in the cell. However, their identification and recognition in genomic sequences remains challenging. In silico methods, such as classification tools, offer a fast and reliable way for such screening and multiple classifiers have ..."
Abstract

Cited by 1 (0 self)
Non-coding RNAs (ncRNAs) are known to play important functional roles in the cell. However, their identification and recognition in genomic sequences remain challenging. In silico methods, such as classification tools, offer a fast and reliable way to perform such screening, and multiple classifiers have already been developed to predict well-defined subfamilies of RNA. So far, however, of all the ncRNAs, only tRNA, miRNA and snoRNA can be predicted with satisfying sensitivity and specificity. We here present ptRNApred, a tool to detect and classify subclasses of non-coding RNA that are involved in the regulation of post-transcriptional modifications or DNA replication, which we here call post-transcriptional RNA (ptRNA). It (i) detects RNA sequences coding for post-transcriptional RNA from the genomic sequence with an overall sensitivity of 91% and a specificity of 94%, and (ii) predicts ptRNA subclasses that exist in eukaryotes: snRNA, snoRNA, RNase P, RNase MRP, Y RNA or telomerase RNA. Availability: The ptRNApred software is open for public use on
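Sequence classifiers of this kind typically work on fixed-length feature vectors derived from the raw sequence; a common choice is normalized k-mer frequencies, sketched below (the abstract does not specify ptRNApred's actual feature set, so this is only an illustrative assumption):

```python
from collections import Counter
from itertools import product

def kmer_features(seq, k=3):
    """Normalized k-mer frequency vector for an RNA sequence over the
    alphabet ACGU: one entry per possible k-mer, in lexicographic
    order, summing to 1 for a non-empty sequence."""
    alphabet = "ACGU"
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(1, len(seq) - k + 1)
    return [counts["".join(p)] / total for p in product(alphabet, repeat=k)]
```

Vectors like these can be fed to any standard classifier (e.g., an SVM) to separate ptRNA subclasses such as snRNA or snoRNA.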