Results 1–10 of 44
Struck: Structured Output Tracking with Kernels
Cited by 112 (4 self)
Adaptive tracking-by-detection methods are widely used in computer vision for tracking arbitrary objects. Current approaches treat the tracking problem as a classification task and use online learning techniques to update the object model. However, for these updates to happen one needs to convert the estimated object position into a set of labelled training examples, and it is not clear how best to perform this intermediate step. Furthermore, the objective for the classifier (label prediction) is not explicitly coupled to the objective for the tracker (accurate estimation of object position). In this paper, we present a framework for adaptive visual object tracking based on structured output prediction. By explicitly allowing the output space to express the needs of the tracker, we are able to avoid the need for an intermediate classification step. Our method uses a kernelized structured output support vector machine (SVM), which is learned online to provide adaptive tracking. To allow for real-time application, we introduce a budgeting mechanism which prevents the unbounded growth in the number of support vectors which would otherwise occur during tracking. Experimentally, we show that our algorithm is able to outperform state-of-the-art trackers on various benchmark videos. Additionally, we show that we can easily incorporate additional features and kernels into our framework, which results in increased performance.
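The budgeting mechanism can be illustrated with a small sketch. This is not Struck's exact removal rule (which accounts for redistributing the removed weight); as a simplifying assumption we drop the support vector whose removal perturbs ‖w‖² the least under a β²·k(x, x) criterion. The names `BudgetedModel` and `gaussian_k` are invented for illustration.

```python
import math

def gaussian_k(a, b, sigma=1.0):
    """Gaussian kernel between two feature vectors (lists of floats)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2 * sigma ** 2))

class BudgetedModel:
    """Kernel expansion f(x) = sum_i beta_i * k(x_i, x) with at most
    `budget` support vectors.  When the budget is exceeded, the support
    vector whose removal perturbs ||w||^2 the least (approximated here
    by beta_i^2 * k(x_i, x_i)) is discarded."""

    def __init__(self, budget=50, kernel=gaussian_k):
        self.budget = budget
        self.kernel = kernel
        self.sv = []      # support vector features
        self.beta = []    # their coefficients

    def predict(self, x):
        return sum(b * self.kernel(s, x) for s, b in zip(self.sv, self.beta))

    def add(self, x, coeff):
        self.sv.append(x)
        self.beta.append(coeff)
        if len(self.sv) > self.budget:
            # cheapest-to-remove support vector under the simplified criterion
            costs = [b * b * self.kernel(s, s)
                     for s, b in zip(self.sv, self.beta)]
            drop = costs.index(min(costs))
            del self.sv[drop], self.beta[drop]
```

Keeping the expansion bounded this way is what makes per-frame evaluation cost constant rather than growing with the length of the video.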
Good Practice in Large-Scale Learning for Image Classification
 IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
, 2013
Cited by 51 (7 self)
We benchmark several SVM objective functions for large-scale image classification. We consider one-vs-rest, multi-class, ranking, and weighted approximate ranking SVMs. A comparison of online and batch methods for optimizing the objectives shows that online methods perform as well as batch methods in terms of classification accuracy, but with a significant gain in training speed. Using stochastic gradient descent, we can scale the training to millions of images and thousands of classes. Our experimental evaluation shows that ranking-based algorithms do not outperform the one-vs-rest strategy when a large number of training examples are used. Furthermore, the gap in accuracy between the different algorithms shrinks as the dimension of the features increases. We also show that learning through cross-validation the optimal rebalancing of positive and negative examples can result in a significant improvement for the one-vs-rest strategy. Finally, early stopping can be used as an effective regularization strategy when training with online algorithms. Following these “good practices”, we were able to improve the state-of-the-art on a large subset of 10K classes and 9M images of ImageNet from 16.7% Top-1 accuracy to 19.1%.
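As a rough illustration of the one-vs-rest regime the paper benchmarks, here is a minimal linear SVM per class trained with plain stochastic gradient descent on the hinge loss. The function names and hyper-parameters (`sgd_ovr_train`, learning rate, regularizer) are illustrative choices, not the authors' setup, which also includes the rebalancing and early-stopping practices described above.

```python
import random

def sgd_ovr_train(X, y, n_classes, lam=0.01, epochs=20, lr=0.1, seed=0):
    """Train one linear hinge-loss classifier per class (one-vs-rest)
    with plain stochastic gradient descent."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[0.0] * d for _ in range(n_classes)]
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            for c in range(n_classes):
                t = 1.0 if y[i] == c else -1.0   # class c vs the rest
                margin = t * sum(w * x for w, x in zip(W[c], X[i]))
                for j in range(d):
                    grad = lam * W[c][j]
                    if margin < 1.0:             # hinge loss is active
                        grad -= t * X[i][j]
                    W[c][j] -= lr * grad
    return W

def sgd_ovr_predict(W, x):
    """Predict the class whose binary scorer assigns the highest score."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    return scores.index(max(scores))
```

Each SGD step touches a single example, which is why this style of training scales to millions of images where batch solvers become impractical.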
A quasi-Newton approach to nonsmooth convex optimization
 In ICML
, 2008
Cited by 37 (2 self)
We extend the well-known BFGS quasi-Newton method and its limited-memory variant L-BFGS to the optimization of nonsmooth convex objectives. This is done in a rigorous fashion by generalizing three components of BFGS to subdifferentials: the local quadratic model, the identification of a descent direction, and the Wolfe line search conditions. We apply the resulting subLBFGS algorithm to L2-regularized risk minimization with the binary hinge loss, and its direction-finding component to L1-regularized risk minimization with the logistic loss. In both settings our generic algorithms perform comparably to or better than their counterparts in specialized state-of-the-art solvers.
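The setting the paper targets can be sketched with a much simpler baseline: a plain subgradient method on the L2-regularized hinge risk. This is not subLBFGS itself (no quadratic model, descent-direction search, or generalized Wolfe conditions), just a minimal illustration of optimizing a nonsmooth convex objective via subgradients; at a hinge kink we pick one valid element of the subdifferential.

```python
def risk_and_subgradient(w, X, y, lam=0.1):
    """L2-regularized hinge risk f(w) = lam/2 ||w||^2 + mean hinge,
    together with one valid subgradient.  At a kink (margin exactly 1)
    we take the zero branch, a legitimate element of the subdifferential."""
    d = len(w)
    risk = 0.5 * lam * sum(wi * wi for wi in w)
    g = [lam * wi for wi in w]
    n = len(X)
    for xi, yi in zip(X, y):
        margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
        risk += max(0.0, 1.0 - margin) / n
        if margin < 1.0:                 # hinge active: gradient is unique here
            for j in range(d):
                g[j] -= yi * xi[j] / n
    return risk, g

def subgradient_descent(X, y, d, steps=200, lr=0.1):
    """Subgradient method with a diminishing step size lr / sqrt(t)."""
    w = [0.0] * d
    for t in range(1, steps + 1):
        _, g = risk_and_subgradient(w, X, y)
        w = [wi - (lr / t ** 0.5) * gi for wi, gi in zip(w, g)]
    return w
```

The slow O(1/√t) behaviour of such first-order methods on nonsmooth objectives is precisely the motivation for bringing quasi-Newton machinery to this setting.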
A Sequential Dual Method for Large Scale Multi-Class Linear SVMs
, 2008
Cited by 37 (8 self)
Efficient training of direct multi-class formulations of linear Support Vector Machines is very useful in applications such as text classification with a huge number of examples as well as features. This paper presents a fast dual method for this training. The main idea is to sequentially traverse through the training set and optimize the dual variables associated with one example at a time. The speed of training is enhanced further by shrinking and cooling heuristics. Experiments indicate that our method is much faster than state-of-the-art solvers such as bundle, cutting plane and exponentiated gradient methods.
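The one-example-at-a-time dual traversal can be sketched for the simpler binary case (the paper treats the direct multi-class dual and adds shrinking and cooling heuristics, both omitted here):

```python
def dual_coordinate_descent(X, y, C=1.0, epochs=10):
    """Binary dual coordinate descent for a linear SVM (hinge loss,
    no bias): optimize one dual variable alpha_i at a time while
    maintaining w = sum_i alpha_i y_i x_i incrementally."""
    n, d = len(X), len(X[0])
    alpha = [0.0] * n
    w = [0.0] * d
    qii = [sum(xj * xj for xj in xi) for xi in X]   # diagonal of Q
    for _ in range(epochs):
        for i in range(n):
            if qii[i] == 0.0:
                continue
            # gradient of the dual objective in coordinate i
            g = y[i] * sum(wj * xj for wj, xj in zip(w, X[i])) - 1.0
            # exact single-coordinate minimizer, clipped to the box [0, C]
            new_a = min(max(alpha[i] - g / qii[i], 0.0), C)
            delta = new_a - alpha[i]
            if delta != 0.0:
                for j in range(d):
                    w[j] += delta * y[i] * X[i][j]
                alpha[i] = new_a
    return w, alpha
```

Maintaining w alongside alpha is the key trick: each coordinate update costs O(d) regardless of the number of training examples.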
Structured learning for nonsmooth ranking losses
 In SIGKDD Conference
, 2008
Cited by 33 (2 self)
Learning to rank from relevance judgments is an active research area. Itemwise score regression, pairwise preference satisfaction, and listwise structured learning are the major techniques in use. Listwise structured learning has been applied recently to optimize important non-decomposable ranking criteria like AUC (area under the ROC curve) and MAP (mean average precision). We propose new, almost-linear-time algorithms to optimize two other criteria widely used to evaluate search systems, MRR (mean reciprocal rank) and NDCG (normalized discounted cumulative gain), in the max-margin structured learning framework. We also demonstrate that, for different ranking criteria, one may need to use different feature maps. Search applications should not be optimized in favor of a single criterion, because they need to cater to a variety of queries. E.g., MRR is best for navigational queries, while NDCG is best for informational queries. A key contribution of this paper is to fold multiple ranking loss functions into a multi-criteria max-margin optimization. The result is a single, robust ranking model that is close to the best accuracy of learners trained on individual criteria. In fact, experiments over the popular LETOR and TREC data sets show that, contrary to conventional wisdom, a test criterion is often not best served by training with the same individual criterion.
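For reference, the two criteria the paper optimizes can be computed directly. A minimal sketch, assuming binary relevance labels for MRR and the common graded gain 2^rel − 1 with a log₂ position discount for NDCG:

```python
import math

def mrr(ranked_relevance):
    """Mean reciprocal rank over queries; each query is a list of 0/1
    relevance labels in ranked order (rank 1 first)."""
    total = 0.0
    for labels in ranked_relevance:
        for pos, rel in enumerate(labels, start=1):
            if rel:
                total += 1.0 / pos    # reciprocal rank of first relevant hit
                break
    return total / len(ranked_relevance)

def ndcg(labels, k=None):
    """NDCG for one ranked list of graded relevance labels."""
    k = len(labels) if k is None else k
    def dcg(ls):
        return sum((2 ** rel - 1) / math.log2(pos + 1)
                   for pos, rel in enumerate(ls[:k], start=1))
    ideal = dcg(sorted(labels, reverse=True))   # best possible ordering
    return dcg(labels) / ideal if ideal > 0 else 0.0
```

Both metrics depend on the sorted order of scores rather than on the scores themselves, which is what makes them non-decomposable and nonsmooth as training objectives.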
Double updating online learning
 Journal of Machine Learning Research
Cited by 17 (10 self)
In most kernel-based online learning algorithms, when an incoming instance is misclassified, it is added to the pool of support vectors and assigned a weight, which often remains unchanged during the rest of the learning process. This is clearly insufficient: when a new support vector is added, we generally expect the weights of the other existing support vectors to be updated in order to reflect the influence of the added support vector. In this paper, we propose a new online learning method, termed Double Updating Online Learning, or DUOL for short, that explicitly addresses this problem. Instead of only assigning a fixed weight to the misclassified example received at the current trial, the proposed online learning algorithm also updates the weight of one of the existing support vectors. We show that the mistake bound can be improved by the proposed online learning method. We conduct an extensive set of empirical evaluations for both binary and multi-class online learning tasks. The experimental results show that the proposed technique is considerably more effective than state-of-the-art online learning algorithms. The source code is available to the public at
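The double-update idea can be mimicked in a toy kernel learner: on each mistake, add the example as a support vector and also re-strengthen the existing support vector that conflicts most with it. DUOL derives the two step sizes jointly in closed form; the fixed step `c` here is an illustrative simplification, and all names (`DoubleUpdateLearner`, `k_rbf`) are invented.

```python
import math

def k_rbf(a, b, sigma=1.0):
    """RBF kernel between two feature vectors."""
    return math.exp(-sum((u - v) ** 2 for u, v in zip(a, b))
                    / (2 * sigma ** 2))

class DoubleUpdateLearner:
    def __init__(self, kernel=k_rbf, c=1.0):
        self.kernel, self.c = kernel, c
        self.sv = []                      # list of (x, label, weight)

    def score(self, x):
        return sum(w * ys * self.kernel(s, x) for s, ys, w in self.sv)

    def update(self, x, y):
        """Return True if a mistake occurred and the model was updated."""
        if y * self.score(x) > 0:
            return False
        # existing SVs that conflict with the new example:
        # opposite label but kernel-similar to x
        aux = [i for i, (s, ys, _) in enumerate(self.sv)
               if ys * y * self.kernel(s, x) < 0]
        self.sv.append((x, y, self.c))    # first update: add the new SV
        if aux:
            # second update: re-strengthen the conflicting SV whose own
            # margin is currently worst
            b = min(aux, key=lambda i: self.sv[i][1] * self.score(self.sv[i][0]))
            s, ys, w = self.sv[b]
            self.sv[b] = (s, ys, w + self.c)
        return True
```

The point of the second update is that adding a new support vector degrades the margins of nearby, oppositely-labelled support vectors, so their weights should be adjusted rather than frozen.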
Optimized cutting plane algorithm for large-scale risk minimization
, 2009
Cited by 15 (1 self)
We have developed an optimized cutting plane algorithm (OCA) for solving large-scale risk minimization problems. We prove that the number of iterations OCA requires to converge to an ε-precise solution is approximately linear in the sample size. We also derive OCAS, an OCA-based linear binary Support Vector Machine (SVM) solver, and OCAM, a linear multi-class SVM solver. In an extensive empirical evaluation we show that OCAS outperforms current state-of-the-art SVM solvers like SVMlight, SVMperf and BMRM, achieving a speedup factor of more than 1,200 over SVMlight on some data sets and a speedup factor of 29 over SVMperf, while obtaining the same precise support vector solution. Even in the early optimization steps, OCAS often shows faster convergence than the currently prevailing approximative methods in this domain, SGD and Pegasos. In addition, our proposed linear multi-class SVM solver, OCAM, achieves speedups of up to a factor of 10 compared to SVM-multiclass. Finally, we use OCAS and OCAM in two real-world applications: human acceptor splice site detection and malware detection. By effectively parallelizing OCAS, we achieve state-of-the-art results on an acceptor splice site recognition problem by learning from all the available 50 million examples in a 12-million-dimensional feature space. Source code, data sets and scripts to reproduce the experiments are available at
Sequence labelling SVMs trained in one pass
, 2008
Cited by 14 (2 self)
This paper proposes an online solver of the dual formulation of support vector machines for structured output spaces. We apply it to sequence labelling using exact and greedy inference schemes. In both cases, the per-sequence training time is the same as that of a perceptron based on the same inference procedure, up to a small multiplicative constant. Comparing the two inference schemes, the greedy version is much faster. It is also amenable to higher-order Markov assumptions and performs similarly at test time. In comparison to existing algorithms, both versions match the accuracies of batch solvers that use exact inference after a single pass over the training examples.
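The greedy inference scheme contrasted here with exact (Viterbi-style) inference can be sketched as left-to-right decoding under a fixed linear-chain scoring model. The score tables below are given as plain lists purely for illustration; in the paper they come from the learned SVM's feature weights.

```python
def greedy_decode(emission, transition, labels):
    """Greedy left-to-right inference for a linear-chain model: commit to
    the best label at each position given the previously chosen label.

    emission[t][y]      = score of label y at position t
    transition[yp][y]   = score of moving from label yp to label y
    """
    seq = []
    prev = None
    for t in range(len(emission)):
        def score(y):
            s = emission[t][y]
            if prev is not None:          # no transition into position 0
                s += transition[prev][y]
            return s
        best = max(labels, key=score)     # local, irrevocable choice
        seq.append(best)
        prev = best
    return seq
```

Unlike exact inference, each step conditions only on the single committed previous label, which is why greedy decoding runs in O(T·|labels|) time and extends cheaply to higher-order Markov assumptions.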
Online Multi-Class LPBoost
Cited by 14 (4 self)
Online boosting is one of the most successful online learning algorithms in computer vision. While many challenging online learning problems are inherently multi-class, online boosting and its variants are only able to solve binary tasks. In this paper, we present Online Multi-Class LPBoost (OMCLP), which is directly applicable to multi-class problems. From a theoretical point of view, our algorithm tries to maximize the multi-class soft margin of the samples. In order to solve the LP problem in online settings, we perform an efficient variant of online convex programming, based on primal-dual gradient descent-ascent update strategies. We conduct an extensive set of experiments over machine learning benchmark datasets, as well as on the Caltech101 category recognition dataset. We show that our method is able to outperform other online multi-class methods. We also apply our method to tracking, where we present an intuitive way to convert the binary tracking-by-detection problem into a multi-class problem in which background patterns similar to the target class become virtual classes. Applying our novel model, we outperform or match state-of-the-art results on benchmark tracking videos.
Tree Decomposition for Large-Scale SVM Problems: Experimental and Theoretical Results
, 2009
Cited by 13 (2 self)
To handle problems created by large data sets, we propose a method that uses a decision tree to decompose the data space and trains SVMs on the decomposed regions. Although there are other means of decomposing a data space, we show that the decision tree has several merits for large-scale SVM training. First, it can classify some data points by its own means, thereby reducing the cost of SVM training applied to the remaining data points. Second, it is efficient at seeking the parameter values that maximize validation accuracy, which helps maintain good test accuracy. Third, we can provide a generalization error bound for the classifier derived by the tree decomposition method. For experimental data sets whose size can be handled by current nonlinear, kernel-based SVM training techniques, the proposed method can speed up training by a factor of thousands while still achieving comparable test accuracy.
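The decomposition idea can be sketched with stand-ins: a one-split decision stump in place of a full tree, and a perceptron in place of an SVM per region (assumptions made to keep the sketch dependency-free). On XOR-style data a single linear model fails, but the stump-plus-linear decomposition succeeds:

```python
def majority_error(side):
    """Misclassifications if this side predicts its majority label."""
    if not side:
        return 0
    maj = max(set(side), key=side.count)
    return len(side) - side.count(maj)

def best_stump(X, y):
    """Pick the axis-aligned split (feature, threshold) minimizing
    total majority-vote error over the two sides."""
    best_j, best_thr, best_err = 0, X[0][0], len(y) + 1
    for j in range(len(X[0])):
        for thr in sorted({xi[j] for xi in X}):
            left = [yi for xi, yi in zip(X, y) if xi[j] <= thr]
            right = [yi for xi, yi in zip(X, y) if xi[j] > thr]
            err = majority_error(left) + majority_error(right)
            if err < best_err:
                best_j, best_thr, best_err = j, thr, err
    return best_j, best_thr

def train_perceptron(X, y, epochs=20):
    """Tiny linear classifier standing in for a per-region SVM."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) <= 0:
                for j in range(d):
                    w[j] += yi * xi[j]
                b += yi
    return w, b

class TreeDecomposedClassifier:
    def fit(self, X, y):
        self.j, self.thr = best_stump(X, y)
        self.models = {}
        for name, keep in (("L", lambda v: v <= self.thr),
                           ("R", lambda v: v > self.thr)):
            part = [(xi, yi) for xi, yi in zip(X, y) if keep(xi[self.j])]
            if part:
                Xp, yp = [p[0] for p in part], [p[1] for p in part]
                self.models[name] = train_perceptron(Xp, yp)
            else:
                self.models[name] = ([0.0] * len(X[0]), 0.0)
        return self

    def predict(self, x):
        w, b = self.models["L" if x[self.j] <= self.thr else "R"]
        return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1
```

Each region sees only a fraction of the data, which is the source of the training speedup the paper reports for the kernel-SVM case.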