Results 1-10 of 34
Good Practice in Large-Scale Learning for Image Classification
 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (TPAMI)
, 2013
Abstract

Cited by 51 (7 self)
We benchmark several SVM objective functions for large-scale image classification. We consider one-vs-rest, multiclass, ranking, and weighted approximate ranking SVMs. A comparison of online and batch methods for optimizing the objectives shows that online methods perform as well as batch methods in terms of classification accuracy, but with a significant gain in training speed. Using stochastic gradient descent, we can scale the training to millions of images and thousands of classes. Our experimental evaluation shows that ranking-based algorithms do not outperform the one-vs-rest strategy when a large number of training examples are used. Furthermore, the gap in accuracy between the different algorithms shrinks as the dimension of the features increases. We also show that learning through cross-validation the optimal rebalancing of positive and negative examples can result in a significant improvement for the one-vs-rest strategy. Finally, early stopping can be used as an effective regularization strategy when training with online algorithms. Following these “good practices”, we were able to improve the state of the art on a large subset of 10K classes and 9M images of ImageNet from 16.7% Top-1 accuracy to 19.1%.
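The recipe this abstract names (one-vs-rest linear SVMs trained by stochastic gradient descent, with early stopping on a held-out split as regularization) can be illustrated with a small sketch. The Pegasos-style step size, the function name, and all parameter values below are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def sgd_ovr_svm(X, y, n_classes, lam=1e-4, epochs=10, val_frac=0.2, seed=0):
    """One-vs-rest linear SVMs trained by SGD on the hinge loss, with a
    held-out split used for early stopping (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    idx = rng.permutation(n)
    n_val = max(1, int(val_frac * n))
    val, tr = idx[:n_val], idx[n_val:]
    W = np.zeros((n_classes, d))
    best_W, best_acc, t = W.copy(), -1.0, 0
    for _ in range(epochs):
        for i in rng.permutation(tr):
            t += 1
            eta = 1.0 / (lam * t)              # Pegasos-style decaying step
            scores = W @ X[i]
            for c in range(n_classes):         # one binary problem per class
                yc = 1.0 if y[i] == c else -1.0
                W[c] *= (1.0 - eta * lam)      # L2 shrinkage
                if yc * scores[c] < 1.0:       # hinge loss is active
                    W[c] += eta * yc * X[i]
        acc = np.mean(np.argmax(X[val] @ W.T, axis=1) == y[val])
        if acc > best_acc:                     # early stopping: keep best epoch
            best_acc, best_W = acc, W.copy()
    return best_W
```

Each pass touches one example at a time, which is what lets this style of training scale to millions of images; the validation split stands in for the cross-validated rebalancing the abstract discusses.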
Undoing the damage of dataset bias
, 2012
Abstract

Cited by 33 (3 self)
The presence of bias in existing object recognition datasets is now well-known in the computer vision community. While it remains in question whether creating an unbiased dataset is possible given limited resources, in this work we propose a discriminative framework that directly exploits dataset bias during training. In particular, our model learns two sets of weights: (1) bias vectors associated with each individual dataset, and (2) visual world weights that are common to all datasets, which are learned by undoing the associated bias from each dataset. The visual world weights are expected to be our best possible approximation to the object model trained on an unbiased dataset, and thus tend to have good generalization ability. We demonstrate the effectiveness of our model by applying the learned weights to a novel, unseen dataset, and report superior results for both classification and detection tasks compared to a classical SVM that does not account for the presence of bias. Overall, we find that it is beneficial to explicitly account for bias when combining multiple datasets.
Hamming Distance Metric Learning
Abstract

Cited by 31 (3 self)
Motivated by large-scale multimedia applications we propose to learn mappings from high-dimensional data to binary codes that preserve semantic similarity. Binary codes are well suited to large-scale applications as they are storage efficient and permit exact sublinear kNN search. The framework is applicable to broad families of mappings, and uses a flexible form of triplet ranking loss. We overcome discontinuous optimization of the discrete mappings by minimizing a piecewise-smooth upper bound on empirical loss, inspired by latent structural SVMs. We develop a new loss-augmented inference algorithm that is quadratic in the code length. We show strong retrieval performance on CIFAR-10 and MNIST, with promising classification results using no more than kNN on the binary codes.
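The two ingredients this abstract names, a mapping to binary codes and kNN in Hamming space, can be sketched as follows. The thresholded linear hash in `binary_codes` and the brute-force scan in `hamming_knn` are simplifying assumptions: the paper learns its mapping with a triplet ranking loss and relies on sublinear search, neither of which is shown here.

```python
import numpy as np

def binary_codes(X, W):
    """Map real vectors to binary codes b(x) = 1[Wx > 0]; a thresholded
    linear hash of the kind such frameworks can instantiate (sketch)."""
    return (X @ W.T > 0).astype(np.uint8)

def hamming_knn(query, codes, k):
    """Brute-force k-nearest neighbors in Hamming space; real systems
    replace this linear scan with sublinear hash-table lookups."""
    d = np.count_nonzero(codes != query, axis=1)  # Hamming distances
    return np.argsort(d, kind="stable")[:k]       # indices of k closest codes
```

Because Hamming distance is just a count of differing bits, retrieval over binary codes needs only XOR/popcount-style operations, which is what makes them storage- and compute-efficient at scale.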
Designing Category-Level Attributes for Discriminative Visual Recognition
Abstract

Cited by 26 (1 self)
Attribute-based representation has shown great promise for visual recognition due to its intuitive interpretation and cross-category generalization property. However, human effort is usually involved in the attribute design process, making the representation costly to obtain. In this paper, we propose a novel formulation to automatically design discriminative “category-level attributes”, which can be efficiently encoded by a compact category-attribute matrix. The formulation allows us to achieve intuitive and critical design criteria (category-separability, learnability) in a principled way. The designed attributes can be used for cross-category knowledge transfer, achieving superior performance on the well-known Animals with Attributes (AwA) attribute dataset and the large-scale ILSVRC2010 dataset (1.2M images). This approach also leads to state-of-the-art performance on the zero-shot learning task on AwA.
Angular Quantization-based Binary Codes for Fast Similarity Search
Abstract

Cited by 11 (1 self)
This paper focuses on the problem of learning binary codes for efficient retrieval of high-dimensional non-negative data that arises in vision and text applications where counts or frequencies are used as features. The similarity of such feature vectors is commonly measured using the cosine of the angle between them. In this work, we introduce a novel angular quantization-based binary coding (AQBC) technique for such data and analyze its properties. In its most basic form, AQBC works by mapping each non-negative feature vector onto the vertex of the binary hypercube with which it has the smallest angle. Even though the number of vertices (quantization landmarks) in this scheme grows exponentially with data dimensionality d, we propose a method for mapping feature vectors to their smallest-angle binary vertices that scales as O(d log d). Further, we propose a method for learning a linear transformation of the data to minimize the quantization error, and show that it results in improved binary codes. Experiments on image and text datasets show that the proposed AQBC method outperforms the state of the art.
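The O(d log d) nearest-vertex mapping described above follows from a simple observation: for non-negative x and a fixed number k of ones, the angle-minimizing vertex places its 1s at the k largest coordinates, so one sort plus a scan over k suffices. A minimal sketch (the function name is mine):

```python
import numpy as np

def nearest_binary_vertex(x):
    """Return the hypercube vertex with the smallest angle to a
    non-negative vector x. For each k, the best vertex with k ones
    sets them at the k largest coordinates, and
    cos(angle) = (sum of top-k entries) / (||x|| * sqrt(k)).
    Since ||x|| is constant, scanning all k after one O(d log d)
    sort finds the overall best vertex (sketch)."""
    order = np.argsort(-x)              # coordinate indices, decreasing value
    prefix = np.cumsum(x[order])        # sum of the top-k entries for each k
    k = np.arange(1, x.size + 1)
    best_k = np.argmax(prefix / np.sqrt(k)) + 1
    b = np.zeros(x.size, dtype=np.uint8)
    b[order[:best_k]] = 1               # 1s at the best_k largest coordinates
    return b
```

A strongly peaked vector maps to a vertex with a single 1, while a flat vector maps to the all-ones vertex, matching the cosine-similarity intuition the abstract appeals to.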
Efficient Discriminative Projections for Compact Binary Descriptors
Abstract

Cited by 10 (1 self)
Binary descriptors of image patches are increasingly popular given that they require less storage and enable faster processing. This, however, comes at the price of lower recognition performance. To boost performance, we project the image patches to a more discriminative subspace, and threshold their coordinates to build our binary descriptor. However, applying complex projections to the patches is slow, which negates some of the advantages of binary descriptors. Hence, our key idea is to learn the discriminative projections so that they can be decomposed into a small number of simple filters for which the responses can be computed fast. We show that with as few as 32 bits per descriptor we outperform the state-of-the-art binary descriptors in terms of both accuracy and efficiency.
Active Learning for Sparse Bayesian Multilabel Classification
Abstract

Cited by 3 (0 self)
We study the problem of active learning for multilabel classification. We focus on the real-world scenario where the average number of positive (relevant) labels per data point is small, leading to positive label sparsity. Carrying out mutual-information-based near-optimal active learning in this setting is a challenging task, since the computational complexity involved is exponential in the total number of labels. We propose a novel inference algorithm for the sparse Bayesian multilabel model of [17]. The benefit of this alternate inference scheme is that it enables a natural approximation of the mutual information objective. We prove that the approximation leads to a solution identical to that of the exact optimization problem, but at a fraction of the optimization cost. This allows us to carry out efficient, non-myopic, and near-optimal active learning for sparse multilabel classification. Extensive experiments demonstrate the effectiveness of the method.
Discriminative probabilistic prototype learning
Abstract

Cited by 3 (0 self)
In this paper we propose a simple yet powerful method for learning representations in supervised learning scenarios where an input datapoint is described by a set of feature vectors and its associated output may be given by soft labels indicating, for example, class probabilities. We represent an input datapoint as a K-dimensional vector, where each component is a mixture of probabilities over its corresponding set of feature vectors. Each probability indicates how likely a feature vector is to belong to one-out-of-K unknown prototype patterns. We propose a probabilistic model that parameterizes these prototype patterns in terms of hidden variables, and therefore it can be trained with conventional approaches based on likelihood maximization. More importantly, both the model parameters and the prototype patterns can be learned from data in a discriminative way. We show that our model can be seen as a probabilistic generalization of learning vector quantization (LVQ). We apply our method to the problems of shape classification, hyperspectral imaging classification, and people's work-class categorization, showing the superior performance of our method compared to the standard prototype-based classification approach and other competitive benchmarks.
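A minimal sketch of the kind of representation the abstract describes: a bag of feature vectors soft-assigned to K prototypes, averaged into a K-dimensional probability vector. The squared-distance softmax parameterization and the `beta` temperature are my assumptions for illustration, not the paper's exact model:

```python
import numpy as np

def prototype_representation(F, P, beta=1.0):
    """Represent a bag of feature vectors F (n x d) as a K-dim vector:
    soft-assign each feature vector to the K prototype rows of P (K x d)
    with a softmax over negative squared distances, then average the
    assignments over the bag (illustrative sketch)."""
    d2 = ((F[:, None, :] - P[None, :, :]) ** 2).sum(-1)  # n x K distances
    logits = -beta * d2
    logits -= logits.max(axis=1, keepdims=True)          # numerically stable softmax
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)                    # per-feature assignment probs
    return p.mean(axis=0)                                # K-dim vector, sums to 1
```

In the discriminative setting the abstract describes, the prototypes P would themselves be learned jointly with a classifier on these K-dimensional vectors, rather than fixed in advance as here.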
Classemes and Other Classifier-based Features for Efficient Object Categorization
 SUBMITTED TO IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
Abstract

Cited by 2 (0 self)
This paper describes compact image descriptors enabling accurate object categorization with linear classification models, which offer the advantage of being efficient to both train and test. The shared property of our descriptors is the use of classifiers to produce the features of each image. Intuitively, these classifiers evaluate the presence of a set of basis classes inside the image. We first propose to train the basis classifiers as recognizers of a hand-selected set of object classes. We then demonstrate that better accuracy can be achieved by learning the basis classes as “abstract categories” collectively optimized as features for linear classification. Finally, we describe several strategies to aggregate the outputs of basis classifiers evaluated on multiple subwindows of the image in order to handle cases where the photo contains multiple objects and large amounts of clutter. We test our descriptors on challenging benchmarks of object categorization and detection, using a simple linear SVM as the classifier. Our results are on par with those achieved by the best systems in these fields but are produced at orders of magnitude lower computational cost and using an image representation that is general and not specifically tuned for a predefined set of test classes.
Fast Exact Search in Hamming Space with Multi-Index Hashing
, 2012
Abstract

Cited by 2 (0 self)
There has been growing interest in mapping image data onto compact binary codes for fast near-neighbor search in vision applications. Although binary codes are motivated by their use as direct indices (addresses) into a hash table, codes longer than 32 bits are not being used in this way, as it was thought to be ineffective. We introduce a rigorous way to build multiple hash tables on binary code substrings that enables exact K-nearest-neighbor search in Hamming space. The algorithm is straightforward to implement, storage efficient, and has sublinear run-time behavior for uniformly distributed codes. Empirical results show dramatic speedups over a linear-scan baseline for datasets with up to one billion items, 64- or 128-bit codes, and search radii up to almost 25 bits.
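The substring-indexing idea behind multi-index hashing can be sketched as follows. This toy version probes only at substring radius 0 (exact substring match); the full algorithm also enumerates substring keys within radius floor(r/s) and then verifies each candidate's full-code Hamming distance, which is what makes the search exact.

```python
from collections import defaultdict
import numpy as np

def build_tables(codes, s):
    """Index n binary codes (rows of 0/1) by s disjoint substrings.
    By the pigeonhole principle, any code within Hamming radius r of a
    query must agree with it to within radius floor(r/s) on at least
    one substring, so probing the s small tables yields a complete
    candidate set for exact search (sketch of the idea)."""
    chunks = np.array_split(np.arange(codes.shape[1]), s)
    tables = []
    for ch in chunks:
        t = defaultdict(list)
        for i, row in enumerate(codes):
            t[tuple(row[ch])].append(i)   # substring value -> code ids
        tables.append((ch, t))
    return tables

def exact_match_candidates(query, tables):
    """Candidates sharing at least one substring with the query
    (radius-0 probe); larger radii would also enumerate nearby keys."""
    cands = set()
    for ch, t in tables:
        cands.update(t.get(tuple(query[ch]), []))
    return cands
```

Splitting a 128-bit code into, say, s = 4 substrings turns one infeasibly sparse 128-bit table into four dense 32-bit tables, which is why long codes become usable as direct hash indices.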