Towards good practices for action video encoding
In ICCV, 2013
Cited by 6 (1 self)
High-dimensional representations such as VLAD or FV have shown excellent accuracy in action recognition. This paper shows that a proper encoding built upon VLAD can achieve a further accuracy boost at only negligible computational cost. We empirically evaluated various VLAD improvement techniques to determine good practices in VLAD-based video encoding. Furthermore, we propose an interpretation of VLAD as a maximum-entropy linear feature learning process. Combining this new perspective with observed properties of the VLAD data distribution, we propose a simple, lightweight, but powerful bimodal encoding method. Evaluated on three benchmark action recognition datasets (UCF101, HMDB51 and YouTube), the bimodal encoding improves VLAD by large margins in action recognition.
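The abstract does not spell out the bimodal encoding, so the sketch below shows only the baseline VLAD it builds on: each descriptor is assigned to its nearest codeword, residuals are summed per codeword, and the concatenated vector is power- and L2-normalized. Signed power normalization with `alpha=0.5` is assumed here as one common VLAD improvement; the names `nearest` and `vlad` are illustrative, not from the paper.

```python
import math

def nearest(x, codebook):
    """Index of the codeword closest to x (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(x, codebook[k])))

def vlad(descriptors, codebook, alpha=0.5):
    """Baseline VLAD: per-codeword sum of residuals x - c_k, concatenated,
    then signed power normalization (assumed alpha) and L2 normalization."""
    dim = len(codebook[0])
    v = [[0.0] * dim for _ in codebook]
    for x in descriptors:
        k = nearest(x, codebook)
        for d in range(dim):
            v[k][d] += x[d] - codebook[k][d]
    flat = [val for block in v for val in block]
    # Signed power normalization: sign(z) * |z| ** alpha
    flat = [math.copysign(abs(z) ** alpha, z) for z in flat]
    norm = math.sqrt(sum(z * z for z in flat))
    return [z / norm for z in flat] if norm > 0 else flat
```

With a two-word codebook `[[0, 0], [10, 10]]` and descriptors `[[1, 2], [9, 8]]`, the raw residual sums are `(1, 2)` and `(-1, -2)`, so the encoding is a 4-dimensional unit vector whose two blocks mirror each other in sign.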
Decomposing Bag of Words Histograms
ICCV 2013 - IEEE International Conference on Computer Vision, 2013
Object Templates for Visual Place Categorization
Abstract. The Visual Place Categorization (VPC) problem is the task of determining the semantic category of a place using only visual information collected by an autonomous robot. Previous work on this problem used only global configuration observations, such as the Bag-of-Words model and spatial pyramid matching. In this paper, we present a novel system that solves the problem using both global configuration observations and local object information. Specifically, we propose a local object classifier that automatically and effectively selects key local objects of a semantic category from randomly sampled patches using the structural similarity support vector machine, and then classifies test frames with the Local Naive Bayes Nearest Neighbor algorithm. We also improve the global configuration observations with a histogram intersection codebook and a mechanism for removing noisy codewords. Temporal smoothness of the classification results is ensured by a Bayesian filtering framework. Empirically, our system outperforms state-of-the-art methods on two large-scale, difficult datasets, demonstrating its superiority.
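The abstract does not give the form of its Bayesian filtering framework. As a minimal sketch of how such a filter can smooth per-frame classifications, the code below runs a discrete Bayes filter over place categories, assuming a hypothetical transition model in which the robot stays in the same category with probability `stay_prob` and otherwise switches uniformly.

```python
def bayes_filter_step(prior, likelihood, stay_prob=0.9):
    """One predict-update step of a discrete Bayes filter over place categories.

    prior: posterior from the previous frame (list of probabilities).
    likelihood: per-category likelihood of the current frame from a classifier.
    stay_prob: hypothetical probability of remaining in the same category.
    """
    n = len(prior)
    move = (1.0 - stay_prob) / (n - 1) if n > 1 else 0.0
    # Predict: keep the category with stay_prob, otherwise switch uniformly.
    predicted = [stay_prob * prior[i] + move * (sum(prior) - prior[i])
                 for i in range(n)]
    # Update: weight by the current frame's likelihood, then renormalize.
    unnorm = [p * l for p, l in zip(predicted, likelihood)]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

Starting from `[0.5, 0.5]` and feeding frame likelihoods `[0.8, 0.2]`, `[0.8, 0.2]`, then a noisy `[0.3, 0.7]`, the belief still favors category 0 after the third frame, whereas a frame-by-frame decision would flip to category 1.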
Exclusive Visual Descriptor Quantization
Abstract. Vector quantization (VQ) using exhaustive nearest neighbor (NN) search is the speed bottleneck in classic bag of visual words (BOV) models. Approximate NN (ANN) search methods still spend considerable time on VQ, because they check multiple regions of the search space to reduce VQ errors. In this paper, we propose ExVQ, an exclusive NN search method that speeds up BOV models. Given a visual descriptor, a linear projection excludes a portion of the search regions from the whole search space. We ensure that the exclusion introduces minimal VQ error by learning an accurate classifier. ExVQ organizes multiple exclusions in a tree structure, whose VQ speed and VQ error rate can be reliably estimated. We show that ExVQ is much faster than state-of-the-art ANN methods in BOV models while maintaining almost the same classification accuracy. In addition, we empirically show that even with a VQ error rate as high as 30%, the classification accuracy of some ANN methods, including ExVQ, is similar to that of exhaustive search (which has zero VQ error). In some cases, ExVQ even achieves higher classification accuracy than exhaustive search.
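To make "exclusion by a linear projection" and "VQ error rate" concrete, here is a sketch of a single exclusion step: the sign of `w . x + b` decides which half of the codebook is searched, and the VQ error rate is the fraction of descriptors whose result differs from exhaustive search. The hyperplane and the codeword split are hand-chosen assumptions; ExVQ learns them with a classifier and stacks several exclusions in a tree.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def exhaustive_nn(x, codebook):
    """Exact VQ: index of the nearest codeword over the whole codebook."""
    return min(range(len(codebook)), key=lambda k: sq_dist(x, codebook[k]))

def excluded_nn(x, codebook, w, b, side_pos, side_neg):
    """Search only the codewords kept on one side of the hyperplane w.x + b = 0.

    side_pos / side_neg: codeword indices kept when the projection is
    non-negative / negative (hypothetical split; ExVQ learns w and b).
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    candidates = side_pos if score >= 0 else side_neg
    return min(candidates, key=lambda k: sq_dist(x, codebook[k]))

def vq_error_rate(descriptors, codebook, approx):
    """Fraction of descriptors whose approximate codeword differs from exact NN."""
    wrong = sum(1 for x in descriptors if approx(x) != exhaustive_nn(x, codebook))
    return wrong / len(descriptors)
```

With codebook `[[0, 0], [1, 0], [4, 0], [5, 0]]`, a deliberately misplaced split `w = [1, 0]`, `b = -3`, and descriptors at x = 0.2, 0.9, 2.6, 4.8, only the descriptor at 2.6 lands on the wrong side, giving a VQ error rate of 0.25 while each query scans half the codebook.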
Random Decision Stumps for Kernel Learning and Efficient SVM
* Both first authors contributed equally. Abstract. We propose to learn the kernel of an SVM as the weighted sum of a large number of simple, randomized binary stumps. Each stump takes one of the extracted features as input. This yields an efficient and very fast SVM while also easing the task of kernel selection. We demonstrate our kernel on 6 standard vision benchmarks, combining several common image descriptors, namely histograms (Flowers17 and Daimler), attribute-like descriptors (UCI, OSR, and a-VOC08), and Sparse Quantization (ImageNet). Results show that our kernel learning adapts well to these different feature types, matching the performance of kernels specifically tuned for each, with an evaluation cost similar to that of efficient SVM methods.
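A kernel written as a weighted sum of stumps, k(x, y) = Σ_i w_i s_i(x) s_i(y), is an inner product of the explicit features sqrt(w_i) s_i(x), which is one plausible reason such an SVM can be evaluated cheaply. The sketch below assumes stumps that output ±1 on a single randomly chosen feature with a random threshold, and uses uniform weights; the paper learns the weights, and these helper names are hypothetical.

```python
import math
import random

def make_stumps(n_stumps, n_features, lo, hi, seed=0):
    """Random stumps: (feature index, threshold) pairs; an illustrative assumption."""
    rng = random.Random(seed)
    return [(rng.randrange(n_features), rng.uniform(lo, hi)) for _ in range(n_stumps)]

def stump_features(x, stumps, weights):
    """Explicit feature map sqrt(w_i) * s_i(x), with s_i(x) in {-1, +1}."""
    return [math.sqrt(w) * (1.0 if x[f] > t else -1.0)
            for (f, t), w in zip(stumps, weights)]

def stump_kernel(x, y, stumps, weights):
    """k(x, y) = sum_i w_i * s_i(x) * s_i(y), computed via the feature map."""
    fx = stump_features(x, stumps, weights)
    fy = stump_features(y, stumps, weights)
    return sum(a * b for a, b in zip(fx, fy))
```

Because the map is explicit, a linear SVM trained on `stump_features` is equivalent to a kernel SVM with `stump_kernel`; with ±1 stumps, k(x, x) always equals the sum of the weights.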
mCENTRIST: A Multi-Channel Feature Generation Mechanism for Scene Categorization
Abstract. mCENTRIST, a new multichannel feature generation mechanism for recognizing scene categories, is proposed in this paper. Unlike popular multichannel descriptors, mCENTRIST explicitly captures the image properties that are encoded jointly by two image channels. To avoid the curse of dimensionality, tradeoffs at both the feature and channel levels make mCENTRIST computationally practical. As a result, mCENTRIST is both efficient and easy to implement. In addition, a hyper opponent color space is proposed by embedding Sobel information into the opponent color space for further performance improvement. Experiments show that mCENTRIST outperforms established multichannel descriptors on four RGB and RGB-near-infrared data sets, covering aerial orthoimagery, indoor, and outdoor scene category recognition tasks. Experiments also verify that the hyper opponent color space effectively enhances the descriptors' performance. Index Terms: Scene categorization, multi-channel descriptor, CENTRIST, channel interaction, hyper opponent color space.
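mCENTRIST builds on CENTRIST, a histogram of census-transform codes. The abstract does not describe how two channels are encoded jointly, so the sketch below covers only the single-channel CENTRIST base: each interior pixel is compared with its 8 neighbours to form an 8-bit code, and the codes are histogrammed into 256 bins. The bit convention (1 when the centre is at least the neighbour) and the neighbour order are assumptions.

```python
def census_transform(img):
    """8-bit census transform of a 2-D grayscale image (list of lists).

    Assumed convention: a bit is 1 when the centre pixel is >= the neighbour,
    neighbours read row by row from top-left. Border pixels are skipped.
    """
    h, w = len(img), len(img[0])
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    out = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            code = 0
            for dr, dc in offsets:
                code = (code << 1) | (1 if img[r][c] >= img[r + dr][c + dc] else 0)
            row.append(code)
        out.append(row)
    return out

def centrist(img):
    """256-bin histogram of census-transform codes (single-channel CENTRIST)."""
    hist = [0] * 256
    for row in census_transform(img):
        for code in row:
            hist[code] += 1
    return hist
```

For the 3x3 image `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]`, the centre is at least its four upper and left neighbours and smaller than the other four, so the single code is `0b11110000` (240).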
Flexible Image Similarity Computation Using Hyper-Spatial Matching
Accepted by the IEEE Transactions on Image Processing
Abstract. Spatial pyramid matching (SPM) has been widely used to compute the similarity of two images in computer vision and image processing. When comparing images, SPM implicitly assumes that in two images from the same category, similar objects will appear in similar locations. However, this is not always the case. In this paper, we propose hyper-spatial matching (HSM), a more flexible image similarity computation method that alleviates the mismatching problem in SPM. Besides matching corresponding regions, HSM considers the relationships among all pairs of spatial regions in two images, and thus captures more meaningful matches than SPM. We propose two strategies for learning SVM models with the proposed HSM kernel in image classification; both are hundreds of times faster than a general-purpose SVM solver applied to the HSM kernel (in both training and testing). We compare HSM and SPM on several ...
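The contrast between SPM and HSM can be illustrated at a single spatial level. In the sketch below, the SPM-style score compares each region only with the region in the same position, while the all-pairs score lets every region of one image match its best region in the other. The best-match aggregation is a hypothetical stand-in, since the abstract does not say how HSM weights region pairs; pyramid-level weights are also omitted.

```python
def hist_intersection(h1, h2):
    """Histogram intersection: sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def corresponding_match(regions_a, regions_b):
    """SPM-style (one level): region i of image A is compared only with region i of B."""
    return sum(hist_intersection(a, b) for a, b in zip(regions_a, regions_b))

def all_pairs_match(regions_a, regions_b):
    """Hypothetical all-pairs variant: each region of A takes its best match in B."""
    return sum(max(hist_intersection(a, b) for b in regions_b) for a in regions_a)
```

If image A has visual word 0 in its left region and word 1 in its right region, and image B has the same content swapped (`[[3, 0], [0, 3]]` versus `[[0, 3], [3, 0]]`), the corresponding-region score is 0 while the all-pairs score is 6, which is the kind of mismatch HSM is meant to address.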
Linear Regression Based Efficient SVM Learning for Large Scale Classification
Accepted by IEEE TNNLS
Abstract. For large-scale classification tasks, especially image classification, additive kernels have shown state-of-the-art accuracy. However, even with recently developed fast algorithms, learning speed and the ability to handle large-scale tasks remain open problems. This paper proposes algorithms for large-scale SVM classification and other tasks using additive kernels. First, we propose a Linear Regression SVM (LR-SVM) framework for general non-linear kernels, which uses linear regression to approximate gradient computations in the learning process. Second, we propose a Power Mean SVM (PmSVM) algorithm for all additive kernels, which uses non-symmetric explanatory variable functions. This non-symmetric kernel approximation has two advantages over existing methods: it requires neither closed-form Fourier transforms nor extra training for the approximation. In comparisons on benchmark large-scale classification datasets with millions of examples or millions of dense feature dimensions, PmSVM achieved the highest learning speed and highest accuracy among recent algorithms in most cases. Index Terms: Large scale classification, additive kernels, linear regression, SVM, Nyström approximation.
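The abstract does not define the kernel behind PmSVM. One reading consistent with its name is the additive power mean kernel k(u, v) = Σ_d M_p(u_d, v_d), where M_p(x, y) = ((x^p + y^p) / 2)^(1/p); familiar additive kernels appear as special cases: p = -1 gives the harmonic mean 2xy/(x+y) (the χ² kernel form), p = 0 gives sqrt(xy) (Hellinger), and p → -∞ gives min(x, y) (histogram intersection). The sketch below is an assumption-based illustration of that family, not the paper's training algorithm.

```python
import math

def power_mean(x, y, p):
    """Power mean M_p(x, y) of two non-negative scalars, with limit cases."""
    if p == -math.inf:
        return min(x, y)                 # histogram intersection
    if p == math.inf:
        return max(x, y)
    if p == 0:
        return math.sqrt(x * y)          # geometric mean (limit p -> 0), Hellinger
    if p < 0 and (x == 0 or y == 0):
        return 0.0                       # limit value when one argument is zero
    return ((x ** p + y ** p) / 2.0) ** (1.0 / p)

def additive_power_mean_kernel(u, v, p):
    """Additive kernel k(u, v) = sum_d M_p(u_d, v_d)."""
    return sum(power_mean(a, b, p) for a, b in zip(u, v))
```

For x = 1 and y = 4, p = -1 gives the harmonic mean 1.6, p = 0 gives 2, p = 1 gives the arithmetic mean 2.5, and p = -∞ gives 1, so a single parameter moves between these kernels.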
Decomposing Bag of Words Histograms
2013
mCENTRIST: A Multi-channel Feature Generation Mechanism for Scene Categorization
© 2014 IEEE.