Power Mean SVM for large scale visual classification (2012)

by J Wu
Venue: In CVPR

Results 1 - 10 of 10

Towards good practices for action video encoding

by Jianxin Wu, Yu Zhang, Weiyao Lin - In ICCV, 2013
Abstract - Cited by 6 (1 self)
High dimensional representations such as VLAD or FV have shown excellent accuracy in action recognition. This paper shows that a proper encoding built upon VLAD can achieve further accuracy boost with only negligible computational cost. We empirically evaluated various VLAD improvement technologies to determine good practices in VLAD-based video encoding. Furthermore, we propose an interpretation that VLAD is a maximum entropy linear feature learning process. Combining this new perspective with observed VLAD data distribution properties, we propose a simple, lightweight, but powerful bimodal encoding method. Evaluated on 3 benchmark action recognition datasets (UCF101, HMDB51 and Youtube), the bimodal encoding improves VLAD by large margins in action recognition.

Citation Context

... domain. Two encodings on top of bimodal can compress VLAD with satisfactory results:

1-bit: $x \leftarrow \begin{cases} 1 & x \ge 0 \\ -1 & x < 0 \end{cases}$ (5)

2-bit: $x \leftarrow \begin{cases} 1 & x > 0.4 \\ 0.5 & 0 \le x \le 0.4 \\ -0.5 & -0.4 \le x < 0 \\ -1 & x < -0\ldots \end{cases}$

Footnote 4: First row in the third block. HMDB51 and Youtube results are from [24]. For UCF101, we used K = 4000 in bag-of-features and the PmSVM classifier [26] with the histogram intersection kernel (p = −16, C = 0.1 in PmSVM).
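As a rough illustration of the two quantizers quoted in the context above, the following sketch applies them element-wise with NumPy. The function names `quantize_1bit` and `quantize_2bit` are hypothetical and do not come from the cited paper. Because the last threshold of the 2-bit rule is truncated in the excerpt, the sketch assumes that all remaining values (those below -0.4) map to -1.

```python
import numpy as np

def quantize_1bit(x):
    """Map each entry to +1 if it is non-negative, otherwise to -1."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, 1.0, -1.0)

def quantize_2bit(x):
    """Map each entry to one of four levels: 1, 0.5, -0.5, -1.

    Assumption: the final case (truncated in the excerpt) covers
    every value that the first three cases do not, i.e. x < -0.4.
    """
    x = np.asarray(x, dtype=float)
    return np.where(
        x > 0.4, 1.0,
        np.where(x >= 0, 0.5, np.where(x >= -0.4, -0.5, -1.0)),
    )
```

Both functions return arrays of the same shape as the input, so they could be applied directly to a whole encoded vector.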

Decomposing Bag of Words Histograms

by Ankit Gandhi, Karteek Alahari, C. V. Jawahar - ICCV 2013 - IEEE International Conference on Computer Vision, 2013
Abstract - Cited by 1 (0 self)
Abstract not found

Object Templates for Visual Place Categorization

by Hao Yang, Jianxin Wu
Abstract
The Visual Place Categorization (VPC) problem refers to the categorization of the semantic category of a place using only visual information collected from an autonomous robot. Previous works on this problem only made use of the global configurations observation, such as the Bag-of-Words model and spatial pyramid matching. In this paper, we present a novel system solving the problem utilizing both global configurations observation and local objects information. To be specific, we propose a local objects classifier that can automatically and effectively select key local objects of a semantic category from randomly sampled patches by the structural similarity support vector machine; and further classify the test frames with the Local Naive Bayes Nearest Neighbors algorithm. We also improve the global configurations observation with histogram intersection codebook and a noisy codewords removal mechanism. The temporal smoothness of the classification results is ensured by employing a Bayesian filtering framework. Empirically, our system outperforms state-of-the-art methods on two large scale and difficult datasets, demonstrating the superiority of the system.

Citation Context

...pn) totally pn neighborhood vectors. $k_{STR}(x_i, x_j)$ is then $n(x_i)^T n(x_j)$. By transforming the patch descriptor to a neighbor vector, we can apply existing SVM algorithm (such as LIBLINEAR [22] or PmSVM [23]) to solve the structural similarity kernel problem. This kind of SVM is called the structural similarity SVM. It is shown in [7] that the decision values of structural similarity SVM can be used to m...

Exclusive Visual Descriptor Quantization

by Yu Zhang, Jianxin Wu, Weiyao Lin
Abstract
Vector quantization (VQ) using exhaustive nearest neighbor (NN) search is the speed bottleneck in classic bag of visual words (BOV) models. Approximate NN (ANN) search methods still cost great time in VQ, since they check multiple regions in the search space to reduce VQ errors. In this paper, we propose ExVQ, an exclusive NN search method to speed up BOV models. Given a visual descriptor, a portion of search regions is excluded from the whole search space by a linear projection. We ensure that minimal VQ errors are introduced in the exclusion by learning an accurate classifier. Multiple exclusions are organized in a tree structure in ExVQ, whose VQ speed and VQ error rate can be reliably estimated. We show that ExVQ is much faster than state-of-the-art ANN methods in BOV models while maintaining almost the same classification accuracy. In addition, we empirically show that even with the VQ error rate as high as 30%, the classification accuracy of some ANN methods, including ExVQ, is similar to that of exhaustive search (which has zero VQ error). In some cases, ExVQ has even higher classification accuracy than the exhaustive search.

Citation Context

...tree. The trade off parameter in Eq. 3 is α = 0.01. After the BOV representation of all images are created, we use a one-vs-all SVM classifier with the χ2 kernel $\kappa(x_1, x_2) = \sum_{i=1}^{d} \frac{2 x_{1i} x_{2i}}{x_{1i} + x_{2i}}$ [26][27] for classification. For each dataset, five training / testing sets are randomly split and average accuracy and VQ time of these five rounds are reported, if not otherwise specified. 4.1 VQ time vs. c...
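The χ2 kernel in the context above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `chi2_kernel` and the `eps` parameter are hypothetical, the inputs are assumed to be non-negative histograms of equal length, and bins where both entries are zero are skipped to avoid a 0/0 division (the excerpt does not say how the authors handle that case).

```python
import numpy as np

def chi2_kernel(x1, x2, eps=1e-12):
    """Sum over bins of 2 * x1_i * x2_i / (x1_i + x2_i)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    denom = x1 + x2
    # Assumption: bins that are empty in both histograms contribute 0.
    mask = denom > eps
    return float(np.sum(2.0 * x1[mask] * x2[mask] / denom[mask]))
```

For two identical L1-normalized histograms the value is 1, and for histograms with no overlapping non-empty bins it is 0.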

Random Decision Stumps for Kernel Learning and Efficient SVM

by Gemma Roig, Xavier Boix, Luc Van Gool
Abstract
Both first authors contributed equally. We propose to learn the kernel of an SVM as the weighted sum of a large number of simple, randomized binary stumps. Each stump takes one of the extracted features as input. This leads to an efficient and very fast SVM, while also alleviating the task of kernel selection. We demonstrate the capabilities of our kernel on 6 standard vision benchmarks, in which we combine several common image descriptors, namely histograms (Flowers17 and Daimler), attribute-like descriptors (UCI, OSR, and a-VOC08), and Sparse Quantization (ImageNet). Results show that our kernel learning adapts well to these different feature types, achieving the performance of kernels specifically tuned for each, and with an evaluation cost similar to that of efficient SVM methods.

Citation Context

...riptors, such as χ2 and RB-χ2 kernels [5,7] or the intersection kernel [20,21]. Other approaches use kernel PCA to linearize the image descriptors [22] or sparse feature embeddings [23]. Recently, Wu [24] introduced the power mean kernel, which generalizes the intersection and χ2 kernels, among others, and achieves a remarkably efficient, scalable SVM optimization. These methods approximate specific f...
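To make the "generalizes the intersection and χ2 kernels" remark concrete, here is a minimal sketch that assumes the usual definition of the power mean, $M_p(a, b) = ((a^p + b^p)/2)^{1/p}$, summed over feature dimensions. With p = -1 this is the harmonic mean 2ab/(a+b), which matches the χ2 kernel term, and as p tends to minus infinity it approaches min(a, b), the histogram intersection term. The function name `power_mean_kernel` is hypothetical, only negative p is handled, and treating bins with a zero entry as contributing 0 is an assumption of this sketch.

```python
import numpy as np

def power_mean_kernel(x, y, p):
    """Sum over dimensions of the power mean of x_i and y_i, for p < 0."""
    if p >= 0:
        raise ValueError("this sketch only covers negative p")
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Assumption: for negative p the power mean tends to 0 when either
    # entry is 0, so such bins are dropped from the sum.
    pos = (x > 0) & (y > 0)
    xs, ys = x[pos], y[pos]
    return float(np.sum(((xs ** p + ys ** p) / 2.0) ** (1.0 / p)))
```

A strongly negative exponent such as p = -16 (the value quoted in one of the citation contexts on this page) gives values close to the histogram intersection while remaining a smooth function of the inputs.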

mCENTRIST: A Multi-Channel Feature Generation Mechanism for Scene Categorization

by Yang Xiao, Jianxin Wu, Junsong Yuan
Abstract
mCENTRIST, a new multichannel feature generation mechanism for recognizing scene categories, is proposed in this paper. mCENTRIST explicitly captures the image properties that are encoded jointly by two image channels, which is different from popular multichannel descriptors. In order to avoid the curse of dimensionality, tradeoffs at both feature and channel levels have been executed to make mCENTRIST computationally practical. As a result, mCENTRIST is both efficient and easy to implement. In addition, a hyper opponent color space is proposed by embedding Sobel information into the opponent color space for further performance improvements. Experiments show that mCENTRIST outperforms established multichannel descriptors on four RGB and RGB-near infrared data sets, including aerial orthoimagery, indoor, and outdoor scene category recognition tasks. Experiments also verify that the hyper opponent color space enhances descriptors' performance effectively. Index Terms: Scene categorization, multi-channel descriptor, CENTRIST, channel interaction, hyper opponent color space.

Citation Context

...s as indicated in [1]. Here, mGIST’s performance on this dataset is also very limited; • BOV CENTRIST with large size codebook obtains the highest accuracy rate of 47.2% by using PmSVM as reported in [32]. It is possible that applying mCENTRIST in a similar BOV model may lead to better performance; • The hyper opponent color space is advantageous over RGB and opponent color space nearly among all desc...

Flexible Image Similarity Computation Using Hyper-Spatial Matching

by Yu Zhang, Jianxin Wu, Jianfei Cai, Weiyao Lin - Accepted by the IEEE Transactions on Image Processing
Abstract
Spatial pyramid matching (SPM) has been widely used to compute the similarity of two images in computer vision and image processing. While comparing images, SPM implicitly assumes that, in two images from the same category, similar objects will appear in similar locations. However, this is not always the case. In this paper, we propose hyper-spatial matching (HSM), a more flexible image similarity computing method, to alleviate the mismatching problem in SPM. Besides the match between corresponding regions, HSM considers the relationship of all spatial pairs in two images, which includes more meaningful matches than SPM. We propose two learning strategies to learn SVM models with the proposed HSM kernel in image classification, which are hundreds of times faster than a general purpose SVM solver applied to the HSM kernel (in both training and testing). We compare HSM and SPM on several

Citation Context

...ve the problem (10), we choose the dual coordinate descent method [39], which is shown in Algorithm 2. We approximate the gradient G (line 5 in Algorithm 2) using the gradient approximation method in [14]. First, we substitute $K_{HSM}$ into G:

$$G = y_i \sum_{p=1}^{N} \sum_{t=1}^{d} \sum_{q=1}^{N} R_{pq} \sum_{j=1}^{n} \alpha_j y_j \, \kappa(x_i^{p,t}, x_j^{q,t}) - 1. \quad (16)$$

It needs $O(nN^2d)$ steps to compute, which is very expensive. In Eq. 16, the essential part...
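The cost quoted for Eq. 16 can be seen in a direct, unoptimized evaluation. The sketch below is only an illustration and rests on several assumptions that the excerpt does not confirm: `X` is taken to be an array of shape (n, N, d) holding n examples, N spatial regions and d feature dimensions, `R` is an N-by-N weight matrix, and the scalar kernel κ defaults to `np.minimum` (histogram intersection) purely as a placeholder. The function name `naive_gradient` is hypothetical.

```python
import numpy as np

def naive_gradient(i, X, y, alpha, R, kappa=np.minimum):
    """Evaluate Eq. 16 directly for example i, touching n * N * N * d terms.

    X: (n, N, d) features, y: (n,) labels, alpha: (n,) dual variables,
    R: (N, N) region-pair weights. All shapes are assumptions of this sketch.
    """
    xi = X[i]  # (N, d): regions p and dimensions t of example i
    # K[j, p, q, t] = kappa(x_i^{p,t}, x_j^{q,t}) via broadcasting
    K = kappa(xi[None, :, None, :], X[:, None, :, :])
    # Weighted sum over j, p, q and t, then scale by y_i and subtract 1
    return y[i] * np.einsum("j,pq,jpqt->", alpha * y, R, K) - 1.0
```

The intermediate array `K` has n·N²·d entries, which is exactly the term count the excerpt describes as very expensive and the reason the authors turn to an approximation.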

Linear Regression Based Efficient SVM Learning for Large Scale Classification

by Jianxin Wu, Hao Yang - Accepted by IEEE TNNLS
Abstract
For large scale classification tasks, especially in the classification of images, additive kernels have shown state-of-the-art accuracy. However, even with the recent development of fast algorithms, learning speed and the ability to handle large scale tasks are still open problems. This paper proposes algorithms for large scale SVM classification and other tasks using additive kernels. First, a Linear Regression SVM (LR-SVM) framework for general non-linear kernels is proposed, by using linear regression to approximate gradient computations in the learning process. Second, we propose a Power Mean SVM (PmSVM) algorithm for all additive kernels, by using non-symmetric explanatory variable functions. This non-symmetric kernel approximation has advantages over existing methods: it does not require closed-form Fourier transforms, and it does not require extra training for the approximation either. Compared on benchmark large scale classification datasets with millions of examples or millions of dense feature dimensions, PmSVM has achieved the highest learning speed and highest accuracy among recent algorithms in most cases. Index Terms: Large scale classification, additive kernels, linear regression, SVM, Nyström approximation.

Citation Context

...c. III, we propose a general linear regression based framework, LR-SVM, for SVM learning with general non-linear kernels, which is the generalization of our preliminary works [16] and [17]. LR-SVM is proposed to approximate the gradient computation in the dual coordinate descent solver, by using a linear regression model. When the explanatory variables are chosen in a symmetr...

Decomposing Bag of Words Histograms

by Ankit Gandhi, Karteek Alahari, C. V. Jawahar, 2013

Citation Context

... an optimization problem in Section 2. We also contrast our approach with MRF-based methods, and discuss the latter’s limitation in Section 2. Inspired by the success of fast and scalable classifiers [18, 33, 37], we discuss our work using linear SVM as an example. We show how the formulation can be generalized to spatially constrained decomposition in Section 3. Section 4 presents an exhaustive evaluation of ...

mCENTRIST: A Multi-channel Feature Generation Mechanism for Scene Categorization

by Yang Xiao, Jianxin Wu, Junsong Yuan
© 2014 IEEE.

Citation Context

....9(±1.4) 34.2(±2.0) 20.0(±5.9)
O1O2O3: 40.8(±1.7) 40.6(±1.2) 36.6(±0.4) 33.1(±1.8) 27.2(±1.2)
O1O2O3S: 44.6(±1.2) 43.2(±1.4) 39.7(±1.6) 35.5(±2.4) 31.5(±1.4)
cCENTRIST (HSV): 34.1(±0.9)
State-of-the-art [32]: 47.2

course, harbor, intersection, medium density residential, mobile home park, overpass, parking lot, river, runway, sparse residential, storage tanks and tennis courts (example images in Fig. 8). ...
