Feature sampling and partitioning for visual vocabulary generation on large action classification datasets. arXiv preprint arXiv:1405.7545 (2014)

by M Sapienza, F Cuzzolin, P H Torr

Results 1 - 2 of 2

Beyond gaussian pyramid: Multi-skip feature stacking for action recognition

by Zhenzhong Lan, Ming Lin, Xuanchong Li, Alexander G. Hauptmann, Bhiksha Raj - In CVPR, 2015
Abstract - Cited by 4 (0 self)
Most state-of-the-art action feature extractors involve differential operators, which act as high-pass filters and tend to attenuate low-frequency action information. This attenuation introduces bias to the resulting features and generates ill-conditioned feature matrices. The Gaussian Pyramid has been used as a feature enhancing technique that encodes scale-invariant characteristics into the feature space in an attempt to deal with this attenuation. However, at the core of the Gaussian Pyramid is a convolutional smoothing operation, which makes it incapable of generating new features at coarse scales. In order to address this problem, we propose a novel feature enhancing technique called Multi-skIp Feature Stacking (MIFS), which stacks features extracted using a family of differential filters parameterized with multiple time skips and encodes shift-invariance into the frequency space. MIFS compensates for information lost from using differential operators by recapturing information at coarse scales. This recaptured information allows us to match actions at different speeds and ranges of motion. We prove that MIFS enhances the learnability of differential-based features exponentially. The resulting feature matrices from MIFS have much smaller condition numbers and variances than those from conventional methods. Experimental results show significantly improved performance on challenging action recognition and event detection tasks. Specifically, our method exceeds the state-of-...
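
The stacking idea in the abstract above can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a 1-D signal stands in for a video, a plain frame difference stands in for the differential operator, and the function names `skip_differences` and `multi_skip_stack` are hypothetical.

```python
def skip_differences(signal, skip):
    """Temporal difference with a time skip: signal[t + skip] - signal[t]."""
    return [signal[t + skip] - signal[t] for t in range(len(signal) - skip)]


def multi_skip_stack(signal, skips=(1, 2, 4)):
    """Stack difference features computed at several time skips.

    Each feature is tagged with the skip it came from, so slow motions
    that barely register at skip 1 still show up at larger skips.
    """
    stacked = []
    for skip in skips:
        stacked.extend((skip, d) for d in skip_differences(signal, skip))
    return stacked


# Assumed toy input: a slowly rising signal sampled at 8 time steps.
slow_ramp = [0, 1, 2, 3, 4, 5, 6, 7]
features = multi_skip_stack(slow_ramp)
```

In this toy setting, the differences at skip 4 are four times larger than those at skip 1, which is one way to picture how larger skips recapture slow, low-frequency motion.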

Citation Context

...chmid. [40], from which we build on our approaches. Shao et al. [30] proposed to use Laplacian pyramid and Gabor filters to encode videos and get 37.3% accuracy on the HMDB51 dataset. Sapienza et al. [28] explored ways to sub-sample and generate vocabularies for Dense Trajectory features. Jain et al. [8]’s approach incorporated a new motion descriptor. Oneata et al. [22] focused on testing Spatial Fis...

Beyond Spatial Pyramid Matching: Space-time Extended Descriptor for Action Recognition

by Zhenzhong Lan, Alexander G. Hauptmann
Abstract
We address the problem of generating video features for action recognition. The spatial pyramid and its variants have been very popular feature models due to their success in balancing spatial location encoding and spatial invariance. Although it seems straightforward to extend the spatial pyramid to the temporal domain (spatio-temporal pyramid), the large spatio-temporal diversity of unconstrained videos and the resulting significantly higher dimensional representations make it less appealing. This paper introduces the space-time extended descriptor, a simple but efficient alternative way to include the spatio-temporal location in the video features. Instead of only coding motion information and leaving the spatio-temporal location to be represented at the pooling stage, location information is used as part of the encoding step. This method is a much more effective and efficient location encoding method than the fixed grid model because it avoids the danger of overcommitting to artificial boundaries and its dimension is relatively low. Experimental results on several benchmark datasets show that, despite its simplicity, this method achieves comparable or better results than the spatio-temporal pyramid.
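
A minimal sketch of using location as part of the encoding step, as the abstract above describes. The abstract does not specify how location enters the descriptor, so appending coordinates normalized by the video size is an assumption here, and the name `extend_descriptor` is hypothetical.

```python
def extend_descriptor(descriptor, x, y, t, width, height, num_frames):
    """Append normalized space-time coordinates to a local descriptor.

    Assumption: location is encoded as (x / width, y / height,
    t / num_frames), so the result has only three extra dimensions.
    """
    return list(descriptor) + [x / width, y / height, t / num_frames]


# Assumed toy input: a 2-D motion descriptor at the centre of a
# 160x120 video, a quarter of the way through 40 frames.
extended = extend_descriptor([0.5, 0.25], x=80, y=60, t=10,
                             width=160, height=120, num_frames=40)
```

Under this assumption the representation grows by three dimensions per descriptor, rather than being multiplied by the number of cells as in a fixed grid.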

Citation Context

...t, even within a single video sequence, the action area can change among frames. task since on-line videos are subject to large visual diversity. Robust to such variability, the Bag-of-Features (BoF) [23] model has been used as the main paradigm for representing videos. A BoF can be summarized as an encoding step and a pooling step [3]. Traditional pooling discards the local feature position informati...