Results 1 - 2 of 2
Beyond Gaussian Pyramid: Multi-skip Feature Stacking for Action Recognition
- In CVPR, 2015
"... Most state-of-the-art action feature extractors involve differential operators, which act as highpass filters and tend to attenuate low frequency action information. This atten-uation introduces bias to the resulting features and gener-ates ill-conditioned feature matrices. The Gaussian Pyra-mid has ..."
Abstract - Cited by 4 (0 self)
Most state-of-the-art action feature extractors involve differential operators, which act as highpass filters and tend to attenuate low-frequency action information. This attenuation introduces bias to the resulting features and generates ill-conditioned feature matrices. The Gaussian Pyramid has been used as a feature enhancing technique that encodes scale-invariant characteristics into the feature space in an attempt to deal with this attenuation. However, at the core of the Gaussian Pyramid is a convolutional smoothing operation, which makes it incapable of generating new features at coarse scales. In order to address this problem, we propose a novel feature enhancing technique called Multi-skIp Feature Stacking (MIFS), which stacks features extracted using a family of differential filters parameterized with multiple time skips and encodes shift-invariance into the frequency space. MIFS compensates for information lost from using differential operators by recapturing information at coarse scales. This recaptured information allows us to match actions at different speeds and ranges of motion. We prove that MIFS enhances the learnability of differential-based features exponentially. The resulting feature matrices from MIFS have much smaller condition numbers and variances than those from conventional methods. Experimental results show significantly improved performance on challenging action recognition and event detection tasks. Specifically, our method exceeds the state-of-the-art ...
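To make the stacking idea concrete, here is a minimal Python sketch of the multi-skip step described in the abstract. It is not the paper's implementation: the actual pipeline extracts dense-trajectory-style motion features, whereas `feature_fn` below defaults to a hypothetical stand-in (mean absolute temporal gradient) purely for illustration.

```python
import numpy as np

def mifs_stack(video, skips=(1, 2, 4), feature_fn=None):
    """Multi-skIp Feature Stacking (MIFS), minimal sketch.

    video: (T, H, W) float array of grayscale frames.
    skips: time skips parameterizing the differential filter family.
    feature_fn: maps a stack of frame differences to local features;
        a real pipeline would plug in dense-trajectory features here.
    """
    if feature_fn is None:
        # Hypothetical stand-in: mean absolute temporal gradient per frame pair.
        feature_fn = lambda diffs: np.abs(diffs).mean(axis=(1, 2))

    stacked = []
    for s in skips:
        # Differential operator with time skip s: frame_{t+s} - frame_t.
        # Larger skips recapture low-frequency (slow or large-range) motion
        # that a skip-1 difference attenuates.
        diffs = video[s:] - video[:-s]
        stacked.append(feature_fn(diffs))
    # Pool features from all skips into one enlarged feature set; downstream
    # encoding (e.g. Fisher vectors) treats them as a single collection.
    return np.concatenate(stacked, axis=0)
```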
Beyond Spatial Pyramid Matching: Space-time Extended Descriptor for Action Recognition
"... We address the problem of generating video features for action recognition. The spatial pyramid and its variants have been very popular feature models due to their success in balancing spatial location encoding and spatial invari-ance. Although it seems straightforward to extend spatial pyramid to t ..."
Abstract
We address the problem of generating video features for action recognition. The spatial pyramid and its variants have been very popular feature models due to their success in balancing spatial location encoding and spatial invariance. Although it seems straightforward to extend the spatial pyramid to the temporal domain (spatio-temporal pyramid), the large spatio-temporal diversity of unconstrained videos and the resulting significantly higher dimensional representations make it less appealing. This paper introduces the space-time extended descriptor, a simple but efficient alternative way to include the spatio-temporal location in the video features. Instead of only coding motion information and leaving the spatio-temporal location to be represented at the pooling stage, location information is used as part of the encoding step. This is a much more effective and efficient location encoding than the fixed grid model because it avoids the danger of overcommitting to artificial boundaries and its dimension is relatively low. Experimental results on several benchmark datasets show that, despite its simplicity, this method achieves comparable or better results than the spatio-temporal pyramid.
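The abstract does not spell out the encoding mechanics, so the following Python sketch is one plausible reading: each local descriptor is extended with its normalized (x, y, t) location before codebook encoding, rather than being binned into a fixed pyramid grid at pooling time. The `weight` parameter is a hypothetical balancing factor, not something stated in the paper.

```python
import numpy as np

def space_time_extend(descriptors, positions, video_shape, weight=1.0):
    """Space-time extended descriptor, minimal sketch.

    descriptors: (N, D) local motion descriptors (e.g. HOG/HOF/MBH).
    positions: (N, 3) (x, y, t) location of each descriptor.
    video_shape: (W, H, T) extents used to normalize locations to [0, 1].
    weight: hypothetical scale balancing location vs. motion dimensions.
    """
    loc = positions / np.asarray(video_shape, dtype=float)
    # Append the normalized location to the descriptor itself, so the
    # subsequent encoding (e.g. a GMM/Fisher-vector codebook) sees location
    # jointly with motion instead of relying on a fixed spatio-temporal grid.
    return np.hstack([descriptors, weight * loc])
```

Because the location enters as three extra dimensions per descriptor, the representation grows only from D to D + 3, which is the low-dimensionality advantage the abstract contrasts with pyramid-style grids.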