Multi-view super vector for action recognition (2014)
Venue: Proc. IEEE Conf. CVPR
Citations: 8 (4 self)
Citations
3723 | Histograms of oriented gradients for human detection
- Dalal, Triggs
Citation Context ...rogresses have been made to improve the accuracy of action recognition system. These progresses should be partly ascribed to the development of more elaborately designed low-level feature descriptors =-=[5, 19, 6]-=-, sampling strategy [33, 12] and more sophisticated developed models for generating video representations [23, 14]. To enhance recognition performance, several kinds of descriptors have been proposed,... |
1606 | Video Google: A text retrieval approach to object matching in videos
- Sivic, Zisserman
- 2003
Citation Context ... statistics of the distribution model. Finally, this vector representation is fed into the classifier to produce recognition result. The Bag-of-Visual-Words (BoVW) representation is a classic example =-=[27]-=- that captures zero-th order statistics from, and implements video encoding based on the distribution of visual vocabulary. More sophisticated representations beyond BoVW have also been proposed to de... |
877 | Hierarchical mixtures of experts and the EM algorithm
- Jordan, Jacobs
- 1994
Citation Context ...ear submodels. The effectiveness of mixture model based methods has been demonstrated in various models such as Mixture of Probabilistic Principal Component Analysis (M-PPCA) [31], mixture of experts =-=[15]-=-, mixture of Factor analysis [9] etc. In the following section, we present our Mixture model of Probabilistic Canonical Correlation Analyzer (M-PCCA) and its corresponding learning algorithm. 3.1. Mod... |
817 | On space-time interest points
- Laptev
- 2005
Citation Context ...es coverage. ω-trajectory [12] and dense trajectory [33] are examples of practical feature sampling strategies that enhance recognition performance beyond the Spatio Temporal Interest Points (STIP) =-=[18]-=-. A standard pipeline of video encoding firstly fit the features distribution model based on the local descriptors extracted from training videos. From each new video, local features are extracted and... |
757 | Recognizing human actions: A local SVM approach
- Schuldt, Laptev, et al.
- 2004
Citation Context ... has been an active research area due to its wide applications [1, 33, 34, 36]. Early research focus had been on datasets with limited size and relatively controlled settings, such as the KTH dataset =-=[26]-=-, but later shifted to large and more realistic datasets such as the HMDB51 dataset [16] and UCF101 dataset [28]. These uncontrolled video datasets pose great challenges to the recognition task, e.g. ... |
735 | Learning realistic human actions from movies
- Laptev, Marszalek, et al.
- 2008
Citation Context ...rogresses have been made to improve the accuracy of action recognition system. These progresses should be partly ascribed to the development of more elaborately designed low-level feature descriptors =-=[5, 19, 6]-=-, sampling strategy [33, 12] and more sophisticated developed models for generating video representations [23, 14]. To enhance recognition performance, several kinds of descriptors have been proposed,... |
716 | Behavior recognition via sparse spatio-temporal features
- Dollár, Rabaud, et al.
- 2005
Citation Context ...ect feature [5, 19, 33, 22]. In particular, HOG descriptor characterizes static appearance [5], while HOF and MBH descriptors capture dynamic motion [19, 6]. So far, local spatio-temporal descriptors =-=[8]-=- have exhibited successful performance [1] in action recognition task. However, outstanding recognition systems rarely rely on only a single type of them. A complementary line of work focuses on local... |
706 | Probabilistic principal component analysis
- Tipping, Bishop
- 1999
Citation Context ... the mixture of local linear submodels. The effectiveness of mixture model based methods has been demonstrated in various models such as Mixture of Probabilistic Principal Component Analysis (M-PPCA) =-=[31]-=-, mixture of experts [15], mixture of Factor analysis [9] etc. In the following section, we present our Mixture model of Probabilistic Canonical Correlation Analyzer (M-PCCA) and its corresponding lea... |
520 | VLFeat: An open and portable library of computer vision algorithms, http://www.vlfeat.org
- Vedaldi, Fulkerson
- 2008
Citation Context ... whiten each types of descriptors as is suggested in [13]. The resulting descriptors are then L2-normalized. A total of 256,000 descriptors are randomly sampled from training sets. The VLFeat Toolbox =-=[32]-=- is used to implement baseline methods. In particular, we employ the built-in FV and VLAD implementations in the experiments. 5.3. Evaluation of MVSV representation To specify the number of components... |
445 | Natural gradient works efficiently in learning
- Amari
- 1998
Citation Context ... of different local descriptors to improve recognition accuracy. We first derive an EM algorithm for learning M-PCCA. Each video is encoded based on this M-PCCA via latent space and gradient embedding =-=[11]-=-. As we shall see, the resulting video representation consists of two components: one is the latent factors, which encodes information shared by different feature descriptors; the other is the gra... |
355 | Improving the Fisher Kernel for large-scale image classification
- Perronnin, Sánchez, Mensink
Citation Context ... each video, three types of features are extracted with the same setup as [33], namely Histogram of Oriented Gradient (HOG), Histogram of Optical Flow (HOF) and Motion Boundary Histogram (MBH). As in =-=[24]-=-, we separately apply PCA on each type of descriptor to reduce their dimensions by a factor of two. We also whiten each types of descriptors as is suggested in [13]. The resulting descriptors are then... |
351 | Canonical Correlation Analysis: An overview with application to learning methods
- Hardoon, Szedmak, et al.
Citation Context ...the paper with a discussion on the limitation and possible extension of the method. 2. Canonical Correlation Analysis revisited In this section, we briefly review Canonical Correlation Analysis (CCA) =-=[10]-=- and its probabilistic extension, Probabilistic Canonical Correlation Analysis (PCCA) [3]. For two sets of data, $X = \{x_i\}_{i=1}^{N}$ and $Y = \{y_i\}_{i=1}^{N}$ with dimensions $n$ and $m$ respectively, CCA [10] manages to ... |
292 | Action recognition by dense trajectories
- Wang, Kläser, et al.
- 2011
Citation Context ... results, and outperforms FV and VLAD with descriptor concatenation or kernel average fusion strategy. 1. Introduction Action recognition has been an active research area due to its wide applications =-=[1, 33, 34, 36]-=-. Early research focus had been on datasets with limited size and relatively controlled settings, such as the KTH dataset [26], but later shifted to large and more realistic datasets such as the HMDB5... |
280 | Human detection using oriented histograms of flow and appearance
- Dalal, Triggs, et al.
Citation Context ...rogresses have been made to improve the accuracy of action recognition system. These progresses should be partly ascribed to the development of more elaborately designed low-level feature descriptors =-=[5, 19, 6]-=-, sampling strategy [33, 12] and more sophisticated developed models for generating video representations [23, 14]. To enhance recognition performance, several kinds of descriptors have been proposed,... |
218 | Aggregating local descriptors into a compact image representation
- Jégou, Douze, et al.
Citation Context ...ribed to the development of more elaborately designed low-level feature descriptors [5, 19, 6], sampling strategy [33, 12] and more sophisticated developed models for generating video representations =-=[23, 14]-=-. To enhance recognition performance, several kinds of descriptors have been proposed, each of which describes certain aspects of object feature [5, 19, 33, 22]. In particular, HOG descriptor characte... |
214 | Human activity analysis: A review
- Aggarwal, Ryoo
Citation Context ... results, and outperforms FV and VLAD with descriptor concatenation or kernel average fusion strategy. 1. Introduction Action recognition has been an active research area due to its wide applications =-=[1, 33, 34, 36]-=-. Early research focus had been on datasets with limited size and relatively controlled settings, such as the KTH dataset [26], but later shifted to large and more realistic datasets such as the HMDB5... |
208 | Fisher kernels on visual vocabularies for image categorization
- Perronnin, Dance
Citation Context ...ribed to the development of more elaborately designed low-level feature descriptors [5, 19, 6], sampling strategy [33, 12] and more sophisticated developed models for generating video representations =-=[23, 14]-=-. To enhance recognition performance, several kinds of descriptors have been proposed, each of which describes certain aspects of object feature [5, 19, 33, 22]. In particular, HOG descriptor characte... |
157 | HMDB: a large video database for human motion recognition
- Kuehne, Jhuang, et al.
- 2011
Citation Context ...search focus had been on datasets with limited size and relatively controlled settings, such as the KTH dataset [26], but later shifted to large and more realistic datasets such as the HMDB51 dataset =-=[16]-=- and UCF101 dataset [28]. These uncontrolled video datasets pose great challenges to the recognition task, e.g. large amount of intra-class variations, background clutter and occlusion, camera motions... |
102 | A probabilistic interpretation of canonical correlation analysis
- Bach, Jordan
- 2006
Citation Context ...onical Correlation Analysis revisited In this section, we briefly review Canonical Correlation Analysis (CCA) [10] and its probabilistic extension, Probabilistic Canonical Correlation Analysis (PCCA) =-=[3]-=-. For two sets of data, $X = \{x_i\}_{i=1}^{N}$ and $Y = \{y_i\}_{i=1}^{N}$ with dimensions $n$ and $m$ respectively, CCA [10] manages to find a series of linear projections that maximize the correlation between two projected v... |
82 | Kernel and nonlinear canonical correlation analysis.
- Lai, Fyfe
- 2000
Citation Context ...ixture model of Probabilistic CCA One limitation of PCCA is that it can only deal with linear projections. This has naturally motivated researchers to develop nonlinear CCA. One example is kernel CCA =-=[17]-=-. An alternative paradigm to simultaneously model nonlinear structures and deal with local correlation is to introduce the mixture of local linear submodels. The effectiveness of mixture model based m... |
75 | UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402
- Soomro, Zamir, et al.
- 2012
Citation Context ... datasets with limited size and relatively controlled settings, such as the KTH dataset [26], but later shifted to large and more realistic datasets such as the HMDB51 dataset [16] and UCF101 dataset =-=[28]-=-. These uncontrolled video datasets pose great challenges to the recognition task, e.g. large amount of intra-class variations, background clutter and occlusion, camera motions and view... |
43 | Better exploiting motion for better action recognition.
- Jain, Jégou, et al.
- 2013
Citation Context ...prove the accuracy of action recognition system. These progresses should be partly ascribed to the development of more elaborately designed low-level feature descriptors [5, 19, 6], sampling strategy =-=[33, 12]-=- and more sophisticated developed models for generating video representations [23, 14]. To enhance recognition performance, several kinds of descriptors have been proposed, each of which describes cer... |
39 | Action and Event Recognition with Fisher Vectors on a Compact Feature Set
- Oneata, Verbeek, et al.
- 2013
Citation Context ...stics of features. Among them, FV [23] and its variant VLAD [14], initially designed for image classification, have been shown to achieve promising performances in several action recognition datasets =-=[37, 21, 12]-=-. Recent studies show that combining multiple types of local descriptors can improve recognition performance e.g. [33, 12]. Combining methods can be roughly grouped into three types, namely descriptor... |
37 | Multimodal feature fusion for robust event detection in web videos
- Natarajan, Wu, et al.
- 2012
Citation Context ...re-level fusion, which trains classifiers for each descriptor and fuses the confidence scores [39, 30, 38]. All these methods have been extensively evaluated in the context of complex event detection =-=[29, 20]-=-. Descriptor-level fusion and kernel average are widely applied in action recognition [12, 33]. When the adopted descriptors have strong dependency, descriptor fusion is probably better, because the c... |
35 | All about VLAD
- Arandjelovic, Zisserman
Citation Context ... $= \{Z, G_x, G_y\}$ (19) This representation, with a dimension of $K(d+n+m)$, is first power-normalized and L2-normalized as suggested in [24]. We then apply intra-normalization for each component as in =-=[2]-=-. 4.2. Relation to previous methods We concatenate the estimations of latent vector for each submodel to recover the shared information. This is closely related to VLAD [14], a simplified version of FV ... |
30 | Evaluation of low-level features and their combinations for complex event detection
- Tamrakar, Ali, et al.
- 2012
Citation Context ...re-level fusion, which trains classifiers for each descriptor and fuses the confidence scores [39, 30, 38]. All these methods have been extensively evaluated in the context of complex event detection =-=[29, 20]-=-. Descriptor-level fusion and kernel average are widely applied in action recognition [12, 33]. When the adopted descriptors have strong dependency, descriptor fusion is probably better, because the c... |
27 | A comparative study of encoding, pooling and normalization methods for action recognition
- Wang, Wang, et al.
- 2012
Citation Context ...stics of features. Among them, FV [23] and its variant VLAD [14], initially designed for image classification, have been shown to achieve promising performances in several action recognition datasets =-=[37, 21, 12]-=-. Recent studies show that combining multiple types of local descriptors can improve recognition performance e.g. [33, 12]. Combining methods can be roughly grouped into three types, namely descriptor... |
25 | Negative evidences and co-occurences in image retrieval: The benefit of PCA and whitening
- Jégou, Chum
- 2012
Citation Context ...ion Boundary Histogram (MBH). As in [24], we separately apply PCA on each type of descriptor to reduce their dimensions by a factor of two. We also whiten each types of descriptors as is suggested in =-=[13]-=-. The resulting descriptors are then L2-normalized. A total of 256,000 descriptors are randomly sampled from training sets. The VLFeat Toolbox [32] is used to implement baseline methods. In particular... |
21 | Motionlets: mid-level 3D parts for human motion recognition
- Wang, Qiao, et al.
Citation Context ...ts of MVSV on HMDB51. LF refers to the latent factors Z. G-HOG and G-MBH refer to the gradient vectors $G_{HOG}$ and $G_{MBH}$, respectively.
Table 3. Comparison of MVSV to the state-of-the-art methods.
HMDB51 method | Accuracy | UCF101 method | Accuracy
STIP+BoVW [16] | 23.0% | STIP+BoVW [28] | 43.9%
Motionlets =-=[35]-=- | 42.1% | DT+VLAD | 79.9%
DT+BoVW [33] | 46.6% | DT+FV | 81.4%
w-traj+VLAD [12] | 52.1% | |
DT+FV+SPM [21] | 54.8% | |
MVSV | 55.9% | MVSV | 83.5%
tiple types of descri... |
17 | Revisiting the VLAD image representation
- Delhumeau, Gosselin, et al.
Citation Context ...linear transformation on each aggregated vectors of v in the k-th submodel. Similar transformation based on PCA has been shown to be an effective way to improve image retrieval performance using VLAD =-=[7]-=-. Our latent vector Z can also be extracted via a simplified version of M-PCCA. k-means is used to learn the local ce... |
[Figure 4. Example frames from the HMDB51 and UCF101 datasets: (a) HMDB51, (b) UCF101.]
17 | Factorized orthogonal latent spaces.
- Salzmann
- 2010
Citation Context ...ointly generated by the mixture of K submodels. Partly inspired by the Gaussian mixture model (GMM) based Fisher Vector representation [23] and the Factorized Orthogonal Latent Spaces (FOLS) approach =-=[25]-=- for multiview learning, in this paper, we propose a Mixture model of Probabilistic Canonical Correlation Analyzers (M-PCCA), and utilize this model to jointly encode multiple types of descriptors for... |
16 | Mining motion atoms and phrases for complex action recognition - Wang, Qiao, et al. - 2013 |
14 | Robust late fusion with rank minimization.
- Ye, Liu, et al.
- 2012
Citation Context ...icated kernel-level fusion methods when only limited kernels are considered. The last fusion method is score-level fusion, which trains classifiers for each descriptor and fuses the confidence scores =-=[39, 30, 38]-=-. All these methods have been extensively evaluated in the context of complex event detection [29, 20]. Descriptor-level fusion and kernel average are widely applied in action recognition [12, 33]. Wh... |
8 | Exploring motion boundary based sampling and spatial-temporal context descriptors for action recognition
- Peng, Qiao, et al.
- 2013
Citation Context ...ed models for generating video representations [23, 14]. To enhance recognition performance, several kinds of descriptors have been proposed, each of which describes certain aspects of object feature =-=[5, 19, 33, 22]-=-. In particular, HOG descriptor characterizes static appearance [5], while HOF and MBH descriptors capture dynamic motion [19, 6]. So far, local spatio-temporal descriptors [8] have exhibited successf... |
8 | Combining the right features for complex event recognition
- Tang, Yao, et al.
- 2013
Citation Context ...icated kernel-level fusion methods when only limited kernels are considered. The last fusion method is score-level fusion, which trains classifiers for each descriptor and fuses the confidence scores =-=[39, 30, 38]-=-. All these methods have been extensively evaluated in the context of complex event detection [29, 20]. Descriptor-level fusion and kernel average are widely applied in action recognition [12, 33]. Wh... |
8 | Latent hierarchical model of temporal structure for complex activity classification
- Wang, Qiao, et al.
Citation Context ... results, and outperforms FV and VLAD with descriptor concatenation or kernel average fusion strategy. 1. Introduction Action recognition has been an active research area due to its wide applications =-=[1, 33, 34, 36]-=-. Early research focus had been on datasets with limited size and relatively controlled settings, such as the KTH dataset [26], but later shifted to large and more realistic datasets such as the HMDB5... |
4 | Multiple kernel learning for visual object recognition: A review
- Bucak, Jin, et al.
Citation Context ...tative kernel-level fusion method. It is equivalent to directly concatenating the global representations corresponding to each type of descriptor, and fed the final concatenation into the linear SVM. =-=[4]-=- reported that kernel average is particularly effective compared to more sophisticated kernel-level fusion methods when only limited kernels are considered. The last fusion method is score-level fusio... |
4 | Feature weighting via optimal thresholding for video analysis.
- Xu, Yang, et al.
- 2013
Citation Context ...icated kernel-level fusion methods when only limited kernels are considered. The last fusion method is score-level fusion, which trains classifiers for each descriptor and fuses the confidence scores =-=[39, 30, 38]-=-. All these methods have been extensively evaluated in the context of complex event detection [29, 20]. Descriptor-level fusion and kernel average are widely applied in action recognition [12, 33]. Wh... |
1 | The EM algorithm for mixtures of factor analyzers
- Ghahramani, Hinton
- 1996
Citation Context ...of mixture model based methods has been demonstrated in various models such as Mixture of Probabilistic Principal Component Analysis (M-PPCA) [31], mixture of experts [15], mixture of Factor analysis =-=[9]-=- etc. In the following section, we present our Mixture model of Probabilistic Canonical Correlation Analyzer (M-PCCA) and its corresponding learning algorithm. 3.1. Model formulation Consider a mixtur... |