A discriminative CNN video representation for event detection. In CVPR (2015)

by Z. Xu, Y. Yang, A. G. Hauptmann
Citing documents: Results 1 - 3 of 3

Semantic Concept Discovery for Large-Scale Zero-Shot Event Detection

by Xiaojun Chang, Yi Yang, Alexander G. Hauptmann, Eric P. Xing, Yao-liang Yu
"... We focus on detecting complex events in uncon-strained Internet videos. While most existing works rely on the abundance of labeled training data, we consider a more difficult zero-shot setting where no training data is supplied. We first pre-train a number of concept classifiers using data from othe ..."
Abstract - Cited by 1 (1 self) - Add to MetaCart
We focus on detecting complex events in unconstrained Internet videos. While most existing works rely on the abundance of labeled training data, we consider a more difficult zero-shot setting where no training data is supplied. We first pre-train a number of concept classifiers using data from other sources. Then we evaluate the semantic correlation of each concept w.r.t. the event of interest. After further refinement to take prediction inaccuracy and discriminative power into account, we apply the discovered concept classifiers on all test videos and obtain multiple score vectors. These distinct score vectors are converted into pairwise comparison matrices and the nuclear norm rank aggregation framework is adopted to seek consensus. To address the challenging optimization formulation, we propose an efficient, highly scalable algorithm that is an order of magnitude faster than existing alternatives. Experiments on recent TRECVID datasets verify the superiority of the proposed approach.
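The abstract names the conversion of score vectors into pairwise comparison matrices but does not spell out the construction. A minimal Python sketch, assuming the common skew-symmetric difference C[i, j] = s_i - s_j, with a naive averaging consensus standing in for the nuclear norm rank aggregation solved in the paper:

import numpy as np

def score_to_comparison(scores):
    # Assumed construction: skew-symmetric matrix C with C[i, j] = s_i - s_j.
    # The paper only states that score vectors are converted into pairwise
    # comparison matrices; the exact form may differ.
    s = np.asarray(scores, dtype=float)
    return s[:, None] - s[None, :]

def naive_consensus(score_vectors):
    # Illustrative consensus: average the comparison matrices and read off a
    # per-video ranking score. The cited work instead seeks consensus via
    # nuclear norm rank aggregation, which is not implemented here.
    C = np.mean([score_to_comparison(s) for s in score_vectors], axis=0)
    return C.sum(axis=1)  # larger value = ranked as more event-like

# Example: three concept classifiers, each scoring five test videos.
ranking_scores = naive_consensus([np.random.rand(5) for _ in range(3)])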

Encoding Feature Maps of CNNs for Action Recognition

by Xiaojiang Peng, Cordelia Schmid (Inria)
"... Abstract We describe our approach for action classification in the THUMOS Challenge 2015. Our approach is based on two types of features, improved dense trajectories and CNN features. For trajectory features, we extract HOG, HOF, MBHx, and MBHy descriptors and apply Fisher vector encoding. For CNN ..."
Abstract - Add to MetaCart
We describe our approach for action classification in the THUMOS Challenge 2015. Our approach is based on two types of features: improved dense trajectories and CNN features. For trajectory features, we extract HOG, HOF, MBHx, and MBHy descriptors and apply Fisher vector encoding. For CNN features, we utilize a recent deep CNN model, VGG19, to capture appearance features and use VLAD encoding to encode/pool the convolutional feature maps, which performs better than average pooling of feature maps and fully-connected activation features. After concatenating them, we train a linear SVM classifier for each class in a one-vs-all scheme.
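As a rough illustration of the VLAD pooling of convolutional feature maps mentioned above, the numpy sketch below encodes the spatial positions of a Conv5 map against a codebook assumed to be learned offline (e.g. by k-means); the function name, codebook size, and normalization details are illustrative rather than taken from the paper:

import numpy as np

def vlad_encode(feature_map, codebook):
    # feature_map: (C, H, W) Conv5 activations for one frame; each of the H*W
    # spatial positions is treated as a C-dimensional local descriptor.
    # codebook: (K, C) visual words, assumed learned beforehand (e.g. k-means).
    C, H, W = feature_map.shape
    descriptors = feature_map.reshape(C, H * W).T                # (H*W, C)
    dist2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    assign = dist2.argmin(axis=1)                                # hard assignment
    K = codebook.shape[0]
    vlad = np.zeros((K, C))
    for k in range(K):
        members = descriptors[assign == k]
        if len(members):
            vlad[k] = (members - codebook[k]).sum(axis=0)        # residuals
    vlad = vlad.ravel()
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))                 # power normalization
    return vlad / (np.linalg.norm(vlad) + 1e-12)                 # global L2 normalization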

Citation Context

...appearance (HOG) and motion (HOF/MBH) descriptors. We rescale the videos to be at most 320 pixels wide and skip every second frame to extract IDT features. We use a vocabulary of size 256 for the GMM, and apply Fisher vector encoding separately for the HOG, HOF, MBHx, and MBHy descriptors as in [3, 4]. We then normalize the resulting supervectors by power and intra normalization as suggested in [4], i.e., performing ℓ2 normalization for each FV block independently after power normalization.

2.2. CNN feature maps based representation. CNN features have become increasingly popular in action recognition [2, 9]. In [2], a video representation is obtained by average pooling of fc6 activations extracted from static frames every 10 frames. In [9], VLAD and Fisher vector are applied to fc6 activations and pool5 feature maps for event detection on the TRECVID MED dataset. Following [9], we leverage VLAD and Conv5 feature maps for action recognition. We illustrate this idea in Figure 2. Considering a frame fi and the Conv5 layer of CNNs, we can view the filters of the Conv5 layer as feature extractors, and the pixels of the Conv5 feature maps as local features of corresponding patches in fi (see the pink squares in Fig...
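The power and intra normalization described in this excerpt can be sketched as follows; the equal-size block split (e.g. one block per GMM component) is an assumption, since the excerpt does not define the block layout:

import numpy as np

def power_intra_normalize(fv, num_blocks):
    # Power-normalize the Fisher supervector, then L2-normalize each block
    # independently (intra normalization). Blocks are assumed to be an equal
    # split of the supervector, e.g. one block per GMM component.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                       # power normalization
    blocks = [b / (np.linalg.norm(b) + 1e-12) for b in np.array_split(fv, num_blocks)]
    return np.concatenate(blocks)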

Fisher Kernel Temporal Variation-based Relevance Feedback for Video Retrieval

by unknown authors
"... This paper proposes a novel framework for Relevance Feedback based on the Fisher Kernel (FK). Specifically, we train a Gaussian Mixture Model (GMM) on the top retrieval results (without supervision) and use this to create a FK representation, which is therefore specialized in modelling the most rele ..."
Abstract - Add to MetaCart
This paper proposes a novel framework for relevance feedback based on the Fisher Kernel (FK). Specifically, we train a Gaussian Mixture Model (GMM) on the top retrieval results (without supervision) and use this to create an FK representation, which is therefore specialized in modelling the most relevant examples. We use the FK representation to explicitly capture temporal variation in video via frame-based features taken at different time intervals. While the GMM is being trained, a user selects from the top examples those he is looking for. This feedback is used to train a Support Vector Machine on the FK representation, which is then applied to re-rank the top retrieved results. We show that our approach outperforms other state-of-the-art relevance feedback methods. Experiments were carried out on the Blip10000, UCF50, UCF101 and ADL standard datasets using a broad range of multi-modal content descriptors (visual, audio, and text).
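A simplified sketch of this feedback loop using scikit-learn; the helper names, the means-only Fisher-vector approximation, and the omission of the temporal-interval sampling of frame features are illustrative assumptions, not the authors' implementation:

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

def fisher_vector_means(X, gmm):
    # Simplified Fisher-vector representation: gradient of the GMM
    # log-likelihood w.r.t. the component means only (a full Fisher kernel
    # also includes mixture-weight and covariance terms).
    Q = gmm.predict_proba(X)                                     # (N, K) soft assignments
    diff = (X[:, None, :] - gmm.means_[None, :, :]) / np.sqrt(gmm.covariances_)[None, :, :]
    fv = (Q[:, :, None] * diff).mean(axis=0) / np.sqrt(gmm.weights_)[:, None]
    return fv.ravel()

def relevance_feedback_rerank(top_videos, feedback, candidates, n_components=8):
    # top_videos: list of (num_frames, D) frame-feature arrays for the top
    # retrieved results; feedback: user labels (1 = relevant, 0 = not);
    # candidates: arrays to re-rank. Names and signature are hypothetical.
    gmm = GaussianMixture(n_components=n_components, covariance_type='diag')
    gmm.fit(np.vstack(top_videos))                               # unsupervised, on top results
    fk = lambda video: fisher_vector_means(video, gmm)
    clf = SVC(kernel='linear').fit([fk(v) for v in top_videos], feedback)
    scores = clf.decision_function([fk(v) for v in candidates])
    return np.argsort(-scores)                                   # re-ranked candidate order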

Citation Context

...m, because the color naming histogram is designed as a perceptually based color naming metric that is more discriminative and compact; • Convolutional Neural Network descriptors (4,096 dimensions) [63, 64, 68] - we use a set of Convolutional Neural Network (CNN) features, using the protocol laid out in [63]. The employed CNNs were trained on either the ImageNet 2010 or 2012 datasets, following as closely as...
