Results 1 - 7 of 7
Spatio-Temporal Object Detection Proposals
Cited by 12 (1 self)
Abstract. Spatio-temporal detection of actions and events in video is a challenging problem. Besides the difficulties related to recognition, a major challenge for detection in video is the size of the search space defined by spatio-temporal tubes formed by sequences of bounding boxes along the frames. Recently, methods that generate unsupervised detection proposals have proven to be very effective for object detection in still images. These methods open the possibility to use strong but computationally expensive features, since only a relatively small number of detection hypotheses need to be assessed. In this paper we make two contributions towards exploiting detection proposals for spatio-temporal detection problems. First, we extend a recent 2D object proposal method to produce spatio-temporal proposals by a randomized supervoxel merging process. We introduce spatial, temporal, and spatio-temporal pairwise supervoxel features that are used to guide the merging process. Second, we propose a new efficient supervoxel method. We experimentally evaluate our detection proposals in combination with our new supervoxel method as well as existing ones. This evaluation shows that our supervoxels lead to more accurate proposals when compared to using existing state-of-the-art supervoxel methods.
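The randomized supervoxel merging the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the paper's method: the graph representation and the `similarity` callable (standing in for the learned pairwise supervoxel features) are assumptions for the sake of the example.

```python
import random

def randomized_merge_proposals(adjacency, similarity, seed=0):
    """Randomized merging sketch (mutates `adjacency` in place).
    `adjacency` maps each supervoxel id to the set of its neighbours;
    `similarity(a, b)` scores a pair (a stand-in for the paper's
    pairwise supervoxel features). Starting from singleton groups,
    repeatedly sample a neighbouring pair of groups -- biased toward
    high similarity -- and merge it; every merged group is emitted
    as one spatio-temporal detection proposal."""
    rng = random.Random(seed)
    groups = {v: frozenset([v]) for v in adjacency}
    edges = {frozenset((a, b)) for a in adjacency for b in adjacency[a]}
    proposals = []
    while edges:
        edge_list = sorted(edges, key=sorted)  # deterministic ordering
        weights = [max(similarity(*sorted(e)), 1e-9) for e in edge_list]
        a, b = sorted(rng.choices(edge_list, weights=weights, k=1)[0])
        merged = groups[a] | groups[b]
        proposals.append(merged)
        # The merged group keeps id `a` and inherits both neighbourhoods.
        edges = {e for e in edges if a not in e and b not in e}
        neighbours = (adjacency[a] | adjacency[b]) - {a, b}
        adjacency[a] = neighbours
        for n in neighbours:
            adjacency[n] = (adjacency[n] - {b}) | {a}
            edges.add(frozenset((a, n)))
        groups[a] = merged
        del groups[b], adjacency[b]
    return proposals
```

Because merges are sampled rather than chosen greedily, rerunning with different seeds yields a diverse pool of proposals, which is the point of the randomization.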
Self-learning camera: Autonomous adaptation of object detectors to unlabeled video streams
In ECCV, 2014
Cited by 1 (0 self)
Learning object detectors requires massive amounts of labeled training samples from the specific data source of interest. This is impractical when dealing with many different sources (e.g., in camera networks), or constantly changing ones such as mobile cameras (e.g., in robotics or driving assistant systems). In this paper, we address the problem of self-learning detectors in an autonomous manner, i.e. (i) detectors continuously updating themselves to efficiently adapt to streaming data sources (contrary to transductive algorithms), (ii) without any labeled data strongly related to the target data stream (contrary to self-paced learning), and (iii) without manual intervention to set and update hyper-parameters. To that end, we propose an unsupervised, on-line, and self-tuning learning algorithm to optimize a multi-task learning convex objective. Our method uses confident but laconic oracles (high-precision but low-recall off-the-shelf generic detectors), and exploits the structure of the problem to jointly learn on-line an ensemble of instance-level trackers, from which we derive an adapted category-level object detector. Our approach is validated on real-world publicly available video object datasets.
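The oracle-plus-tracker labeling step can be sketched as below. All three callables are hypothetical placeholders, not the paper's API: the idea is only that a high-precision but low-recall oracle fires on a few frames and a tracker densifies each confident detection into an instance-level track whose boxes become positives for the online detector update.

```python
def self_labeling_positives(stream, oracle, tracker):
    """Skeleton of the self-labeling step (illustrative interface).
    `oracle(frame)` yields the few boxes it is confident about;
    `tracker(stream, t, box)` propagates a confirmed detection to
    neighbouring frames, returning (frame index, box) pairs. The
    collected pairs serve as positives for the detector update."""
    positives = []
    for t, frame in enumerate(stream):
        for box in oracle(frame):                      # confident but laconic
            positives.extend(tracker(stream, t, box))  # densify via tracking
    return positives
```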
Action Detection with Improved Dense Trajectories and Sliding Window
Cited by 1 (0 self)
Abstract. In this paper we describe an action/interaction detection system based on improved dense trajectories [20], multiple visual descriptors and bag-of-features representation. Given that the actions/interactions are not mutually exclusive, we train a binary classifier for every predefined action/interaction. We rely on a non-overlapping temporal sliding window to enable the temporal localization. We have tested our system on the ChaLearn Looking at People Challenge 2014 Track 2 dataset [1, 2]. We obtained 0.4226 average overlap, which ranked 3rd in this track of the challenge. Finally, we provide an extensive analysis of the performance of this system on different actions and provide possible ways to improve a general action detection system.
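The non-overlapping temporal sliding window is simple to make concrete. In this sketch, `score_fn` is an illustrative stand-in for the per-action binary classifiers over bag-of-features descriptors; its name, signature, and the threshold are assumptions, not the paper's interface.

```python
def temporal_windows(num_frames, win):
    """Partition [0, num_frames) into consecutive, non-overlapping
    windows of `win` frames each (the last window may be shorter)."""
    return [(s, min(s + win, num_frames)) for s in range(0, num_frames, win)]

def localize(num_frames, win, score_fn, threshold=0.0):
    """`score_fn(start, end)` returns {action: score} for one window
    (a hypothetical stand-in for the per-action binary classifiers).
    Windows whose score clears the threshold become temporal detections,
    reported as (action, start, end, score) tuples."""
    detections = []
    for start, end in temporal_windows(num_frames, win):
        for action, s in score_fn(start, end).items():
            if s > threshold:
                detections.append((action, start, end, s))
    return detections
```

Because windows do not overlap, each frame is scored exactly once per action, which keeps the localization cheap at the cost of boundary precision.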
FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem
"... Vision, perception and multimedia interpretation Table of contents ..."
Action Localization in Videos through Context Walk
This paper presents an efficient approach for localizing actions by learning contextual relations, in the form of relative locations between different video regions. We begin by over-segmenting the videos into supervoxels, which have the ability to preserve action boundaries and also reduce the complexity of the problem. Context relations are learned during training which capture displacements from all the supervoxels in a video to those belonging to foreground actions. Then, given a testing video, we select a supervoxel randomly and use the context information acquired during training to estimate the probability of each supervoxel belonging to the foreground action. The walk proceeds to a new supervoxel and the process is repeated for a few steps. This “context walk” generates a conditional distribution of an action over all the supervoxels. A Conditional Random Field is then used to find action proposals in the video, whose confidences are obtained using SVMs. We validated the proposed approach on several datasets and show that context in the form of relative displacements between supervoxels can be extremely useful for action localization. This also results in significantly fewer evaluations of the classifier, in sharp contrast to alternative sliding window approaches.
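The walk itself can be sketched as an accumulation of displacement-based votes. This is a loose illustration under assumptions: `vote` stands in for the displacement model learned during training, supervoxels are reduced to 2D centres, and the greedy choice of the next supervoxel is a simplification of the paper's procedure.

```python
import random

def context_walk(centres, vote, steps=5, seed=0):
    """Sketch of the context walk. `centres` maps supervoxel ids to
    (x, y) centres; `vote(current, target)` is a hypothetical stand-in
    for the learned displacement model, returning added foreground
    evidence for `target` given the walk is at `current`. Starting
    from a random supervoxel, each step casts votes over all
    supervoxels, then moves to the currently most confident one,
    accumulating a confidence map over the whole video."""
    rng = random.Random(seed)
    ids = list(centres)
    scores = {v: 0.0 for v in ids}
    current = rng.choice(ids)
    for _ in range(steps):
        for v in ids:
            scores[v] += vote(centres[current], centres[v])
        current = max(ids, key=lambda v: scores[v])  # greedy next step
    return scores
```

The resulting `scores` map plays the role of the conditional distribution over supervoxels that the CRF and SVM stages would then consume; note that only a few walk steps are needed, rather than one classifier evaluation per candidate window.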