Unsupervised learning of human action categories using spatial-temporal words
- In Proc. BMVC, 2006
Cited by 161 (4 self)
Imagine a video taken on a sunny beach: can a computer automatically tell what is happening in the scene? Can it identify different human activities in the video, such as water surfing, people walking and lying on the beach? Automatically classifying or localizing different actions in video sequences is very useful for a variety of tasks, such as video surveillance, object-level video summarization, video indexing, and digital library organization. However, it remains challenging for computers to achieve robust action recognition due to cluttered backgrounds, camera motion, occlusion, and geometric and photometric variance of objects. For example, in a live video of a skating competition, the skater moves rapidly across the rink, and the camera also moves to follow the skater. With a moving camera, a non-stationary background, and a moving target, few vision algorithms could identify, categorize and
A spatio-temporal descriptor based on 3d-gradients
- In BMVC’08
Cited by 34 (2 self)
In this work, we present a novel local descriptor for video sequences. The proposed descriptor is based on histograms of oriented 3D spatio-temporal gradients. Our contribution is four-fold. (i) To compute 3D gradients for arbitrary scales, we develop a memory-efficient algorithm based on integral videos. (ii) We propose a generic 3D orientation quantization which is based on regular polyhedrons. (iii) We perform an in-depth evaluation of all descriptor parameters and optimize them for action recognition. (iv) We apply our descriptor to various action datasets (KTH, Weizmann, Hollywood) and show that we outperform the state-of-the-art.
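The integral-video idea in point (i) can be illustrated with a 3D summed-area table: once the table is built, the sum over any space-time cuboid costs eight lookups regardless of its size. The sketch below is a generic illustration only, not the authors' memory-efficient variant; the function names and the half-open index convention are assumptions made for this example. In a descriptor pipeline one such table would presumably be built per gradient component.

```python
def integral_video(vol):
    """Build a zero-bordered table ii where ii[t][y][x] is the sum of
    vol over all indices strictly below (t, y, x). vol is indexed vol[t][y][x]."""
    T, H, W = len(vol), len(vol[0]), len(vol[0][0])
    ii = [[[0.0] * (W + 1) for _ in range(H + 1)] for _ in range(T + 1)]
    for t in range(T):
        for y in range(H):
            for x in range(W):
                # 3D inclusion-exclusion over the seven already-filled neighbours.
                ii[t + 1][y + 1][x + 1] = (
                    vol[t][y][x]
                    + ii[t][y + 1][x + 1] + ii[t + 1][y][x + 1] + ii[t + 1][y + 1][x]
                    - ii[t][y][x + 1] - ii[t][y + 1][x] - ii[t + 1][y][x]
                    + ii[t][y][x]
                )
    return ii


def box_sum(ii, t0, t1, y0, y1, x0, x1):
    """Sum of the original volume over the half-open cuboid
    [t0, t1) x [y0, y1) x [x0, x1), using eight table lookups."""
    return (
        ii[t1][y1][x1]
        - ii[t0][y1][x1] - ii[t1][y0][x1] - ii[t1][y1][x0]
        + ii[t0][y0][x1] + ii[t0][y1][x0] + ii[t1][y0][x0]
        - ii[t0][y0][x0]
    )


# Hypothetical toy volume: 3 frames of 4 x 5 values.
demo_vol = [[[t * 100 + y * 10 + x for x in range(5)] for y in range(4)] for t in range(3)]
demo_ii = integral_video(demo_vol)
```

Dividing a box sum by the cuboid volume gives the mean value over that region, which is how a table like this supports gradients at arbitrary scales.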
Evaluation of local spatio-temporal features for action recognition
- University of Central Florida, U.S.A, 2009
Cited by 34 (7 self)
Local space-time features have recently become a popular video representation for action recognition. Several methods for feature localization and description have been proposed in the literature and promising recognition results were demonstrated for a number of action classes. The comparison of existing methods, however, is often limited given the different experimental settings used. The purpose of this paper is to evaluate and compare previously proposed space-time features in a common experimental setup. In particular, we consider four different feature detectors and six local feature descriptors and use a standard bag-of-features SVM approach for action recognition. We investigate the performance of these methods on a total of 25 action classes distributed over three datasets with varying difficulty. Among interesting conclusions, we demonstrate that regular sampling of space-time features consistently outperforms all tested space-time interest point detectors for human actions in realistic settings. We also demonstrate a consistent ranking for the majority of methods over different datasets and discuss their advantages and limitations.
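The bag-of-features step mentioned in this abstract can be sketched as follows: each local descriptor is assigned to its nearest visual word, and a video is represented by the normalized histogram of word counts. This is a minimal illustration under stated assumptions. The vocabulary here is a hypothetical hand-written list (in practice it is usually obtained by clustering training descriptors), the function names are invented for this example, and the SVM that would consume the histograms is not shown.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary entry closest to descriptor (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for i, word in enumerate(vocabulary):
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def bof_histogram(descriptors, vocabulary):
    """L1-normalized histogram of visual-word assignments for one video."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        hist[nearest_word(d, vocabulary)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Hypothetical 2-D descriptors and a two-word vocabulary, for illustration only.
demo_vocabulary = [(0.0, 0.0), (10.0, 10.0)]
demo_descriptors = [(1.0, 0.0), (9.0, 9.0), (11.0, 10.0), (0.0, 1.0)]
demo_hist = bof_histogram(demo_descriptors, demo_vocabulary)
```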
Object recognition using composed receptive field histograms of higher dimensionality
- In Proc. ICPR’04
Cited by 18 (1 self)
Recent work has shown that effective methods for recognising objects or spatio-temporal events can be constructed based on receptive field responses summarised into histograms or other histogram-like image descriptors. This paper presents a set of composed histogram features of higher dimensionality, which give significantly better recognition performance compared to the histogram descriptors of lower dimensionality that were used in the original papers by Swain & Ballard (1991) or Schiele & Crowley (2000). The use of histograms of higher dimensionality is made possible by a sparse representation for efficient computation and handling of higher-dimensional histograms. Results of extensive experiments are reported, showing how the performance of histogram-based recognition schemes depends upon different combinations of cues, in terms of Gaussian derivatives or differential invariants applied to either intensity information, chromatic information or both. It is shown that there exist composed higher-dimensional histogram descriptors with much better performance for recognising known objects than previously used histogram features. Experiments on classifying unknown objects into visual categories are also reported.
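A sparse representation of a high-dimensional histogram can be sketched by storing only the occupied bins in a dictionary keyed by the quantized feature tuple. The example below is an assumption-laden illustration, not the paper's implementation: the uniform bin width, the function names, and the choice of histogram intersection (the measure associated with Swain & Ballard's colour indexing) as the comparison are all stand-ins chosen for this sketch.

```python
from collections import Counter


def sparse_histogram(feature_vectors, bin_width):
    """Quantize each feature vector to a tuple of bin indices and keep
    only the occupied bins, normalized so the stored values sum to 1."""
    counts = Counter(
        tuple(int(v // bin_width) for v in features) for features in feature_vectors
    )
    n = sum(counts.values())
    return {bin_key: c / n for bin_key, c in counts.items()}


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima; bins missing from either histogram contribute 0."""
    return sum(min(value, h2.get(bin_key, 0.0)) for bin_key, value in h1.items())


# Hypothetical 4-D receptive-field responses, for illustration only.
demo_a = sparse_histogram([(0.1, 0.2, 0.3, 0.4), (1.1, 0.2, 0.3, 0.4)], bin_width=1.0)
demo_b = sparse_histogram([(5.0, 5.0, 5.0, 5.0)], bin_width=1.0)
```

With a dense array, a 4-D histogram with 16 bins per dimension would need 65,536 cells; the dictionary above stores at most one entry per observed feature vector.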
Periodic motion detection and segmentation via approximate sequence alignment
- In Proceedings of the International Conference on Computer Vision, 2005
Cited by 10 (2 self)
A method for detecting and segmenting periodic motion is presented. We exploit periodicity as a cue and detect periodic motion in complex scenes where common methods for motion segmentation are likely to fail. We note that periodic motion detection can be seen as an approximate case of sequence alignment where an image sequence is matched to itself over one or more periods of time. To use this observation, we first consider alignment of two video sequences obtained by independently moving cameras. Under the assumption of constant translation, the fundamental matrices and the homographies are shown to be time-linear matrix functions. These dynamic quantities can be estimated by matching corresponding space-time points with similar local motion and shape. For periodic motion, we match corresponding points across periods and develop a RANSAC procedure to simultaneously estimate the period and the dynamic geometric transformations between periodic views. Using this method, we demonstrate detection and segmentation of human periodic motion in complex scenes with non-rigid backgrounds, a moving camera and motion parallax.
Local velocity-adapted motion events for spatio-temporal recognition
- CVIU, 2007
Cited by 8 (2 self)
In this paper we address the problem of motion recognition using event-based local motion representations. We assume that similar patterns of motion contain similar events with consistent motion across image sequences. Using this assumption, we formulate the problem of motion recognition as a matching of corresponding events in image sequences. To enable the matching, we present and evaluate a set of motion descriptors exploiting the spatial and the temporal coherence of motion measurements between corresponding events in image sequences. As motion measurements may depend on the relative motion of the camera, we also present a mechanism for local velocity adaptation of events and evaluate its influence when recognizing image sequences subjected to different camera motions. When recognizing motion, we compare the performance of a nearest neighbor (NN) classifier with the performance of a support vector machine (SVM). We also compare event-based motion representations to motion representations by global histograms. An experimental evaluation on a large video database with human actions demonstrates the advantage of the proposed scheme for event-based motion representation in combination with SVM classification. The particular advantage of event-based representations and velocity adaptation is further emphasized when recognizing human actions in unconstrained scenes with complex and non-stationary backgrounds.
Efficient Human Action Recognition by Cascaded Linear Classification
Cited by 1 (1 self)
We present a human action recognition system suitable for very short sequences. In particular, we estimate Histograms of Oriented Gradients (HOGs) for the current frame as well as for the corresponding dense flow field estimated from two frames. The resulting descriptors are then efficiently represented by the coefficients of a Nonnegative Matrix Factorization (NMF). To further speed up the overall process, we apply an efficient cascaded Linear Discriminant Analysis (CLDA) classifier. In the experimental results we show the benefits of the proposed approach on standard benchmark datasets as well as on more challenging and realistic videos. In addition, since other state-of-the-art methods apply weighting between different cues, we provide a detailed analysis of the importance of weighting for action recognition and show that weighting is not necessarily required for the given task.
Evaluation of local spatio-temporal features
- 2011
Cited by 1 (0 self)
Local space-time features have recently become a popular video representation for action recognition. Several methods for feature localization and description have been proposed in the literature and promising recognition results were demonstrated for a number of action classes. The comparison of existing methods, however, is often limited given the different experimental settings used. The purpose of this paper is to evaluate and compare previously proposed space-time features in a common experimental setup. In particular, we consider four different feature detectors and six local feature descriptors and use a standard bag-of-features SVM approach for action recognition. We investigate the performance of these methods on a total of 25 action classes distributed over three datasets with varying difficulty. Among interesting conclusions, we demonstrate that regular sampling of space-time features consistently outperforms all tested space-time interest point detectors for human actions in realistic settings. We also demonstrate a consistent ranking for the majority of methods over different datasets and discuss their advantages and limitations.
Using Space-Time Interest Points for Video Sequence Synchronization
Cited by 1 (0 self)
We introduce an algorithm for synchronizing two video sequences recorded by stationary cameras. It extends common RANSAC-based approaches that recover either a homography or a fundamental matrix from putatively matched spatial features in two images. In our algorithm, we detect space-time interest points in each sequence which represent events such as objects changing direction, and putatively matching points from each sequence are determined. A nested RANSAC framework on these putative matches is then used to first recover the frame offset and ratio of frame rates of the two sequences, then either a homography or a fundamental matrix relating the two views, depending on the type of motion contained within the sequences. No camera calibration or object tracking is required. Real sequences containing motion either on a plane or in free space are synchronized and it is demonstrated that this approach is successful in recovering the ratio of frame rates, the frame offset, and the homography or fundamental matrix relating the two sequences.
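The temporal stage described above (recovering the frame offset and the ratio of frame rates) can be sketched as a RANSAC fit of the linear map t2 = ratio * t1 + offset to putative frame-index matches. This is a simplified illustration and not the paper's nested procedure: the spatial stage (homography or fundamental matrix) is omitted, and the function name, tolerance, iteration count, and input format of (t1, t2) pairs are assumptions made for this example.

```python
import random


def ransac_time_map(matches, tol=0.5, iters=200, seed=0):
    """Fit t2 = ratio * t1 + offset to (t1, t2) pairs that may contain outliers.
    Returns ((ratio, offset), inliers) for the hypothesis with the most inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        # Two matches are the minimal sample for a line in the (t1, t2) plane.
        (ta, ua), (tb, ub) = rng.sample(matches, 2)
        if ta == tb:
            continue  # degenerate sample: the slope is undefined
        ratio = (ub - ua) / (tb - ta)
        offset = ua - ratio * ta
        inliers = [(t, u) for t, u in matches if abs(ratio * t + offset - u) <= tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (ratio, offset), inliers
    return best_model, best_inliers


# Hypothetical matches: true ratio 2, true offset 5, plus three gross outliers.
demo_matches = [(t, 2 * t + 5) for t in range(10)] + [(3, 40), (7, 1), (1, 30)]
demo_model, demo_inliers = ransac_time_map(demo_matches)
```

In a nested scheme, the inlier set returned here would then feed a second RANSAC loop that estimates the spatial relation between the two views.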
Identifying Surprising Events in Videos Using Bayesian Topic Models
Cited by 1 (0 self)
Automatic processing of video data is essential in order to allow efficient access to large amounts of video content, a crucial point in such applications as video mining and surveillance. In this paper we focus on the problem of identifying interesting parts of the video. Specifically, we seek to identify atypical video events, which are the events a human user is usually looking for. To this end we employ the notion of Bayesian surprise, as defined in [1, 2], in which an event is considered surprising if its occurrence leads to a large change in the probability of the world model. We propose to compute this abstract measure of surprise by first modeling a corpus of video events using the Latent Dirichlet Allocation model. Subsequently, we measure the change in the Dirichlet prior of the LDA model as a result of each video event’s occurrence. This change of the Dirichlet prior leads to a closed form expression for an event’s level of surprise, which can then be inferred directly from the observed data. We tested our algorithm on a real dataset of video data, taken by a camera observing an urban street intersection. The results demonstrate our ability to detect atypical events, such as a car making a U-turn or a person crossing an intersection diagonally.
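One way to make the closed-form surprise concrete is the Kullback-Leibler divergence between two Dirichlet distributions, which has an analytic expression in terms of the log-gamma and digamma functions. The sketch below rests on several assumptions that are not taken from the paper: the updated Dirichlet parameters are obtained by simply adding hypothetical per-topic counts to the prior (a stand-in for LDA inference), the divergence is taken from the updated distribution to the prior, and all names and numbers are illustrative.

```python
import math


def digamma(x):
    """Digamma function for x > 0: shift x upward with the recurrence
    psi(x) = psi(x + 1) - 1/x, then apply the standard asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))


def dirichlet_kl(a, b):
    """KL( Dir(a) || Dir(b) ) in nats, for positive parameter lists of equal length."""
    a0, b0 = sum(a), sum(b)
    kl = math.lgamma(a0) - math.lgamma(b0)
    for ai, bi in zip(a, b):
        kl += math.lgamma(bi) - math.lgamma(ai)
        kl += (ai - bi) * (digamma(ai) - digamma(a0))
    return kl


def surprise(prior, event_counts):
    """Assumed update rule: add the event's topic counts to the prior,
    then measure how far the updated Dirichlet moved from the prior."""
    updated = [p + c for p, c in zip(prior, event_counts)]
    return dirichlet_kl(updated, prior)


# Hypothetical three-topic model in which the third topic is rare.
demo_prior = [50.0, 45.0, 5.0]
demo_typical = surprise(demo_prior, [5.0, 5.0, 0.0])    # resembles past events
demo_atypical = surprise(demo_prior, [0.0, 0.0, 10.0])  # dominated by the rare topic
```

Under these assumptions, an event dominated by a rarely seen topic shifts the Dirichlet parameters proportionally more than one that matches the existing mixture, and therefore receives a higher score.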

