Results 11 - 20 of 414
Event detection in crowded videos - In IEEE International Conference on Computer Vision, 2007
"... Real-world actions occur often in crowded, dynamic environments. This poses a difficult challenge for current approaches to video event detection because it is difficult to segment the actor from the background due to distracting motion from other objects in the scene. We propose a technique for eve ..."
Abstract
-
Cited by 151 (11 self)
- Add to MetaCart
Real-world actions often occur in crowded, dynamic environments. This poses a difficult challenge for current approaches to video event detection because it is difficult to segment the actor from the background due to distracting motion from other objects in the scene. We propose a technique for event recognition in crowded videos that reliably identifies actions in the presence of partial occlusion and background clutter. Our approach is based on three key ideas: (1) we efficiently match the volumetric representation of an event against oversegmented spatio-temporal video volumes; (2) we augment our shape-based features using flow; (3) rather than treating an event template as an atomic entity, we separately match by parts (both in space and time), enabling robustness against occlusions and actor variability. Our experiments on human actions, such as picking up a dropped object or waving in a crowd, show reliable detection with few false positives.
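To make the part-based matching idea concrete, the following is a minimal illustrative sketch (not the authors' implementation): each template part, described by a shape histogram and a flow histogram, is matched independently against oversegmented spatio-temporal volumes, so a partially occluded part can fail without sinking the whole detection. The data layout and the chi-squared comparison are assumptions.

```python
# Toy sketch of part-based spatio-temporal template matching.
import numpy as np

def chi2(h1, h2, eps=1e-8):
    """Chi-squared distance between two normalized histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def match_event(template_parts, video_volumes):
    """template_parts: list of dicts with 'shape' and 'flow' histograms.
    video_volumes : oversegmented spatio-temporal volumes with the same
                    two descriptors. Each part picks its best volume."""
    score = 0.0
    for part in template_parts:
        dists = [chi2(part['shape'], v['shape']) +
                 chi2(part['flow'], v['flow']) for v in video_volumes]
        score += min(dists)
    return -score  # higher is better
```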
Histograms of oriented optical flow and Binet-Cauchy kernels on nonlinear dynamical systems for the recognition of human actions - In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009
"... System theoretic approaches to action recognition model the dynamics of a scene with linear dynamical systems (LDSs) and perform classification using metrics on the space of LDSs, e.g. Binet-Cauchy kernels. However, such approaches are only applicable to time series data living in a Euclidean space, ..."
Abstract
-
Cited by 107 (6 self)
- Add to MetaCart
(Show Context)
System-theoretic approaches to action recognition model the dynamics of a scene with linear dynamical systems (LDSs) and perform classification using metrics on the space of LDSs, e.g. Binet-Cauchy kernels. However, such approaches are only applicable to time-series data living in a Euclidean space, e.g. joint trajectories extracted from motion capture data or feature point trajectories extracted from video. Much of the success of recent object recognition techniques relies on the use of more complex feature descriptors, such as SIFT or HOG descriptors, which are essentially histograms. Since histograms live in a non-Euclidean space, we can no longer model their temporal evolution with LDSs, nor can we classify them using a metric for LDSs. In this paper, we propose to represent each frame of a video using a histogram of oriented optical flow (HOOF) and to recognize human actions by classifying HOOF time series. For this purpose, we propose a generalization of the Binet-Cauchy kernels to nonlinear dynamical systems (NLDS) whose output lives in a non-Euclidean space, e.g. the space of histograms. This can be achieved by using kernels defined on the original non-Euclidean space, leading to a well-defined metric for NLDSs. We use these kernels for the classification of actions in video sequences using HOOF as the output of the NLDS. We evaluate our approach on the recognition of human actions in several scenarios and achieve encouraging results.
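A minimal sketch of a HOOF-style per-frame descriptor, based on the description above: each flow vector votes into orientation bins symmetric about the vertical axis (so mirrored left/right motion shares a bin), weighted by its magnitude, and the histogram is normalized to sum to one. The exact binning details are an assumption.

```python
import numpy as np

def hoof(flow_x, flow_y, n_bins=30, eps=1e-8):
    """flow_x, flow_y: 2-D arrays of horizontal/vertical flow components."""
    mag = np.sqrt(flow_x ** 2 + flow_y ** 2)
    # Fold left-moving vectors onto right-moving ones (mirror symmetry).
    angle = np.arctan2(flow_y, np.abs(flow_x))        # in [-pi/2, pi/2]
    bins = np.minimum(((angle + np.pi / 2) / np.pi * n_bins).astype(int),
                      n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    return hist / (hist.sum() + eps)                  # sums to 1
```

The HOOF time series (one such histogram per frame) is then what the Binet-Cauchy kernels for NLDSs compare.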
Recognizing human actions by attributes - University of Michigan, 2011
"... In this paper we explore the idea of using high-level se-mantic concepts, also called attributes, to represent human actions from videos and argue that attributes enable the construction of more descriptive models for human action recognition. We propose a unified framework wherein man-ually specifi ..."
Abstract
-
Cited by 96 (3 self)
- Add to MetaCart
(Show Context)
In this paper we explore the idea of using high-level semantic concepts, also called attributes, to represent human actions from videos and argue that attributes enable the construction of more descriptive models for human action recognition. We propose a unified framework wherein manually specified attributes are: i) selected in a discriminative fashion so as to account for intra-class variability; ii) coherently integrated with data-driven attributes to make the attribute set more descriptive. Data-driven attributes are automatically inferred from the training data using an information-theoretic approach. Our framework is built upon a latent SVM formulation where latent variables capture the degree of importance of each attribute for each action class. We also demonstrate that our attribute-based action representation can be effectively used to design a recognition procedure for classifying novel action classes for which no training samples are available. We test our approach on several publicly available datasets and obtain promising results that quantitatively demonstrate our theoretical claims.
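As a toy illustration of scoring with latent per-attribute importance (not the paper's actual formulation, and with all names hypothetical): for a linear model, maximizing over binary attribute-selection variables simply keeps the attributes whose contribution to a class score is positive.

```python
import numpy as np

def class_score(attribute_scores, class_weights, bias=0.0):
    """attribute_scores: detector outputs for each attribute (1-D array).
    class_weights : learned per-attribute weights for one action class.
    The max over latent h in {0,1}^A of sum_a h_a * w_a * s_a is solved
    in closed form by keeping only the positive terms."""
    contrib = class_weights * attribute_scores
    return bias + np.sum(np.maximum(contrib, 0.0))

# Usage: pick the action class with the highest latent-augmented score,
# e.g. best = max(classes, key=lambda c: class_score(s, W[c])).
```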
Automatic annotation of everyday movements, 2003
"... This paper describes a system that can annotate a video sequence with: a description of the appearance of each actor; when the actor is in view; and a representation of the actor’s activity while in view. The system does not require a fixed background, and is automatic. The system works by (1) track ..."
Abstract
-
Cited by 88 (5 self)
- Add to MetaCart
(Show Context)
This paper describes a system that can annotate a video sequence with: a description of the appearance of each actor; when the actor is in view; and a representation of the actor’s activity while in view. The system does not require a fixed background, and is automatic. The system works by (1) tracking people in 2D and then, using an annotated motion capture dataset, (2) synthesizing an annotated 3D motion sequence matching the 2D tracks. The motion capture data is manually annotated using a class structure that describes everyday motions and allows motion annotations to be composed: one may jump while running, for example. Descriptions computed from video of real motions show that the method is accurate.
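An illustrative sketch of the matching step in this kind of pipeline: compare a 2-D track against 2-D projections of annotated motion-capture snippets and inherit the annotation of the closest one. The names, array shapes, and squared-error distance are assumptions, not the authors' implementation.

```python
import numpy as np

def annotate_track(track_2d, mocap_snippets):
    """track_2d      : (T, J, 2) array of 2-D joint positions.
    mocap_snippets: list of (projected_2d, annotation) pairs, with
                    projected_2d shaped like track_2d.
    Returns the annotation (e.g. {'run', 'jump'}) of the best match."""
    best = min(mocap_snippets,
               key=lambda s: np.mean((track_2d - s[0]) ** 2))
    return best[1]
```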
Articulated soft objects for video-based body modeling - In ICCV, 2001
"... We develop a framework for 3–D shape and motion recovery of articulated deformable objects. We propose a formalism that incorporates the use of implicit surfaces into earlier robotics approaches that were designed to handle articulated structures. We demonstrate its effectiveness for human body mode ..."
Abstract
-
Cited by 79 (10 self)
- Add to MetaCart
(Show Context)
We develop a framework for 3-D shape and motion recovery of articulated deformable objects. We propose a formalism that incorporates the use of implicit surfaces into earlier robotics approaches that were designed to handle articulated structures. We demonstrate its effectiveness for human body modeling from video sequences. Our method is both robust and generic. It could easily be applied to other shape and motion recovery problems.
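A generic sketch of an articulated implicit ("soft object") model of the kind described: metaball-like primitives attached to posed skeleton segments sum into a scalar field, and the body surface is an iso-level of that field. The Gaussian field profile and point-to-segment distance are illustrative choices, not the authors' exact formulation.

```python
import numpy as np

def point_to_segment(p, a, b):
    """Distance from point p to the segment (a, b); all 3-vectors."""
    ab, ap = b - a, p - a
    t = np.clip(np.dot(ap, ab) / np.dot(ab, ab), 0.0, 1.0)
    return np.linalg.norm(p - (a + t * ab))

def soft_object_field(p, bones, radii):
    """bones: list of (joint_start, joint_end) already posed by the
    articulated skeleton; radii: per-bone widths."""
    return sum(np.exp(-(point_to_segment(p, a, b) / r) ** 2)
               for (a, b), r in zip(bones, radii))

# The surface is {p : soft_object_field(p, bones, radii) == iso} for a
# chosen iso-level; fitting adjusts joint angles and radii to the data.
```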
Motion Segmentation and Pose Recognition with Motion History Gradients - Machine Vision and Applications, 2000
"... This paper uses a simple method for representing motion in successively layered silhouettes that directly encode system time termed the timed Motion History Image (tMHI). This representation can be used to both (a) determine the current pose of the object and to (b) segment and measure the motions i ..."
Abstract
-
Cited by 76 (5 self)
- Add to MetaCart
(Show Context)
This paper presents a simple method for representing motion, based on successively layered silhouettes that directly encode system time, termed the timed Motion History Image (tMHI). This representation can be used both to (a) determine the current pose of the object and to (b) segment and measure the motions induced by the object in a video scene. These segmented regions are not “motion blobs”, but rather motion regions naturally connected to the moving parts of the object of interest. This method may be used as a very general gesture recognition “toolbox”. We use it to recognize waving and overhead clapping motions to control a music synthesis program.
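A minimal sketch of the tMHI update described above: silhouette pixels are stamped with the current time, and stamps older than a fixed duration are cleared. The parameter names are illustrative.

```python
import numpy as np

def update_tmhi(tmhi, silhouette, timestamp, duration):
    """tmhi      : float array holding the last-motion timestamp per pixel.
    silhouette: boolean array, True where the current silhouette is.
    timestamp : current time in seconds.
    duration  : how far back (seconds) motion history is kept."""
    tmhi = np.where(silhouette, timestamp, tmhi)
    tmhi[tmhi < timestamp - duration] = 0.0     # forget stale motion
    return tmhi

# Spatial gradients of the tMHI (e.g. via a Sobel filter) point from
# older to newer silhouette layers, giving the local motion direction
# used for pose recognition and motion segmentation.
```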
Model-Based 3D Tracking of an Articulated Hand
"... This paper presents a practical technique for modelbased 3D hand tracking. An anatomically accurate hand model is built from truncated quadrics. This allows for the generation of 2D profiles of the model using elegant tools from projective geometry, and for an efficient method to handle self-occlusi ..."
Abstract
-
Cited by 70 (0 self)
- Add to MetaCart
This paper presents a practical technique for model-based 3D hand tracking. An anatomically accurate hand model is built from truncated quadrics. This allows for the generation of 2D profiles of the model using elegant tools from projective geometry, and for an efficient method to handle self-occlusion. The pose of the hand model is estimated with an Unscented Kalman filter (UKF), which minimizes the geometric error between the profiles and edges extracted from the images. The use of the UKF permits higher frame rates than more sophisticated estimation methods such as particle filtering, whilst providing higher accuracy than the extended Kalman filter. The system is easily scalable from single to multiple views, and from rigid to articulated models. First experiments on real data using one and two cameras demonstrate the quality of the proposed method for tracking a 7 DOF hand model.
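One projective-geometry ingredient behind such 2D profiles can be sketched directly: the image outline of a quadric surface under a pinhole camera. With Q* the 4x4 dual quadric and P the 3x4 projection matrix, the outline is the conic with dual form C* = P Q* P^T (a standard multiple-view-geometry result). The truncation of the quadrics and the UKF pose update are not shown; this is only a sketch.

```python
import numpy as np

def project_quadric_outline(P, Q_dual):
    """P: 3x4 camera projection matrix, Q_dual: 4x4 dual quadric.
    Returns the 3x3 dual conic of the quadric's image outline."""
    return P @ Q_dual @ P.T

# Points x on the outline satisfy x^T C x = 0, with C the (pseudo-)inverse
# of the returned dual conic; a tracker can compare such profiles against
# image edges and feed the geometric error to the pose estimator.
```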
Motion-based Recognition of People in EigenGait Space, 2002
"... A motion-based, correspondence-free technique for human gait recognition in monocular video is presented. We contend that the planar dynamics of a walking person are encoded in a 2D plot consisting of the pairwise image similarities of the sequence of images of the person, and that gait recognition ..."
Abstract
-
Cited by 65 (1 self)
- Add to MetaCart
A motion-based, correspondence-free technique for human gait recognition in monocular video is presented. We contend that the planar dynamics of a walking person are encoded in a 2D plot consisting of the pairwise image similarities of the sequence of images of the person, and that gait recognition can be achieved via standard pattern classification of these plots. We use background modelling to track the person for a number of frames and extract a sequence of segmented images of the person. The self-similarity plot is computed via correlation of each pair of images in this sequence. For recognition, the method applies Principal Component Analysis to reduce the dimensionality of the plots, then uses the k-nearest neighbor rule in this reduced space to classify an unknown person. This method is robust to tracking and segmentation errors, and to variation in clothing and background. It is also invariant to small changes in camera viewpoint and walking speed. The method is tested on outdoor sequences of 44 people, with 4 sequences of each taken on two different days, and achieves a classification rate of 77%. It is also tested on indoor sequences of 7 people walking on a treadmill, taken from 8 different viewpoints and on 7 different days. A classification rate of 78% is obtained for near-fronto-parallel views, and 65% on average over all views.
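A minimal sketch of the pipeline described above: build a self-similarity plot from the sequence of segmented images, flatten it, reduce its dimensionality with PCA, and classify with k-nearest neighbors. Array shapes, normalization, and the choice of k and PCA dimensions are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

def self_similarity_plot(frames):
    """frames: (T, H, W) array of aligned, segmented person images.
    Returns a T x T matrix of pairwise normalized correlations."""
    X = frames.reshape(len(frames), -1).astype(float)
    X -= X.mean(axis=1, keepdims=True)
    X /= (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)
    return X @ X.T

def classify(train_plots, train_labels, test_plots, k=3, dims=20):
    """train_plots / test_plots: lists of equally sized T x T plots."""
    pca = PCA(n_components=dims)
    Xtr = pca.fit_transform([p.ravel() for p in train_plots])
    Xte = pca.transform([p.ravel() for p in test_plots])
    knn = KNeighborsClassifier(n_neighbors=k).fit(Xtr, train_labels)
    return knn.predict(Xte)
```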
Articulated Soft Objects for Multi-View Shape and Motion Capture - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003
"... We develop a framework for 3–D shape and motion recovery of articulated deformable objects. We propose a formalism that incorporates the use of implicit surfaces into earlier robotics approaches that were designed to handle articulated structures. We demonstrate its effectiveness for human body mode ..."
Abstract
-
Cited by 63 (8 self)
- Add to MetaCart
(Show Context)
We develop a framework for 3-D shape and motion recovery of articulated deformable objects. We propose a formalism that incorporates the use of implicit surfaces into earlier robotics approaches that were designed to handle articulated structures. We demonstrate its effectiveness for human body modeling from synchronized video sequences. Our method is both robust and generic. It could easily be applied to other shape and motion recovery problems.
A hierarchical Bayesian network for event recognition of human actions and interactions - ACM Multimedia Systems Journal, 2004
"... Recognizing human interactions is a challenging task due to the multiple body parts of interacting persons and the concomitant occlusions. This paper presents a method for the recognition of twoperson interactions using a hierarchical Bayesian network (BN). The poses of simultaneously tracked body p ..."
Abstract
-
Cited by 61 (13 self)
- Add to MetaCart
(Show Context)
Recognizing human interactions is a challenging task due to the multiple body parts of interacting persons and the concomitant occlusions. This paper presents a method for the recognition of two-person interactions using a hierarchical Bayesian network (BN). The poses of simultaneously tracked body parts are estimated at the low level of the BN, and the overall body pose is estimated at the high level of the BN. The evolution of the poses of the multiple body parts is processed by a dynamic Bayesian network (DBN). The recognition of two-person interactions is expressed in terms of semantic verbal descriptions at multiple levels: individual body-part motions at the low level, single-person actions at the middle level, and two-person interactions at the high level. Example sequences of interacting persons illustrate the success of the proposed framework. Keywords: surveillance, event recognition, scene understanding, human interaction,
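A toy two-level illustration (not the paper's actual network): per-body-part likelihoods from the low level are combined into a posterior over the overall body pose at the high level via Bayes' rule, assuming the parts are conditionally independent given the overall pose. The pose labels and probability tables below are made-up numbers purely for illustration.

```python
import numpy as np

poses = ['stretched', 'withdrawn']
prior = np.array([0.5, 0.5])                  # P(overall pose)

# P(observed part evidence | overall pose), one entry per overall pose.
part_likelihoods = {
    'arm':   np.array([0.8, 0.3]),
    'torso': np.array([0.6, 0.5]),
    'leg':   np.array([0.4, 0.7]),
}

posterior = prior * np.prod(list(part_likelihoods.values()), axis=0)
posterior /= posterior.sum()
print(dict(zip(poses, posterior)))            # belief over overall pose
```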