Recognizing human actions: A local SVM approach. ICPR (2004)

by C. Schuldt, I. Laptev, B. Caputo
Results 1 - 10 of 758

Space-time Interest Points

by Ivan Laptev, Tony Lindeberg - In ICCV, 2003
"... Local image features or interest points provide compact and abstract representations of patterns in an image. In this paper, we propose to extend the notion of spatial interest points into the spatio-temporal domain and show how the resulting features often reflect interesting events that can be use ..."
Abstract - Cited by 819 (21 self) - Add to MetaCart
Local image features or interest points provide compact and abstract representations of patterns in an image. In this paper, we propose to extend the notion of spatial interest points into the spatio-temporal domain and show how the resulting features often reflect interesting events that can be used for a compact representation of video data as well as for its interpretation. To detect ...
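To make the space-time extension concrete, the sketch below implements a Harris-style interest operator on a video volume with numpy/scipy: spatio-temporal gradients are accumulated into a 3x3 second-moment matrix and a det/trace response is taken, whose local maxima give candidate interest points. The smoothing scales and the constant k here are illustrative choices, not the values used in the paper.

```python
# Sketch of a space-time (Harris-style) interest operator, assuming a
# grayscale video stored as a numpy array of shape (T, H, W).
# Parameter values (sigma, tau, k) are illustrative, not the paper's.
import numpy as np
from scipy.ndimage import gaussian_filter

def spacetime_interest_response(video, sigma=2.0, tau=1.5, k=0.005):
    v = gaussian_filter(video.astype(np.float64), (tau, sigma, sigma))
    # Spatio-temporal gradients (axis order t, y, x).
    Lt, Ly, Lx = np.gradient(v)
    # Second-moment (structure tensor) entries, averaged over a local window.
    w = (2 * tau, 2 * sigma, 2 * sigma)
    Mxx = gaussian_filter(Lx * Lx, w); Myy = gaussian_filter(Ly * Ly, w)
    Mtt = gaussian_filter(Lt * Lt, w); Mxy = gaussian_filter(Lx * Ly, w)
    Mxt = gaussian_filter(Lx * Lt, w); Myt = gaussian_filter(Ly * Lt, w)
    # det(M) - k * trace(M)^3, a 3D analogue of the Harris criterion.
    det = (Mxx * (Myy * Mtt - Myt * Myt)
           - Mxy * (Mxy * Mtt - Myt * Mxt)
           + Mxt * (Mxy * Myt - Myy * Mxt))
    trace = Mxx + Myy + Mtt
    return det - k * trace ** 3

# Local maxima of the returned response volume are candidate space-time
# interest points.
```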

Learning realistic human actions from movies

by Ivan Laptev, Marcin Marszałek, Cordelia Schmid, Benjamin Rozenfeld - In CVPR, 2008
"... The aim of this paper is to address recognition of natural human actions in diverse and realistic video settings. This challenging but important subject has mostly been ignored in the past due to several problems one of which is the lack of realistic and annotated video datasets. Our first contribut ..."
Abstract - Cited by 738 (48 self) - Add to MetaCart
The aim of this paper is to address recognition of natural human actions in diverse and realistic video settings. This challenging but important subject has mostly been ignored in the past due to several problems, one of which is the lack of realistic and annotated video datasets. Our first contribution is to address this limitation and to investigate the use of movie scripts for automatic annotation of human actions in videos. We evaluate alternative methods for action retrieval from scripts and show benefits of a text-based classifier. Using the retrieved action samples for visual learning, we next turn to the problem of action classification in video. We present a new method for video classification that builds upon and extends several recent ideas including local space-time features, space-time pyramids and multichannel non-linear SVMs. The method is shown to improve state-of-the-art results on the standard KTH action dataset by achieving 91.8% accuracy. Given the inherent problem of noisy labels in automatic annotation, we particularly investigate and show high tolerance of our method to annotation errors in the training set. We finally apply the method to learning and classifying challenging action classes in movies and show promising results.
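One common form of the multichannel non-linear SVM mentioned above combines a per-channel chi-square distance between bag-of-features histograms into an exponentiated kernel and feeds it to an SVM as a precomputed kernel. The sketch below assumes that form; the channel handling, normalisation and use of scikit-learn are illustrative rather than the paper's exact setup.

```python
# Sketch of a multichannel chi-square kernel SVM: each video is represented
# by one bag-of-features histogram per channel (e.g. per descriptor type or
# space-time grid cell), channels are combined into one kernel, and an SVM
# is trained on the precomputed kernel.
import numpy as np
from sklearn.svm import SVC

def chi2_distance(H1, H2, eps=1e-10):
    """Pairwise chi-square distances between rows of H1 and rows of H2."""
    d = H1[:, None, :] - H2[None, :, :]
    s = H1[:, None, :] + H2[None, :, :] + eps
    return 0.5 * np.sum(d * d / s, axis=2)

def multichannel_kernel(channels_a, channels_b, norms=None):
    """channels_*: lists of (n_samples, n_bins) histogram arrays, one per channel.
    norms: per-channel normalisers; in practice fixed to the mean chi-square
    distance on the training set."""
    K = np.zeros((channels_a[0].shape[0], channels_b[0].shape[0]))
    for c, (Ha, Hb) in enumerate(zip(channels_a, channels_b)):
        D = chi2_distance(Ha, Hb)
        K += D / (D.mean() if norms is None else norms[c])
    return np.exp(-K)

# Usage with a precomputed kernel (train_ch / test_ch are lists of per-channel
# histogram matrices, y_train the action labels):
# clf = SVC(kernel="precomputed").fit(multichannel_kernel(train_ch, train_ch), y_train)
# y_pred = clf.predict(multichannel_kernel(test_ch, train_ch))
```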

Citation Context

... from this progress and aim to transfer previous experience to the domain of video recognition and the recognition of human actions in particular. Existing datasets for human action recognition (e.g. [15], see figure 8) provide samples for only a few action classes recorded in controlled and simplified settings. This stands in sharp contrast with the demands of real applications focused on natural vid...

Behavior recognition via sparse spatio-temporal features

by Piotr Dollár, Vincent Rabaud, Garrison Cottrell, Serge Belongie - In VS-PETS, 2005
"... A common trend in object recognition is to detect and leverage the use of sparse, informative feature points. The use of such features makes the problem more manageable while providing increased robustness to noise and pose variation. In this work we develop an extension of these ideas to the spatio ..."
Abstract - Cited by 717 (4 self) - Add to MetaCart
A common trend in object recognition is to detect and leverage the use of sparse, informative feature points. The use of such features makes the problem more manageable while providing increased robustness to noise and pose variation. In this work we develop an extension of these ideas to the spatio-temporal case. For this purpose, we show that the direct 3D counterparts to commonly used 2D interest point detectors are inadequate, and we propose an alternative. Anchoring off of these interest points, we devise a recognition algorithm based on spatio-temporally windowed data. We present recognition results on a variety of datasets including both human and rodent behavior.
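The alternative detector referred to above responds to periodic or transient motion by combining a spatial Gaussian with a quadrature pair of 1D temporal Gabor filters. A minimal numpy/scipy sketch of that separable response follows; the filter parameters are illustrative rather than tuned values.

```python
# Sketch of a separable-linear-filter response: spatial Gaussian smoothing
# followed by a quadrature pair of 1D temporal Gabor filters; the summed
# squared responses form the detector response over the video volume.
import numpy as np
from scipy.ndimage import gaussian_filter, convolve1d

def cuboid_response(video, sigma=2.0, tau=1.5, omega=None):
    v = video.astype(np.float64)          # video shape (T, H, W)
    if omega is None:
        omega = 4.0 / tau                 # temporal frequency of the Gabors
    # Spatial smoothing only (axes 1, 2 = y, x).
    s = gaussian_filter(v, (0, sigma, sigma))
    # Quadrature pair of 1D temporal Gabor filters.
    t = np.arange(-int(3 * tau), int(3 * tau) + 1)
    env = np.exp(-t ** 2 / (2 * tau ** 2))
    h_even = np.cos(2 * np.pi * t * omega) * env
    h_odd = np.sin(2 * np.pi * t * omega) * env
    r_even = convolve1d(s, h_even, axis=0)
    r_odd = convolve1d(s, h_odd, axis=0)
    return r_even ** 2 + r_odd ** 2       # local maxima give interest points
```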

Citation Context

...roach shows promise for coarse video indexing of highly visually distinct actions. We examine the approaches of [30] and [5] in more detail in section 4.3. Most closely related to our work is that of [26], who also use sparsely detected spatio-temporal features for recognition, building on the work on spatio-temporal feature detectors by [17]. They show promising results in human behavior recognition,...

Unsupervised learning of human action categories using spatial-temporal words

by Juan Carlos Niebles, Hongcheng Wang, Li Fei-Fei - In Proc. BMVC, 2006
"... Imagine a video taken on a sunny beach, can a computer automatically tell what is happening in the scene? Can it identify different human activities in the video, such as water surfing, people walking and lying on the beach? To automatically classify or localize different actions in video sequences ..."
Abstract - Cited by 494 (8 self) - Add to MetaCart
Imagine a video taken on a sunny beach: can a computer automatically tell what is happening in the scene? Can it identify different human activities in the video, such as water surfing, people walking and lying on the beach? To automatically classify or localize different actions in video sequences is very useful for a variety of tasks, such as video surveillance, object-level video summarization, video indexing, digital library organization, etc. However, it remains a challenging task for computers to achieve robust action recognition due to cluttered background, camera motion, occlusion, and geometric and photometric variances of objects. For example, in a live video of a skating competition, the skater moves rapidly across the rink, and the camera also moves to follow the skater. With moving camera, non-stationary background, and moving target, few vision algorithms could identify, categorize and ...
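The "spatial-temporal words" idea can be sketched as follows: quantise local space-time descriptors into a codebook, count word occurrences per video, and fit an unsupervised topic model to the counts. In the sketch below, scikit-learn's LDA is used only as a stand-in for the pLSA-style model, and the codebook size and other settings are assumptions.

```python
# Sketch of a bag of spatio-temporal words: local space-time descriptors
# from all training videos are quantised with k-means, each video becomes a
# histogram of word counts, and a topic model is fit to those counts.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

def build_codebook(descriptors, n_words=500):
    """descriptors: (n_features_total, descriptor_dim), stacked over all videos."""
    return KMeans(n_clusters=n_words, n_init=4).fit(descriptors)

def video_word_histogram(codebook, descriptors):
    """Word-count histogram for one video's local descriptors."""
    words = codebook.predict(descriptors)      # one word id per local feature
    return np.bincount(words, minlength=codebook.n_clusters)

# counts: (n_videos, n_words) matrix of per-video word histograms.
# topics = LatentDirichletAllocation(n_components=n_action_categories)
# video_topic_mix = topics.fit_transform(counts)   # soft category assignment
```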

Citation Context

...in space-time where the image values have significant local variations in both dimensions. The representation has been successfully applied to human action recognition combined with an SVM classifier [17]. Dollár et al. [7] propose an alternative approach to detect sparse space-time interest points based on separable linear filters for behavior recognition. Local space-time patches, therefore, have be...

Action Recognition by Dense Trajectories

by Heng Wang, Alexander Kläser, Cordelia Schmid, Cheng-Lin Liu, 2011
"... Feature trajectories have shown to be efficient for rep-resenting videos. Typically, they are extracted using the KLT tracker or matching SIFT descriptors between frames. However, the quality as well as quantity of these trajecto-ries is often not sufficient. Inspired by the recent success of dense ..."
Abstract - Cited by 293 (16 self) - Add to MetaCart
Feature trajectories have been shown to be efficient for representing videos. Typically, they are extracted using the KLT tracker or matching SIFT descriptors between frames. However, the quality as well as quantity of these trajectories is often not sufficient. Inspired by the recent success of dense sampling in image classification, we propose an approach to describe videos by dense trajectories. We sample dense points from each frame and track them based on displacement information from a dense optical flow field. Given a state-of-the-art optical flow algorithm, our trajectories are robust to fast irregular motions as well as shot boundaries. Additionally, dense trajectories cover the motion information in videos well. We also investigate how to design descriptors to encode the trajectory information. We introduce a novel descriptor based on motion boundary histograms, which is robust to camera motion. This descriptor consistently outperforms other state-of-the-art descriptors, in particular in uncontrolled realistic videos. We evaluate our video description in the context of action classification with a bag-of-features approach. Experimental results show a significant improvement over the state of the art on four datasets of varying difficulty, i.e. KTH, YouTube, Hollywood2 and UCF sports.
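The core tracking step, sampling points on a dense grid and moving them with a median-filtered dense optical flow field, can be sketched as below. OpenCV's Farneback flow is used as a stand-in for the optical flow algorithm in the paper, and the grid spacing and trajectory length are illustrative.

```python
# Sketch of dense-trajectory tracking: grid-sampled points are moved frame
# to frame by a median-filtered dense optical flow field.
import cv2
import numpy as np
from scipy.ndimage import median_filter

def track_dense_points(frames, step=5, track_len=15):
    """frames: list of 8-bit grayscale frames (numpy arrays of shape (H, W))."""
    h, w = frames[0].shape
    ys, xs = np.mgrid[step // 2:h:step, step // 2:w:step]
    tracks = [np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float32)]
    for prev, curr in zip(frames[:track_len], frames[1:track_len + 1]):
        flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        # Median-filter the flow so each point follows the dominant local
        # motion rather than flow noise, then move points by the flow at
        # their (rounded) current positions.
        flow = np.stack([median_filter(flow[..., 0], size=3),
                         median_filter(flow[..., 1], size=3)], axis=-1)
        pts = tracks[-1]
        xi = np.clip(pts[:, 0].round().astype(int), 0, w - 1)
        yi = np.clip(pts[:, 1].round().astype(int), 0, h - 1)
        tracks.append(pts + flow[yi, xi])
    return np.stack(tracks, axis=1)   # (n_points, n_tracked_frames, 2), (x, y)
```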

Evaluation of local spatio-temporal features for action recognition

by Heng Wang, Muhammad Muneeb Ullah, Alexander Kläser, Ivan Laptev, Cordelia Schmid - University of Central Florida, U.S.A., 2009
"... Local space-time features have recently become a popular video representation for action recognition. Several methods for feature localization and description have been proposed in the literature and promising recognition results were demonstrated for a number of action classes. The comparison of ex ..."
Abstract - Cited by 274 (25 self) - Add to MetaCart
Local space-time features have recently become a popular video representation for action recognition. Several methods for feature localization and description have been proposed in the literature and promising recognition results were demonstrated for a number of action classes. The comparison of existing methods, however, is often limited given the different experimental settings used. The purpose of this paper is to evaluate and compare previously proposed space-time features in a common experimental setup. In particular, we consider four different feature detectors and six local feature descriptors and use a standard bag-of-features SVM approach for action recognition. We investigate the performance of these methods on a total of 25 action classes distributed over three datasets with varying difficulty. Among interesting conclusions, we demonstrate that regular sampling of space-time features consistently outperforms all tested space-time interest point detectors for human actions in realistic settings. We also demonstrate a consistent ranking for the majority of methods over different datasets and discuss their advantages and limitations.
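The regular dense sampling baseline that this evaluation finds so effective needs no detector at all: patch centres are simply placed on a regular grid over space, time and patch size. A minimal sketch, with spacing and scale values chosen for illustration only:

```python
# Sketch of regular dense sampling of space-time patch centres.
import numpy as np

def dense_sample_centres(n_frames, height, width,
                         spatial_step=16, temporal_step=8,
                         patch_sizes=(16, 24, 32)):
    centres = []
    for size in patch_sizes:
        half = size // 2
        for t in range(half, n_frames - half, temporal_step):
            for y in range(half, height - half, spatial_step):
                for x in range(half, width - half, spatial_step):
                    centres.append((t, y, x, size))
    return np.array(centres)   # each row: (frame, row, col, patch size)
```

Descriptors computed at these centres then feed the same bag-of-features SVM pipeline as detector-based features.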

Citation Context

...s and limitations. 1 Introduction Local image and video features have been shown successful for many recognition tasks such as object and scene recognition [8, 17] as well as human action recognition [16, 24]. Local space-time features capture characteristic shape and motion in video and provide relatively independent representation of events with respect to their spatio-temporal shifts and scales as well...

A biologically inspired system for action recognition

by H. Jhuang, T. Serre, L. Wolf, T. Poggio - In ICCV, 2007
"... We present a biologically-motivated system for the recognition of actions from video sequences. The approach builds on recent work on object recognition based on hierarchical feedforward architectures [25, 16, 20] and extends a neurobiological model of motion processing in the visual cortex [10]. Th ..."
Abstract - Cited by 238 (15 self) - Add to MetaCart
We present a biologically-motivated system for the recognition of actions from video sequences. The approach builds on recent work on object recognition based on hierarchical feedforward architectures [25, 16, 20] and extends a neurobiological model of motion processing in the visual cortex [10]. The system consists of a hierarchy of spatio-temporal feature detectors of increasing complexity: an input sequence is first analyzed by an array of motion-direction sensitive units which, through a hierarchy of processing stages, lead to position-invariant spatio-temporal feature detectors. We experiment with different types of motion-direction sensitive units as well as different system architectures. As in [16], we find that sparse features in intermediate stages outperform dense ones and that using a simple feature selection approach leads to an efficient system that performs better with far fewer features. We test the approach on different publicly available action datasets, in all cases achieving the highest results reported to date.
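As a loose illustration of the first stage of such a hierarchy (motion-direction sensitive units followed by local max pooling), the sketch below builds a small bank of space-time Gabor energy filters. The filter form, parameters and pooling sizes are assumptions for illustration, not the authors' actual units.

```python
# Hypothetical sketch of direction-selective motion energy units with local
# max pooling, in the spirit of a hierarchical feedforward motion model.
import numpy as np
from scipy.ndimage import convolve, maximum_filter

def direction_energy(video, direction, wavelength=8.0, sigma=2.0, tau=2.0):
    """Energy of a quadrature pair of space-time Gabors tuned to motion
    along `direction` (radians in the image plane)."""
    ht, hs = int(3 * tau), int(3 * sigma)
    t, y, x = np.mgrid[-ht:ht + 1, -hs:hs + 1, -hs:hs + 1]
    # The carrier drifts along `direction` over time, so the unit responds
    # most strongly to motion in that direction at about one pixel/frame.
    phase = 2 * np.pi * (x * np.cos(direction) + y * np.sin(direction) - t) / wavelength
    env = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2) - t ** 2 / (2 * tau ** 2))
    v = video.astype(np.float64)
    r_even = convolve(v, np.cos(phase) * env)
    r_odd = convolve(v, np.sin(phase) * env)
    return r_even ** 2 + r_odd ** 2

def pooled_direction_maps(video, n_directions=4, pool=(3, 8, 8)):
    """Direction-energy maps max-pooled over local space-time neighbourhoods,
    giving a crude position-tolerant motion representation."""
    maps = [maximum_filter(direction_energy(video, d), size=pool)
            for d in np.linspace(0.0, np.pi, n_directions, endpoint=False)]
    return np.stack(maps)
```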

Citation Context

...5]. The other common class of approaches is based on the processing of spatio-temporal features, either global as in the case of low-resolution videos [33, 6, 2] or local for higher resolution images [24, 5, 7, 18]. Our approach falls in the second class of approaches to action recognition. It extends an earlier neurobiological model of motion processing in the dorsal stream of the visual cortex by Giese & Pogg...

A spatio-temporal descriptor based on 3d-gradients

by Alexander Kläser, Marcin Marszałek, Cordelia Schmid - In BMVC’08
"... In this work, we present a novel local descriptor for video sequences. The proposed descriptor is based on histograms of oriented 3D spatio-temporal gradients. Our contribution is four-fold. (i) To compute 3D gradients for arbitrary scales, we develop a memory-efficient algorithm based on integral v ..."
Abstract - Cited by 234 (6 self) - Add to MetaCart
In this work, we present a novel local descriptor for video sequences. The proposed descriptor is based on histograms of oriented 3D spatio-temporal gradients. Our contribution is four-fold. (i) To compute 3D gradients for arbitrary scales, we develop a memory-efficient algorithm based on integral videos. (ii) We propose a generic 3D orientation quantization which is based on regular polyhedrons. (iii) We perform an in-depth evaluation of all descriptor parameters and optimize them for action recognition. (iv) We apply our descriptor to various action datasets (KTH, Weizmann, Hollywood) and show that we outperform the state-of-the-art.
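The integral-video idea in contribution (i) is a 3D prefix sum: built once per gradient channel, it returns the sum over any space-time box from eight lookups. A minimal sketch:

```python
# Sketch of an integral video (3D prefix sum) and constant-time box sums.
import numpy as np

def integral_video(volume):
    """Zero-padded 3D prefix sum; volume has shape (T, H, W)."""
    s = np.zeros(tuple(d + 1 for d in volume.shape), dtype=np.float64)
    s[1:, 1:, 1:] = volume.cumsum(0).cumsum(1).cumsum(2)
    return s

def box_sum(s, t0, t1, y0, y1, x0, x1):
    """Sum of the original volume over [t0, t1) x [y0, y1) x [x0, x1)."""
    return (s[t1, y1, x1] - s[t0, y1, x1] - s[t1, y0, x1] - s[t1, y1, x0]
            + s[t0, y0, x1] + s[t0, y1, x0] + s[t1, y0, x0] - s[t0, y0, x0])

# With one integral video per gradient component, the mean 3D gradient of
# every descriptor cell can be read off in a handful of lookups before the
# orientations are quantised against the faces of a regular polyhedron.
```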

Citation Context

...topped on convergence, when the maximum remains stable for three consecutive runs. 3.2 Datasets We compare our work to the state-of-the-art on three different action datasets. The KTH actions dataset [16] contains six types of different human action classes: walking, jogging, running, boxing, waving, and clapping (cf. fig. 2, top). Each action class is performed several times by 25 subjects. The seque...

Machine recognition of human activities: A survey

by Pavan Turaga, Rama Chellappa, V. S. Subrahmanian, Octavian Udrea , 2008
"... The past decade has witnessed a rapid proliferation of video cameras in all walks of life and has resulted in a tremendous explosion of video content. Several applications such as content-based video annotation and retrieval, highlight extraction and video summarization require recognition of the a ..."
Abstract - Cited by 218 (0 self) - Add to MetaCart
The past decade has witnessed a rapid proliferation of video cameras in all walks of life and has resulted in a tremendous explosion of video content. Several applications such as content-based video annotation and retrieval, highlight extraction and video summarization require recognition of the activities occurring in the video. The analysis of human activities in videos is an area with increasingly important consequences from security and surveillance to entertainment and personal archiving. Several challenges at various levels of processing (robustness against errors in low-level processing, view- and rate-invariant representations at mid-level processing, and semantic representation of human activities at higher-level processing) make this problem hard to solve. In this review paper, we present a comprehensive survey of efforts in the past couple of decades to address the problems of representation, recognition, and learning of human activities from video and related applications. We discuss the problem at two major levels of complexity: 1) “actions” and 2) “activities.” “Actions” are characterized by simple motion patterns typically executed by a single human. “Activities” are more complex and involve coordinated actions among a small number of humans. We will discuss several approaches and classify them according to their ability to handle varying degrees of complexity as interpreted above. We begin with a discussion of approaches to model the simplest of action classes known as atomic or primitive actions that do not require sophisticated dynamical modeling. Then, methods to model actions with more complex dynamics are discussed. The discussion then leads naturally to methods for higher level representation of complex activities.

Citation Context

... by extracting spatio-temporal interest points and clustering of the features. These interest points can be used in conjunction with machine learning approaches such as support vector machines (SVMs) [47] and graphical models [46]. Because the interest points are local in nature, longer term temporal correlations are ignored in these approaches. To address this issue, a method based on correlograms of...

Human Activity Analysis: A Review

by J. K. Aggarwal, M. S. Ryoo - To appear in ACM Computing Surveys
"... Human activity recognition is an important area of computer vision research. Its applications include surveillance systems, patient monitoring systems, and a variety of systems that involve interactions between persons and electronic devices such as human-computer interfaces. Most of these applicati ..."
Abstract - Cited by 214 (6 self) - Add to MetaCart
Human activity recognition is an important area of computer vision research. Its applications include surveillance systems, patient monitoring systems, and a variety of systems that involve interactions between persons and electronic devices such as human-computer interfaces. Most of these applications require an automated recognition of high-level activities, composed of multiple simple (or atomic) actions of persons. This paper provides a detailed overview of various state-of-the-art research papers on human activity recognition. We discuss both the methodologies developed for simple human actions and those for high-level activities. An approach-based taxonomy is chosen, comparing the advantages and limitations of each approach. Recognition methodologies for an analysis of simple actions of a single person are first presented in the paper. Space-time volume approaches and sequential approaches that represent and recognize activities directly from input images are discussed. Next, hierarchical recognition methodologies for high-level activities are presented and compared. Statistical approaches, syntactic approaches, and description-based approaches for hierarchical recognition are discussed in the paper. In addition, we further discuss the papers on the recognition of human-object interactions and group activities. Public datasets designed for the evaluation of the recognition methodologies are illustrated in our paper as well, comparing the methodologies’ performances. This review will provide the impetus for future research in more productive areas.