Monocular Pedestrian Detection: Survey and Experiments
, 2008
Cited by 23 (8 self)
Pedestrian detection is a rapidly evolving area in computer vision with key applications in intelligent vehicles, surveillance and advanced robotics. The objective of this paper is to provide an overview of the current state of the art from both methodological and experimental perspectives. The first part of the paper consists of a survey. We cover the main components of a pedestrian detection system and the underlying models. The second (and larger) part of the paper contains a corresponding experimental study. We consider a diverse set of state-of-the-art systems: wavelet-based AdaBoost cascade [74], HOG/linSVM [11], NN/LRF [75] and combined shape-texture detection [23]. Experiments are performed on an extensive dataset captured on board a vehicle driving through an urban environment. The dataset includes many thousands of training samples as well as a 27-minute test sequence involving more than 20000 images with annotated pedestrian locations. We consider a generic evaluation setting and one specific to pedestrian detection on board a vehicle. Results indicate a clear advantage of HOG/linSVM at higher image resolutions and lower processing speeds, and a superiority of the wavelet-based AdaBoost cascade approach at lower image resolutions and (near) real-time processing speeds. The dataset (8.5GB) is made public for benchmarking purposes.
Robust visual tracking for multiple targets
- IN ECCV
, 2006
Cited by 17 (2 self)
We address the problem of robust multi-target tracking within the application of hockey player tracking. Although there has been extensive work in multi-target tracking, there is no existing visual tracking system that can automatically and robustly track a variable number of targets and correctly maintain their identities with a monocular camera regardless of background clutter, camera motion and frequent mutual occlusion between targets. We build our system on the basis of the previous work by Okuma et al. [OTdF+04]. The particle filter technique is adopted and modified to fit into the multi-target tracking framework. A rectification technique is employed to map the locations of players from the video frame coordinate system to the standard hockey rink coordinates so that the system can compensate for camera motion and the dynamics of players on the rink can be improved by a second-order auto-regression model. A global nearest neighbor data association algorithm is introduced to assign boosting detections to the existing tracks for the proposal distribution in particle filters. The mean-shift algorithm is embedded into the particle filter framework to stabilize the trajectories of the targets for robust tracking during mutual occlusion. The color model of the targets is also improved by the kernel introduced by mean-shift. Experimental results show that our system is able to correctly track all the targets in the scene even if they are partially or completely occluded for a period of time.
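The second-order auto-regressive dynamics mentioned in this abstract can be illustrated with a minimal sketch. The coefficients (2, -1), which amount to a constant-velocity model, the noise parameter and the function name ar2_predict are assumptions chosen for illustration; the abstract does not state the values or form used in the paper.

```python
import random

def ar2_predict(prev, prev2, a1=2.0, a2=-1.0, noise_std=0.0, rng=random):
    """Propagate a 2D rink position with x_t = a1*x_{t-1} + a2*x_{t-2} + noise.

    a1=2, a2=-1 is a constant-velocity model (an assumed choice, not taken
    from the paper). With noise_std=0 the prediction is deterministic.
    """
    return tuple(
        a1 * p + a2 * q + (rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0)
        for p, q in zip(prev, prev2)
    )

# A player at (10, 5) who was at (8, 4) one frame earlier.
predicted = ar2_predict((10.0, 5.0), (8.0, 4.0))
print(predicted)  # (12.0, 6.0)
```

In a particle filter, each particle would be propagated this way with noise_std > 0 before being weighted by the observation model.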
Hidden Markov models for optical flow analysis in crowds
- International Conference on Pattern Recognition
, 2006
Cited by 12 (0 self)
This paper presents an event detector for emergencies in crowds. Assuming a single camera and a dense crowd, we rely on optical flow instead of tracking statistics as a feature to extract information from the crowd video data. The optical flow features are encoded with Hidden Markov Models to allow for the detection of emergency or abnormal events in the crowd. To increase the detection sensitivity, a local modelling approach is used. The results with simulated crowds show the effectiveness of the proposed approach in detecting abnormalities in dense crowds.
Human activity recognition from video: modeling, feature selection and classification architecture
- in International Workshop on Human Activity Recognition and Modeling (HAREM)
, 2005
Cited by 12 (2 self)
In this paper, we address the problem of recognizing human activities, such as {Active, Inactive, Walking, Running, Fighting} from video sequences, with a particular emphasis on the problems of feature selection, data modeling and classifier structure. The need for such systems is increasing every day, with hundreds or thousands of surveillance cameras deployed in public spaces. This massive number of cameras calls for systems able to detect, categorize and recognize human activity, requesting human attention only when necessary. Our work is focused on three fundamental issues: (i) the design of a classifier and data modeling for activity recognition; (ii) how to perform feature selection and (iii) how to define the structure of a classifier. We use a Bayesian classifier and model the likelihood functions as Gaussian mixtures, which can cope with complex data distributions and are learned automatically. As for feature selection, we propose several (suboptimal) methods to evaluate the recognition rate achieved with different feature combinations, with the Bayesian classifier. Finally, we investigate the use of hierarchical classifiers (including the possibility of automatic generation). Our results were based on nearly 16,000 images of five activities and we achieved an error rate as low as 1.5%. These experiments clearly demonstrate the importance of powerful methodologies for data modeling and how intertwined feature selection, classifier design and the structure of the classifier are.
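A Bayesian classifier with Gaussian-mixture likelihoods, as described in this abstract, picks the class that maximizes prior times mixture likelihood. The sketch below shows that decision rule on a single feature. The "speed" feature, the hand-set mixture parameters and the equal priors are hypothetical; the paper learns its mixtures automatically from data, which this sketch does not attempt.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mixture_likelihood(x, components):
    """components: list of (weight, mean, variance) for a 1D Gaussian mixture."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)

def classify(x, models, priors):
    """Return the class maximizing prior * mixture likelihood (Bayes rule)."""
    return max(models, key=lambda c: priors[c] * mixture_likelihood(x, models[c]))

# Hypothetical hand-set models over one "speed" feature (arbitrary units).
models = {
    "Inactive": [(1.0, 0.0, 0.05)],
    "Walking": [(0.6, 1.2, 0.1), (0.4, 1.8, 0.1)],
    "Running": [(1.0, 4.0, 0.5)],
}
priors = {"Inactive": 1 / 3, "Walking": 1 / 3, "Running": 1 / 3}

print(classify(1.5, models, priors))  # Walking
```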
Context-aware visual tracking
- IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2009
Cited by 12 (1 self)
Enormous uncertainties in unconstrained environments lead to a fundamental dilemma that many tracking algorithms have to face in practice: Tracking has to be computationally efficient, but verifying whether or not the tracker is following the true target tends to be demanding, especially when the background is cluttered and/or when occlusion occurs. Due to the lack of a good solution to this problem, many existing methods tend to be either effective but computationally intensive by using sophisticated image observation models or efficient but vulnerable to false alarms. This greatly challenges long-duration robust tracking. This paper presents a novel solution to this dilemma by considering the context of the tracking scene. Specifically, we integrate into the tracking process a set of auxiliary objects that are automatically discovered in the video on the fly by data mining. Auxiliary objects have three properties, at least in a short time interval: 1) persistent co-occurrence with the target, 2) consistent motion correlation to the target, and 3) ease of tracking. Regarding these auxiliary objects as the context of the target, the collaborative tracking of these auxiliary objects leads to efficient computation as well as strong verification. Our extensive experiments have exhibited exciting performance in very challenging real-world testing cases.
Searching video for complex activities with finite state models
- In: IEEE Conf. on Computer Vision and Pattern Recognition
, 2007
Cited by 7 (2 self)
We describe a method of representing human activities that allows a collection of motions to be queried without examples, using a simple and effective query language. Our approach is based on units of activity at segments of the body, which can be composed across space and across the body to produce complex queries. The presence of search units is inferred automatically by tracking the body, lifting the tracks to 3D and comparing to models trained using motion capture data. We show results for a large range of queries applied to a collection of complex motion and activity. Our models of short time scale limb behaviour are built using a labelled motion capture set. We compare with discriminative methods applied to tracker data; our method offers significantly improved performance. We show experimental evidence that our method is robust to view direction and is unaffected by changes of clothing.
Tracking Multiple Occluding People by Localizing on Multiple Scene Planes
Cited by 7 (0 self)
Occlusion and lack of visibility in crowded and cluttered scenes make it difficult to track individual people correctly and consistently, particularly in a single view. We present a multiview approach to solve this problem. In our approach, we neither detect nor track objects from any single camera or camera pair; rather, evidence is gathered from all of the cameras into a synergistic framework and detection and tracking results are propagated back to each view. Unlike other multiview approaches that require fully calibrated views, our approach is purely image-based and uses only 2D constructs. To this end, we develop a planar homographic occupancy constraint that fuses foreground likelihood information from multiple views to resolve occlusions and localize people on a reference scene plane. For greater robustness, this process is extended to multiple planes parallel to the reference plane in the framework of plane-to-plane homologies. Our fusion methodology also models scene clutter using the Schmieder and Weathersby clutter measure, which acts as a confidence prior, to assign higher fusion weight to views with less clutter. Detection and tracking are performed simultaneously by graph cuts segmentation of tracks in the space-time occupancy likelihood data. Experimental results with detailed qualitative and quantitative analysis are demonstrated in challenging multiview crowded scenes. Index Terms: Tracking, sensor fusion, graph-theoretic methods.
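The basic operation behind a planar homography constraint is mapping image points between views through a 3x3 matrix in homogeneous coordinates. The sketch below shows only that mapping; the matrices are made-up examples and apply_homography is a hypothetical helper name. It does not reproduce the paper's occupancy fusion or clutter weighting.

```python
def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography H (row-major nested lists)."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0.0:
        raise ValueError("point maps to infinity under this homography")
    return (u / w, v / w)  # divide out the homogeneous scale

# Made-up homography: scale by 2, then translate by (1, 3).
H = [[2.0, 0.0, 1.0],
     [0.0, 2.0, 3.0],
     [0.0, 0.0, 1.0]]
mapped = apply_homography(H, (4.0, 5.0))
print(mapped)  # (9.0, 13.0)
```

Points on the reference plane (for example, feet on the ground) map consistently between views under the plane's homography, which is what lets foreground evidence from several cameras be compared at a common location.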
A track-based human movement analysis and privacy protection system adaptive to environmental contexts
- In IEEE International Conference on Advanced Video and Signal based Surveillance
, 2005
Cited by 5 (5 self)
This paper presents a track-based system for human movement analysis and privacy protection. Our system is adaptive to environmental contexts such as illumination variations, complex moving cast shadows, different camera perspectives, and diverse site scenarios. Most outdoor surveillance systems have targeted a specific environmental situation, i.e., specific time, place, and activity scenarios. We argue that more general human movement analysis systems should be able to handle multiple heterogeneous situations in an adaptive manner. We introduce the concept of a 'spatio-temporal personal boundary' to represent different grouping patterns of human tracks, and we incorporate the concept with various site models. Experimental evaluations with extensive outdoor data show our system's robustness to environmental changes and its effectiveness in handling various environmental contexts.
Structured Determinantal Point Processes
Cited by 5 (2 self)
We present a novel probabilistic model for distributions over sets of structures, for example, sets of sequences, trees, or graphs. The critical characteristic of our model is a preference for diversity: sets containing dissimilar structures are more likely. Our model is a marriage of structured probabilistic models, like Markov random fields and context free grammars, with determinantal point processes, which arise in quantum physics as models of particles with repulsive interactions. We extend the determinantal point process model to handle an exponentially-sized set of particles (structures) via a natural factorization of the model into parts. We show how this factorization leads to tractable algorithms for exact inference, including computing marginals, computing conditional probabilities, and sampling. Our algorithms exploit a novel polynomially-sized dual representation of determinantal point processes, and use message passing over a special semiring to compute relevant quantities. We illustrate the advantages of the model on tracking and articulated pose estimation problems.
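The preference for diversity can be seen in a plain (unstructured) determinantal point process over three items, where the probability of a subset Y is proportional to the determinant of the kernel restricted to Y. The sketch below uses a made-up similarity kernel and a small pure-Python determinant; it illustrates the general DPP idea only, not the structured factorization or dual representation the paper introduces.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def unnormalized_prob(L, subset):
    """det(L_Y): unnormalized DPP probability of selecting exactly `subset`.
    Dividing by det(L + I) would normalize it."""
    return det([[L[i][j] for j in subset] for i in subset])

# Made-up kernel: items 0 and 1 are very similar, item 2 differs from both.
L = [[1.0, 0.9, 0.1],
     [0.9, 1.0, 0.1],
     [0.1, 0.1, 1.0]]

similar_pair = unnormalized_prob(L, [0, 1])  # about 0.19
diverse_pair = unnormalized_prob(L, [0, 2])  # about 0.99
print(diverse_pair > similar_pair)  # True
```

The similar pair gets a much smaller determinant, so the model assigns it lower probability than the diverse pair.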
Monitoring Activities of Daily Living (ADLs) of Elderly Based on 3D Key Human Postures
Cited by 3 (1 self)
This paper presents a cognitive vision approach to recognize a set of interesting activities of daily living (ADLs) for elderly people at home. The proposed approach is composed of a video analysis component and an activity recognition component. The video analysis component contains person detection, person tracking and human posture recognition. Human posture recognition is composed of a set of posture models and a dedicated human posture recognition algorithm. The activity recognition component contains a set of video event models and a dedicated video event recognition algorithm. In this study, we collaborate with medical experts (gerontologists from Nice hospital) to define and model a set of scenarios related to the interesting activities of elderly people. Some of these activities require a fine description of the human body, such as postures. For this purpose, we propose ten 3D key human postures useful for recognizing a set of interesting human activities regardless of the environment. Using these 3D key human postures, we have modeled thirty-four video events, simple ones such as "a person is standing" and composite ones such as "a person is feeling faint". We have also adapted a video event recognition algorithm to detect some activities of interest in real time by adding posture. The novelty of our approach is the proposed 3D key postures and the set of activity models of an elderly person living alone in her/his own home. To validate our proposed models, we have performed a set of experiments in the Gerhome laboratory, which is a realistic site reproducing the environment of a typical apartment. For these experiments, we have acquired and processed ten video sequences with one actor. The duration of each video sequence is about ten minutes and each video contains about 4800 frames.

