Results 1 - 10
of
407
HumanEva: Synchronized video and motion capture dataset for evaluation of articulated human motion
, 2006
"... While research on articulated human motion and pose estimation has progressed rapidly in the last few years, there has been no systematic quantitative evaluation of competing methods to establish the current state of the art. We present data obtained using a hardware system that is able to capture s ..."
Abstract
-
Cited by 266 (15 self)
- Add to MetaCart
While research on articulated human motion and pose estimation has progressed rapidly in the last few years, there has been no systematic quantitative evaluation of competing methods to establish the current state of the art. We present data obtained using a hardware system that is able to capture synchronized video and ground-truth 3D motion. The resulting HUMANEVA datasets contain multiple subjects performing a set of predefined actions with a number of repetitions. On the order of 40, 000 frames of synchronized motion capture and multi-view video (resulting in over one quarter million image frames in total) were collected at 60 Hz with an additional 37,000 time instants of pure motion capture data. A standard set of error measures is defined for evaluating both 2D and 3D pose estimation and tracking algorithms. We also describe a baseline algorithm for 3D articulated tracking that uses a relatively standard Bayesian framework with optimization in the form of Sequential Importance Resampling and Annealed Particle Filtering. In the context of this baseline algorithm we explore a variety of likelihood functions, prior models of human motion and the effects of algorithm parameters. Our experiments suggest that image observation models and motion priors play important roles in performance, and that in a multi-view laboratory environment, where initialization is available, Bayesian filtering tends to perform well. The datasets and the software are made available to the research community. This infrastructure will support the development of new articulated motion and pose estimation algorithms, will provide a baseline for the evaluation and comparison of new methods, and will help establish the current state of the art in human pose estimation and tracking.
A spatio-temporal descriptor based on 3d-gradients
- In BMVC’08
"... In this work, we present a novel local descriptor for video sequences. The proposed descriptor is based on histograms of oriented 3D spatio-temporal gradients. Our contribution is four-fold. (i) To compute 3D gradients for arbitrary scales, we develop a memory-efficient algorithm based on integral v ..."
Abstract
-
Cited by 234 (6 self)
- Add to MetaCart
(Show Context)
In this work, we present a novel local descriptor for video sequences. The proposed descriptor is based on histograms of oriented 3D spatio-temporal gradients. Our contribution is four-fold. (i) To compute 3D gradients for arbitrary scales, we develop a memory-efficient algorithm based on integral videos. (ii) We propose a generic 3D orientation quantization which is based on regular polyhedrons. (iii) We perform an in-depth evaluation of all descriptor parameters and optimize them for action recognition. (iv) We apply our descriptor to various action datasets (KTH, Weizmann, Hollywood) and show that we outperform the state-of-the-art. 1
Secrets of Optical Flow Estimation and Their Principles
, 2010
"... The accuracy of optical flow estimation algorithms has been improving steadily as evidenced by results on the Middlebury optical flow benchmark. The typical formulation, however, has changed little since the work of Horn and Schunck. We attempt to uncover what has made recent advances possible throu ..."
Abstract
-
Cited by 195 (10 self)
- Add to MetaCart
(Show Context)
The accuracy of optical flow estimation algorithms has been improving steadily as evidenced by results on the Middlebury optical flow benchmark. The typical formulation, however, has changed little since the work of Horn and Schunck. We attempt to uncover what has made recent advances possible through a thorough analysis of how the objective function, the optimization method, and modern implementation practices influence accuracy. We discover that “classical” flow formulations perform surprisingly well when combined with modern optimization and implementation techniques. Moreover, we find that while median filtering of intermediate flow fields during optimization is a key to recent performance gains, it leads to higher energy solutions. To understand the principles behind this phenomenon, we derive a new objective that formalizes the median filtering heuristic. This objective includes a nonlocal term that robustly integrates flow estimates over large spatial neighborhoods. By modifying this new term to include information about flow and image boundaries we develop a method that ranks at the top of the Middlebury benchmark.
Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite
"... Today, visual recognition systems are still rarely employed in robotics applications. Perhaps one of the main reasons for this is the lack of demanding benchmarks that mimic such scenarios. In this paper, we take advantage of our autonomous driving platform to develop novel challenging benchmarks fo ..."
Abstract
-
Cited by 174 (10 self)
- Add to MetaCart
(Show Context)
Today, visual recognition systems are still rarely employed in robotics applications. Perhaps one of the main reasons for this is the lack of demanding benchmarks that mimic such scenarios. In this paper, we take advantage of our autonomous driving platform to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry / SLAM and 3D object detection. Our recording platform is equipped with four high resolution video cameras, a Velodyne laser scanner and a state-of-the-art localization system. Our benchmarks comprise 389 stereo and optical flow image pairs, stereo visual odometry sequences of 39.2 km length, and more than 200k 3D object annotations captured in cluttered scenarios (up to 15 cars and 30 pedestrians are visible per image). Results from state-of-the-art algorithms reveal that methods ranking high on established datasets such as Middlebury perform below average when being moved outside the laboratory to the real world. Our goal is to reduce this bias by providing challenging benchmarks with novel difficulties to the computer vision community. Our benchmarks are available online at: www.cvlibs.net/datasets/kitti 1.
SIFT Flow: Dense Correspondence across Different Scenes
"... While image registration has been studied in different areas of computer vision, aligning images depicting different scenes remains a challenging problem, closer to recognition than to image matching. Analogous to optical flow, where an image is aligned to its temporally adjacent frame, we propose ..."
Abstract
-
Cited by 136 (15 self)
- Add to MetaCart
(Show Context)
While image registration has been studied in different areas of computer vision, aligning images depicting different scenes remains a challenging problem, closer to recognition than to image matching. Analogous to optical flow, where an image is aligned to its temporally adjacent frame, we propose SIFT flow, a method to align an image to its neighbors in a large image collection consisting of a variety of scenes. For a query image, histogram intersection on a bag-of-visual-words representation is used to find the set of nearest neighbors in the database. The SIFT flow algorithm then consists of matching densely sampled SIFT features between the two images, while preserving spatial discontinuities. The use of SIFT features allows robust matching across different scene/object appearances and the discontinuity-preserving spatial model allows matching of objects located at different parts of the scene. Experiments show that the proposed approach is able to robustly align complicated scenes with large spatial distortions. We collect a large database of videos and apply the SIFT flow algorithm to two applications: (i) motion field prediction from a single static image and (ii) motion synthesis via transfer of moving objects.
SIFT Flow: Dense Correspondence across Scenes and its Applications
"... While image alignment has been studied in different areas of computer vision for decades, aligning images depicting different scenes remains a challenging problem. Analogous to optical flow where an image is aligned to its temporally adjacent frame, we propose SIFT flow, a method to align an image ..."
Abstract
-
Cited by 124 (4 self)
- Add to MetaCart
(Show Context)
While image alignment has been studied in different areas of computer vision for decades, aligning images depicting different scenes remains a challenging problem. Analogous to optical flow where an image is aligned to its temporally adjacent frame, we propose SIFT flow, a method to align an image to its nearest neighbors in a large image corpus containing a variety of scenes. The SIFT flow algorithm consists of matching densely sampled, pixel-wise SIFT features between two images, while preserving spatial discontinuities. The SIFT features allow robust matching across different scene/object appearances, whereas the discontinuitypreserving spatial model allows matching of objects located at different parts of the scene. Experiments show that the proposed approach robustly aligns complex scene pairs containing significant spatial differences. Based on SIFT flow, we propose an alignmentbased large database framework for image analysis and synthesis, where image information is transferred from the nearest neighbors to a query image according to the dense scene correspondence. This framework is demonstrated through concrete applications, such as motion field prediction from a single image, motion synthesis via object transfer, satellite image registration and face recognition.
Online Object Tracking: A Benchmark
"... Object tracking is one of the most important components in numerous applications of computer vision. While much progress has been made in recent years with efforts on sharing code and datasets, it is of great importance to develop a library and benchmark to gauge the state of the art. After briefly ..."
Abstract
-
Cited by 77 (5 self)
- Add to MetaCart
(Show Context)
Object tracking is one of the most important components in numerous applications of computer vision. While much progress has been made in recent years with efforts on sharing code and datasets, it is of great importance to develop a library and benchmark to gauge the state of the art. After briefly reviewing recent advances of online object tracking, we carry out large scale experiments with various evaluation criteria to understand how these algorithms perform. The test image sequences are annotated with different attributes for performance evaluation and analysis. By analyzing quantitative results, we identify effective approaches for robust tracking and provide potential future research directions in this field. 1.
K.: Dense point trajectories by GPU-accelerated large displacement optical flow
, 2010
"... Abstract. Dense and accurate motion tracking is an important requirement for many video feature extraction algorithms. In this paper we provide a method for computing point trajectories based on a fast parallel implementation of a recent optical flow algorithm that tolerates fast motion. The paralle ..."
Abstract
-
Cited by 77 (10 self)
- Add to MetaCart
(Show Context)
Abstract. Dense and accurate motion tracking is an important requirement for many video feature extraction algorithms. In this paper we provide a method for computing point trajectories based on a fast parallel implementation of a recent optical flow algorithm that tolerates fast motion. The parallel implementation of large displacement optical flow runs about 78 × faster than the serial C++ version. This makes it practical to use in a variety of applications, among them point tracking. In the course of obtaining the fast implementation, we also proved that the fixed point matrix obtained in the optical flow technique is positive semi-definite. We compare the point tracking to the most commonly used motion tracker-the KLT tracker- on a number of sequences with ground truth motion. Our resulting technique tracks up to three orders of magnitude more points and is 46 % more accurate than the KLT tracker. It also provides a tracking density of 48 % and has an occlusion error of 3 % compared to a density of 0.1 % and occlusion error of 8 % for the KLT tracker. Compared to the Particle Video tracker, we achieve 66 % better accuracy while retaining the ability to handle large displacements while running an order of magnitude faster. 1
Motion Detail Preserving Optical Flow Estimation
"... We discuss the cause of a severe optical flow estimation problem that fine motion structures cannot always be correctly reconstructed in the commonly employed multiscale variational framework. Our major finding is that significant and abrupt displacement transition wrecks small-scale motion structur ..."
Abstract
-
Cited by 72 (1 self)
- Add to MetaCart
(Show Context)
We discuss the cause of a severe optical flow estimation problem that fine motion structures cannot always be correctly reconstructed in the commonly employed multiscale variational framework. Our major finding is that significant and abrupt displacement transition wrecks small-scale motion structures in the coarse-to-fine refinement. A novel optical flow estimation method is proposed in this paper to address this issue, which reduces the reliance of the flow estimates on their initial values propagated from the coarser level and enables recovering many motion details in each scale. The contribution of this paper also includes adaption of the objective function and development of a new optimization procedure. The effectiveness of our method is borne out by experiments for both large- and small-displacement optical flow estimation.