Fast and Robust Object Detection Using Visual Subcategories
Cited by 9 (3 self)
Object classes generally contain large intra-class variation, which poses a challenge to object detection schemes. In this work, we study visual subcategorization as a means of capturing appearance variation. First, training data is clustered using color and gradient features. Second, the clustering is used to learn an ensemble of models that capture visual variation due to varying orientation, truncation, and occlusion degree. Fast object detection is achieved with integral image features and pixel lookup features. The framework is studied in the context of vehicle detection on the challenging KITTI dataset.
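The abstract above credits part of its speed to integral image features. As a minimal sketch of that general idea (not the authors' implementation; the function names `integral_image` and `box_sum` are hypothetical), a summed-area table lets any rectangular sum be read with four lookups:

```python
def integral_image(img):
    """Build a summed-area table with an extra leading row and column of zeros.

    img is a list of equal-length rows of numbers (an assumed toy format).
    """
    h, w = len(img), len(img[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum of everything above plus the running sum of the current row.
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def box_sum(sat, top, left, bottom, right):
    """Sum of pixels in rows [top, bottom) and columns [left, right)."""
    return sat[bottom][right] - sat[top][right] - sat[bottom][left] + sat[top][left]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
table = integral_image(image)
print(box_sum(table, 1, 1, 3, 3))  # sum of the lower-right 2x2 block
```

The table is built once per image, after which the cost of each box sum is independent of the box size.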
Integrating Context and Occlusion for Car Detection by Hierarchical And-Or Model
Cited by 6 (1 self)
This paper presents a method of learning reconfigurable hierarchical And-Or models to integrate context and occlusion for car detection. The And-Or model represents the regularities of car-to-car context and occlusion patterns at three levels: (i) layouts of spatially-coupled N cars, (ii) single cars with different viewpoint-occlusion configurations, and (iii) a small number of parts. The learning process consists of two stages. We first learn the structure of the And-Or model with three components: (a) mining N-car contextual patterns based on layouts of annotated single car bounding boxes, (b) mining the occlusion configurations based on the overlapping statistics between single cars, and (c) learning visible parts based on car 3D CAD simulation or heuristically mining latent car parts. The And-Or model is organized into a directed and acyclic graph which leads to the Dynamic Programming algorithm in inference. In the second stage, we jointly train the model parameters (for appearance, deformation and bias) using Weak-Label Structural SVM. In experiments, we test our model on four car datasets: the KITTI dataset [11], the street parking dataset [19], the PASCAL VOC2007 car dataset [7], and a self-collected parking lot dataset. We compare with state-of-the-art variants of deformable part-based models and other methods. Our model obtains significant improvement consistently on the four datasets.
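The abstract above notes that a directed acyclic And-Or graph admits dynamic programming at inference time. The sketch below is an illustration of that general recursion only, under strong assumptions: the toy graph, node names, and fixed terminal scores are hypothetical, and the paper's actual appearance, deformation, and bias terms are not modeled. Or-nodes take the best child, And-nodes sum their children, and memoization ensures each shared node is scored once.

```python
from functools import lru_cache

# Hypothetical toy graph: node name -> (kind, children) or ("terminal", score).
TOY_GRAPH = {
    "root": ("or", ("pair", "single")),
    "pair": ("and", ("car_a", "car_b")),
    "single": ("and", ("car_a",)),
    "car_a": ("terminal", 3),
    "car_b": ("terminal", -1),
}


def best_score(graph, root):
    """Score an And-Or DAG bottom-up with memoization (dynamic programming)."""

    @lru_cache(maxsize=None)
    def score(node):
        kind, payload = graph[node]
        if kind == "terminal":
            return payload
        child_scores = [score(child) for child in payload]
        # Or-node: choose the best alternative. And-node: combine all parts.
        return max(child_scores) if kind == "or" else sum(child_scores)

    return score(root)


print(best_score(TOY_GRAPH, "root"))
```

Because "car_a" is shared by both branches, caching its score is what keeps the cost linear in the number of edges rather than in the number of root-to-leaf paths.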
Talking heads: Detecting humans and recognizing their interactions
In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2014
Cited by 4 (2 self)
The objective of this work is to accurately and efficiently detect configurations of one or more people in edited TV material. Such configurations often appear in standard arrangements due to cinematic style, and we take advantage of this to provide scene context. We make the following contributions: first, we introduce a new learnable context aware configuration model for detecting sets of people in TV material that predicts the scale and location of each upper body in the configuration; second, we show that inference of the model can be solved globally and efficiently using dynamic programming, and implement a maximum margin learning framework; and third, we show that the configuration model substantially outperforms a Deformable Part Model (DPM) for predicting upper body locations in video frames, even when the DPM is equipped with the context of other upper bodies. Experiments are performed over two datasets: the TV Human Interaction dataset, and 150 episodes from four different TV shows. We also demonstrate the benefits of the model in recognizing interactions in TV shows.
Data-Driven 3D Voxel Patterns for Object Category Recognition
Cited by 3 (0 self)
Despite the great progress achieved in recognizing objects as 2D bounding boxes in images, it is still very challenging to detect occluded objects and estimate the 3D properties of multiple objects from a single image. In this paper, we propose a novel object representation, 3D Voxel Pattern (3DVP), that jointly encodes the key properties of objects including appearance, 3D shape, viewpoint, occlusion and truncation. We discover 3DVPs in a data-driven way, and train a bank of specialized detectors for a dictionary of 3DVPs. The 3DVP detectors are capable of detecting objects with specific visibility patterns and transferring the meta-data from the 3DVPs to the detected objects, such as 2D segmentation mask, 3D pose as well as occlusion or truncation boundaries. The transferred meta-data allows us to infer the occlusion relationship among objects, which in turn provides improved object recognition results. Experiments are conducted on the KITTI detection benchmark [17] and the outdoor-scene dataset [41]. We improve state-of-the-art results on car detection and pose estimation with notable margins (6% in difficult data of KITTI). We also verify the ability of our method in accurately segmenting objects from the background and localizing them in 3D.
Supervised learning and evaluation of KITTI’s cars detector with DPM
In IV, 2014
Cited by 3 (1 self)
This paper carries out a discussion on the supervised learning of a car detector built as a Discriminative Part-based Model (DPM) from images in the recently published KITTI benchmark suite as part of the object detection and orientation estimation challenge. We present a wide set of experiments and many hints on the different ways to supervise and enhance the well-known DPM on a challenging and naturalistic urban dataset such as KITTI. The evaluation algorithm and metrics, the selection of a clean but representative subset of training samples and the DPM tuning are key factors to learn an object detector in a supervised fashion. We provide evidence of subtle differences in performance depending on these aspects. Besides, the generalization of the trained models to an independent dataset is validated by 5-fold cross-validation.
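The abstract above mentions validation by 5-fold cross-validation. As a minimal sketch of how such folds can be formed (the helper name `k_fold_indices` is hypothetical, no shuffling is applied, and this does not reproduce the paper's exact protocol), each sample index lands in the validation set of exactly one fold:

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, val_indices) for k contiguous, near-equal folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


for fold, (train, val) in enumerate(k_fold_indices(12, k=5)):
    print(fold, len(train), val)
```

In practice the indices would usually be shuffled first, or grouped by sequence, so that near-duplicate frames do not end up on both sides of a split.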
Object Detection by 3D Aspectlets and Occlusion Reasoning
Cited by 2 (2 self)
We propose a novel framework for detecting multiple objects from a single image and reasoning about occlusions between objects. We address this problem from a 3D perspective in order to handle various occlusion patterns which can take place between objects. We introduce the concept of “3D aspectlets” based on a piecewise planar object representation. A 3D aspectlet represents a portion of the object which provides evidence for partial observation of the object. A new probabilistic model (which we called spatial layout model) is proposed to combine the bottom-up evidence from 3D aspectlets and the top-down occlusion reasoning to help object detection. Experiments are conducted on two new challenging datasets with various degrees of occlusions to demonstrate that, by contextualizing objects in their 3D geometric configuration with respect to the observer, our method is able to obtain competitive detection results even in the presence of severe occlusions. Moreover, we demonstrate the ability of the model to estimate the locations of objects in 3D and predict the occlusion order between objects in images.
3D Object Proposals for Accurate Object Class Detection
Cited by 1 (1 self)
The goal of this paper is to generate high-quality 3D object proposals in the context of autonomous driving. Our method exploits stereo imagery to place proposals in the form of 3D bounding boxes. We formulate the problem as minimizing an energy function encoding object size priors, ground plane as well as several depth informed features that reason about free space, point cloud densities and distance to the ground. Our experiments show significant performance gains over existing RGB and RGB-D object proposal methods on the challenging KITTI benchmark. Combined with convolutional neural net (CNN) scoring, our approach outperforms all existing results on all three KITTI object classes.
Single-Pedestrian Detection Aided by 2-Pedestrian Detection
IEEE Transactions on Pattern Analysis and Machine Intelligence
Cited by 1 (1 self)
In this paper, we address the challenging problem of detecting pedestrians who appear in groups. A new approach is proposed for single-pedestrian detection aided by 2-pedestrian detection. A mixture model of 2-pedestrian detectors is designed to capture the unique visual cues which are formed by nearby pedestrians but cannot be captured by single-pedestrian detectors. A probabilistic framework is proposed to model the relationship between the configurations estimated by single- and 2-pedestrian detectors, and to refine the single-pedestrian detection result using 2-pedestrian detection. The 2-pedestrian detector can integrate with any single-pedestrian detector. 25 state-of-the-art single-pedestrian detection approaches are combined with the 2-pedestrian detector on three widely used public datasets: Caltech, TUD-Brussels, and ETH. Experimental results show that our framework improves all these approaches. The average improvement is 9% on the Caltech-Test dataset, 11% on the TUD-Brussels dataset and 17% on the ETH dataset in terms of average miss rate. The lowest average miss rate is reduced from 37%
3D Object Class Detection in the Wild
Object class detection has been a synonym for 2D bounding box localization for the longest time, fueled by the success of powerful statistical learning techniques, combined with robust image representations. Only recently, there has been a growing interest in revisiting the promise of computer vision from the early days: to precisely delineate the contents of a visual scene, object by object, in 3D. In this paper, we draw from recent advances in object detection and 2D-3D object lifting in order to design an object class detector that is particularly tailored towards 3D object class detection. Our 3D object class detection method consists of several stages gradually enriching the object detection output with object viewpoint, keypoints and 3D shape estimates. Following careful design, in each stage it constantly improves the performance and achieves state-of-the-art performance in simultaneous 2D bounding box and viewpoint estimation on the challenging Pascal3D+ [50] dataset.