Do We Need More Training Data or Better Models for Object Detection? Paper presented at the British Machine Vision Conference (2012)

by X. Zhu, C. Vondrick, D. Ramanan, C. C. Fowlkes

Results 1-10 of 44 citing papers

Microsoft COCO: Common Objects in Context

by Tsung-yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick
"... Abstract. We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understand-ing. This is achieved by gathering images of complex everyday scenes containing common obj ..."
Abstract - Cited by 43 (3 self) - Add to MetaCart
Abstract. We present a new dataset with the goal of advancing the state of the art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. This is achieved by gathering images of complex everyday scenes containing common objects in their natural context. Objects are labeled using per-instance segmentations to aid in precise object localization. Our dataset contains photos of 91 object types that would be easily recognizable by a 4-year-old. With a total of 2.5 million labeled instances in 328k images, the creation of our dataset drew upon extensive crowd worker involvement via novel user interfaces for category detection, instance spotting and instance segmentation. We present a detailed statistical analysis of the dataset in comparison to PASCAL, ImageNet, and SUN. Finally, we provide baseline performance analysis for bounding box and segmentation detection results using a Deformable Parts Model.

Citation Context

... In some categories (e.g., dog, cat, people), models trained on MS COCO perform worse, while on others (e.g., bus, tv, horse), models trained on our data are better. Consistent with past observations [46], we find that including difficult (non-iconic) images during training may not always help. Such examples may act as noise and pollute the learned model if the model is not rich enough to capture such ...
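
As a hedged illustration of the per-instance annotations the abstract describes, the sketch below reads one category's boxes and segmentation masks with the dataset's companion pycocotools API. The annotation-file path and the category name are placeholder assumptions, not details from this listing.

```python
from pycocotools.coco import COCO

# Placeholder annotation path; substitute a real COCO release file.
coco = COCO("annotations/instances_val2014.json")

cat_ids = coco.getCatIds(catNms=["dog"])  # example category
img_ids = coco.getImgIds(catIds=cat_ids)
ann_ids = coco.getAnnIds(imgIds=img_ids[0], catIds=cat_ids, iscrowd=None)

for ann in coco.loadAnns(ann_ids):
    print(ann["bbox"])          # [x, y, width, height] per instance
    mask = coco.annToMask(ann)  # binary per-instance segmentation mask
```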

Human Pose Estimation using Body Parts Dependent Joint Regressors

by Matthias Dantone, Juergen Gall, Christian Leistner, Luc Van Gool
"... In this work, we address the problem of estimating 2d human pose from still images. Recent methods that rely on discriminatively trained deformable parts organized in a tree model have shown to be very successful in solving this task. Within such a pictorial structure framework, we address the probl ..."
Abstract - Cited by 31 (6 self) - Add to MetaCart
In this work, we address the problem of estimating 2D human pose from still images. Recent methods that rely on discriminatively trained deformable parts organized in a tree model have been shown to be very successful at solving this task. Within such a pictorial structure framework, we address the problem of obtaining good part templates by proposing novel, non-linear joint regressors. In particular, we employ two-layered random forests as joint regressors. The first layer acts as a discriminative, independent body part classifier. The second layer takes the estimated class distributions of the first one into account and is thereby able to predict joint locations by modeling the interdependence and co-occurrence of the parts. This results in a pose estimation framework that already takes dependencies between body parts into account during joint localization and is thus able to circumvent typical ambiguities of tree structures, such as for legs and arms. In the experiments, we demonstrate that our body-part-dependent joint regressors achieve a higher joint localization accuracy than tree-based state-of-the-art methods.
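
A minimal sketch of the two-layer idea above, using scikit-learn forests as stand-ins for the paper's purpose-built regressors; the feature dimensionality, part count, estimator settings, and random data are all illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Hypothetical training data: X are appearance features of image patches,
# part_labels are discrete body-part classes, joint_xy are 2D joint locations.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))
part_labels = rng.integers(0, 5, size=1000)
joint_xy = rng.normal(size=(1000, 2))

# Layer 1: a discriminative, independent body-part classifier.
part_clf = RandomForestClassifier(n_estimators=50, random_state=0)
part_clf.fit(X, part_labels)

# Layer 2: regress joint locations from layer 1's class distributions,
# so part co-occurrence informs the prediction.
part_dist = part_clf.predict_proba(X)
joint_reg = RandomForestRegressor(n_estimators=50, random_state=0)
joint_reg.fit(part_dist, joint_xy)

pred = joint_reg.predict(part_clf.predict_proba(X[:5]))  # predicted 2D joints
```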

Histograms of Sparse Codes for Object Detection

by Xiaofeng Ren, Deva Ramanan
"... Object detection has seen huge progress in recent years, much thanks to the heavily-engineered Histograms of Oriented Gradients (HOG) features. Can we go beyond gradients and do better than HOG? We provide an affirmative answer by proposing and investigating a sparse representation for object detect ..."
Abstract - Cited by 28 (2 self) - Add to MetaCart
Object detection has seen huge progress in recent years, thanks largely to the heavily engineered Histograms of Oriented Gradients (HOG) features. Can we go beyond gradients and do better than HOG? We provide an affirmative answer by proposing and investigating a sparse representation for object detection, Histograms of Sparse Codes (HSC). We compute sparse codes with dictionaries learned from data using K-SVD, and aggregate per-pixel sparse codes to form local histograms. We intentionally keep true to the sliding window framework (with mixtures and parts) and only change the underlying features. To keep training (and testing) efficient, we apply dimension reduction by computing SVD on learned models, and adopt supervised training where latent positions of roots and parts are given externally, e.g. from a HOG-based detector. By learning and using local representations that are much more expressive than gradients, we demonstrate large improvements over the state of the art on the PASCAL benchmark for both root-only and part-based models.

Citation Context

... a bottleneck as Moore's Law drives up computational capabilities. There is evidence that local features are most crucial for detection [23], and we may already be saturating the capacity of HOG [36]. Can we learn representations that outperform a hand-engineered HOG? In the wake of recent advances in feature learning [16, 1] and its successes in many vision problems such as recognition [19] and g...
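
A rough sketch of the HSC pipeline from the abstract above: per-pixel sparse codes pooled into cell histograms. scikit-learn's MiniBatchDictionaryLearning stands in for K-SVD, the image is random, and the patch size, dictionary size, and sparsity level are placeholder assumptions.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d

rng = np.random.default_rng(0)
image = rng.random((64, 64))  # toy grayscale image

patches = extract_patches_2d(image, (5, 5))   # one patch per valid pixel
flat = patches.reshape(len(patches), -1)
flat -= flat.mean(axis=1, keepdims=True)      # remove DC component

dico = MiniBatchDictionaryLearning(n_components=32, transform_algorithm="omp",
                                   transform_n_nonzero_coefs=2, random_state=0)
codes = dico.fit(flat).transform(flat)        # per-pixel sparse codes

# Aggregate absolute code values into histograms over 8x8 cells,
# analogous to how HOG pools gradient orientations.
codes = np.abs(codes).reshape(60, 60, 32)               # valid patch-center grid
cells = codes[:56, :56].reshape(7, 8, 7, 8, 32).sum(axis=(1, 3))  # (7, 7, 32) HSC map
```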

Seeking the strongest rigid detector

by Rodrigo Benenson, Markus Mathias, Tinne Tuytelaars, Luc Van Gool - In CVPR
"... The current state of the art solutions for object detection describe each class by a set of models trained on discovered sub-classes (so called “components”), with each model it-self composed of collections of interrelated parts (deform-able models). These detectors build upon the now classic Histog ..."
Abstract - Cited by 25 (4 self) - Add to MetaCart
The current state-of-the-art solutions for object detection describe each class by a set of models trained on discovered sub-classes (so-called “components”), with each model itself composed of collections of interrelated parts (deformable models). These detectors build upon the now classic Histogram of Oriented Gradients + linear SVM combination. In this paper we revisit some of the core assumptions in HOG+SVM and show that by properly designing the feature pooling, feature selection, preprocessing, and training methods, it is possible to reach top quality, at least for pedestrian detection, using a single rigid component. We provide experiments over a large design space that give insights into the design of classifiers, as well as relevant information for practitioners. Our best detector is fully feed-forward, has a single unified architecture, uses only histograms of oriented gradients and colour information in monocular static images, and improves over 23 other methods on the INRIA, ETH and Caltech-USA datasets, reducing the average miss rate over HOG+SVM by more than 30%.

Citation Context

... greedy boosting is used. The baseline AdaBoost 2k is still the best choice. 8. Which training set? It is well known that the data used to train an algorithm is just as important as the algorithm itself [18]. The INRIA dataset has been regularly used for training pedestrian detectors [6, table 2], despite being quite small by today's standards. During the bootstrapping stages of learning, we observe that...
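
For concreteness, here is a minimal sketch of the HOG + linear SVM combination this paper revisits, as a single rigid template over a fixed detection window. The window size, SVM regularization, and random training data are assumptions; a real detector would mine hard negatives and slide the template over an image pyramid.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

# Hypothetical data: 128x64 grayscale windows, pedestrian vs. background.
rng = np.random.default_rng(0)
windows = rng.random((200, 128, 64))
labels = rng.integers(0, 2, size=200)

# One HOG descriptor per window: 9 orientations, 8x8 cells, 2x2 blocks.
feats = np.array([hog(w, orientations=9, pixels_per_cell=(8, 8),
                      cells_per_block=(2, 2)) for w in windows])

clf = LinearSVC(C=0.01)
clf.fit(feats, labels)
score = clf.decision_function(feats[:1])  # window score; threshold to detect
```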

HOGgles: Visualizing Object Detection Features

by Carl Vondrick, Aditya Khosla, Tomasz Malisiewicz, Antonio Torralba
"... We introduce algorithms to visualize feature spaces used by object detectors. The tools in this paper allow a human to put on ‘HOG goggles ’ and perceive the visual world as a HOG based object detector sees it. We found that these visualizations allow us to analyze object detection systems in new wa ..."
Abstract - Cited by 24 (1 self) - Add to MetaCart
We introduce algorithms to visualize feature spaces used by object detectors. The tools in this paper allow a human to put on ‘HOG goggles’ and perceive the visual world as a HOG-based object detector sees it. We found that these visualizations allow us to analyze object detection systems in new ways and gain new insight into the detector's failures. For example, when we visualize the features for high-scoring false alarms, we discover that, although they are clearly wrong in image space, they look deceptively similar to true positives in feature space. This result suggests that many of these false alarms are caused by our choice of feature space, and indicates that creating a better learning algorithm or building bigger datasets is unlikely to correct these errors. By visualizing feature spaces, we can gain a more intuitive understanding of our detection systems.

Figure 1: An image from PASCAL and a high-scoring car detection from DPM [8]. Why did the detector fail?

Citation Context

... explored the set of images that generate identical HOG descriptors. Liu and Wang [12] designed algorithms to highlight which image regions contribute the most to a classifier's confidence. Zhu et al. [24] try to determine whether we have reached the Bayes risk for HOG. The tools in this paper enable an alternative mode to analyze object detectors through visualizations. By putting on ‘HOG glasses’ and vis...

Learning Collections of Part Models for Object Recognition

by Ian Endres, Kevin J. Shih, Johnston Jiaa, Derek Hoiem
"... We propose a method to learn a diverse collection of discriminative parts from object bounding box annotations. Part detectors can be trained and applied individually, which simplifies learning and extension to new features or categories. We apply the parts to object category detection, pooling part ..."
Abstract - Cited by 20 (1 self) - Add to MetaCart
We propose a method to learn a diverse collection of discriminative parts from object bounding box annotations. Part detectors can be trained and applied individually, which simplifies learning and extension to new features or categories. We apply the parts to object category detection, pooling part detections within bottom-up proposed regions and using a boosted classifier with proposed sigmoid weak learners for scoring. On PASCAL VOC 2010, we evaluate the part detectors’ ability to discriminate and localize annotated keypoints. Our detection system is competitive with the best existing systems, outperforming other HOG-based detectors on the more deformable categories.

Citation Context

... correspondence. Given a collection of object examples, the learner must determine which examples or portions of examples should belong to the same appearance model. A detailed analysis by Zhu et al. [22] concludes that finding better methods to organize examples and parts into visual sub-categories is the most promising direction for future research. In this paper, we focus on the problem of learning...
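
The pooling step from the abstract above, as a self-contained sketch: each part detector's scores are max-pooled within each proposed region to form a per-region feature vector. The detection format is a hypothetical assumption, and the boosted sigmoid-weak-learner scoring that would consume these features is not shown.

```python
import numpy as np

def pool_part_scores(detections, regions):
    """Max-pool each part detector's scores within each proposed region.

    detections: list over parts; each entry is an (N_i, 3) array of
                (x, y, score) part detections (hypothetical format).
    regions:    (R, 4) array of (x0, y0, x1, y1) region proposals.
    Returns an (R, num_parts) feature matrix for a region classifier.
    """
    feats = np.full((len(regions), len(detections)), -np.inf)
    for r, (x0, y0, x1, y1) in enumerate(regions):
        for p, dets in enumerate(detections):
            inside = ((dets[:, 0] >= x0) & (dets[:, 0] <= x1) &
                      (dets[:, 1] >= y0) & (dets[:, 1] <= y1))
            if inside.any():
                feats[r, p] = dets[inside, 2].max()
    return feats
```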

Learning everything about anything: Webly-supervised visual concept learning

by Santosh K. Divvala, Ali Farhadi, Carlos Guestrin - In CVPR
"... Figure 1: We introduce a fully-automated method that, given any concept, discovers an exhaustive vocabulary explaining all its appearance variations (i.e., actions, interactions, attributes, etc.), and trains full-fledged detection models for it. This figure shows a few of the many variations that o ..."
Abstract - Cited by 20 (1 self) - Add to MetaCart
Figure 1: We introduce a fully-automated method that, given any concept, discovers an exhaustive vocabulary explaining all its appearance variations (i.e., actions, interactions, attributes, etc.), and trains full-fledged detection models for it. This figure shows a few of the many variations that our method has learned for four different classes of concepts: object (horse), scene (kitchen), event (Christmas), and action (walking).

Recognition is graduating from labs to real-world applications. While it is encouraging to see its potential being tapped, it brings forth a fundamental challenge to the vision researcher: scalability. How can we learn a model for any concept that exhaustively covers all its appearance variations, while requiring minimal or no human supervision for compiling the vocabulary of visual variance, gathering the training images and annotations, and learning the models? In this paper, we introduce a fully-automated approach for learning extensive models for a wide range of variations (e.g. actions, interactions, attributes and beyond) within any concept. Our approach leverages vast resources of online books to discover the vocabulary of variance, and intertwines the data collection and modeling steps to alleviate the need for explicit human supervision in training the models. Our approach organizes the visual knowledge about a concept in a convenient and useful way, enabling a variety of applications across vision and NLP. Our online system has been queried by users to learn models for several interesting concepts including breakfast, Gandhi, beautiful, etc. To date, our system has models available for over 50,000 variations within 150 concepts, and has annotated more than 10 million images with bounding boxes.

Citation Context

... variance have considered simple annotations based on aspect ratio [18], viewpoint [9], and feature-space clustering [13]. These annotations can only tackle simple appearance variations of an object [51]. Recent works have considered more complex annotations such as phrases [43], phraselets [12], and attributes [16, 23]. While explicit supervision is required to gather the list of phrases and their b...

Sliding Shapes for 3D Object Detection in Depth Images

by Shuran Song, Jianxiong Xiao
"... Abstract. The depth information of RGB-D sensors has greatly simplified some common challenges in computer vision and enabled breakthroughs for several tasks. In this paper, we propose to use depth maps for object detection and de-sign a 3D detector to overcome the major difficulties for recognition ..."
Abstract - Cited by 14 (3 self) - Add to MetaCart
Abstract. The depth information of RGB-D sensors has greatly simplified some common challenges in computer vision and enabled breakthroughs for several tasks. In this paper, we propose to use depth maps for object detection and design a 3D detector to overcome the major difficulties for recognition, namely the variations of texture, illumination, shape, viewpoint, clutter, occlusion, self-occlusion and sensor noise. We take a collection of 3D CAD models and render each CAD model from hundreds of viewpoints to obtain synthetic depth maps. For each depth rendering, we extract features from the 3D point cloud and train an Exemplar-SVM classifier. During testing and hard-negative mining, we slide a ...
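
The per-rendering Exemplar-SVM training mentioned above, as a hedged sketch: one linear SVM per exemplar, trained against a shared negative pool with asymmetric class weights in the style typical of Exemplar-SVMs. The features here are random placeholders standing in for the paper's point-cloud features, and the weight values are assumptions.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
negatives = rng.normal(size=(500, 128))   # stand-in background features
exemplars = rng.normal(size=(10, 128))    # one per synthetic depth rendering

detectors = []
for pos in exemplars:
    X = np.vstack([pos[None, :], negatives])
    y = np.r_[1, np.zeros(len(negatives), dtype=int)]
    # Heavily asymmetric class weights: one positive vs. many negatives.
    clf = LinearSVC(C=1.0, class_weight={1: 50.0, 0: 0.01})
    detectors.append(clf.fit(X, y))

# At test time, every detector's decision_function scores a sliding window.
scores = [d.decision_function(negatives[:1]) for d in detectors]
```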

Subcategory-aware object classification

by Jian Dong, Wei Xia, Qiang Chen, Jiashi Feng, Zhongyang Huang, Shuicheng Yan - In: CVPR (2013)
"... In this paper, we introduce a subcategory-aware object classification framework to boost category level object clas-sification performance. Motivated by the observation of considerable intra-class diversities and inter-class ambigu-ities in many current object classification datasets, we ex-plicitly ..."
Abstract - Cited by 13 (2 self) - Add to MetaCart
In this paper, we introduce a subcategory-aware object classification framework to boost category-level object classification performance. Motivated by the observation of considerable intra-class diversities and inter-class ambiguities in many current object classification datasets, we explicitly split data into subcategories by ambiguity-guided subcategory mining. We then train an individual model for each subcategory rather than attempt to represent an object category with a monolithic model. More specifically, we build the instance affinity graph by combining both intra-class similarity and inter-class ambiguity. Visual subcategories, which correspond to the dense subgraphs, are detected by the graph shift algorithm and seamlessly integrated into the state-of-the-art detection-assisted classification framework. Finally, the responses from subcategory models are aggregated by subcategory-aware kernel regression. Extensive experiments on the PASCAL VOC 2007 and PASCAL VOC 2010 databases show state-of-the-art performance from our framework.

Citation Context

... to object classification [31, 21]. As most standard semantic categories do not form coherent visual categories, mixture models have been proposed and have become the standard approach for object detection [39, 16]. Early works only investigate heuristics based on meta-data or manual labels such as bounding box aspect ratio [16], object scale [29], object viewpoint [19] and part labels [3] to group the positive...
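
A minimal sketch of the subcategory-mining pipeline from the abstract above: build an instance affinity graph, cluster it, and train one model per visual subcategory. Spectral clustering stands in for the paper's graph shift algorithm, the affinity uses intra-class similarity only (the inter-class ambiguity term is omitted), and the features, cluster count, and kernel bandwidth are placeholder assumptions.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 64))          # one category's positive instances
negatives = rng.normal(size=(300, 64))  # shared negatives

# Instance affinity graph from intra-class similarity (an RBF kernel here).
sq = ((X[:, None] - X[None, :]) ** 2).sum(-1)
affinity = np.exp(-sq / sq.mean())

# Dense-subgraph detection, approximated by spectral clustering.
sub = SpectralClustering(n_clusters=4, affinity="precomputed",
                         random_state=0).fit_predict(affinity)

# One model per visual subcategory instead of a monolithic category model.
models = []
for k in range(4):
    Xk = np.vstack([X[sub == k], negatives])
    yk = np.r_[np.ones((sub == k).sum()), np.zeros(len(negatives))]
    models.append(LinearSVC().fit(Xk, yk))
```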

Detecting avocados to zucchinis: what have we done, and where are we going?

by Olga Russakovsky, Jia Deng, Zhiheng Huang, Alexander C. Berg, Li Fei-Fei
"... The growth of detection datasets and the multiple directions of object detection research provide both an unprecedented need and a great opportunity for a thorough evaluation of the current state of the field of categorical object detection. In this paper we strive to answer two key questions. First ..."
Abstract - Cited by 12 (2 self) - Add to MetaCart
The growth of detection datasets and the multiple directions of object detection research provide both an unprecedented need and a great opportunity for a thorough evaluation of the current state of the field of categorical object detection. In this paper we strive to answer two key questions. First, where are we currently as a field: what have we done right, and what still needs to be improved? Second, where should we be going in designing the next generation of object detectors? Inspired by the recent work of Hoiem et al. [10] on the standard PASCAL VOC detection dataset, we perform a large-scale study on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) data. First, we quantitatively demonstrate that this dataset provides many of the same detection challenges as the PASCAL VOC. Due to its scale of 1000 object categories, ILSVRC also provides an excellent testbed for understanding the performance of detectors as a function of several key properties of the object classes. We conduct a series of analyses looking at how different detection methods perform on a number of image-level and object-class-level properties such as texture, color, deformation, and clutter. We learn important lessons about current object detection methods and propose a number of insights for designing the next generation of object detectors.

Citation Context

... changes of viewpoints, for both specific category detection and general object detection on a small set of categories [23, 22, 11, 6, 5]. Others have provided insight into dataset design [14, 18, 7]. [25] analyzed the relative impact of adding more training data versus building better detection models. The most in-depth analysis of generic object detection to date has been performed on the PASCAL chal...
