Results 1 - 10 of 304
Real-time human pose recognition in parts from single depth images
- In CVPR, 2011
"... We propose a new method to quickly and accurately predict 3D positions of body joints from a single depth image, using no temporal information. We take an object recognition approach, designing an intermediate body parts representation that maps the difficult pose estimation problem into a simpler p ..."
Cited by 568 (17 self)
We propose a new method to quickly and accurately predict 3D positions of body joints from a single depth image, using no temporal information. We take an object recognition approach, designing an intermediate body parts representation that maps the difficult pose estimation problem into a simpler per-pixel classification problem. Our large and highly varied training dataset allows the classifier to estimate body parts invariant to pose, body shape, clothing, etc. Finally we generate confidence-scored 3D proposals of several body joints by reprojecting the classification result and finding local modes. The system runs at 200 frames per second on consumer hardware. Our evaluation shows high accuracy on both synthetic and real test sets, and investigates the effect of several training parameters. We achieve state of the art accuracy in our comparison with related work and demonstrate improved generalization over exact whole-skeleton nearest neighbor matching.
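As a concrete picture of the per-pixel stage, here is a minimal sketch, not the authors' implementation: the paper's depth-comparison feature f(I, x) = d(x + u/d(x)) - d(x + v/d(x)) feeds a random forest that labels each pixel with a body part. The scikit-learn forest, the synthetic depth image and labels, and names such as depth_feature are our assumptions.

```python
# Sketch of the depth-comparison feature and per-pixel classification
# with a random forest (scikit-learn stands in for the authors' own
# forest implementation; all data and names here are invented).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

BG = 1e6  # large depth value for out-of-bounds / background probes

def depth_feature(depth, x, u, v):
    """f(I, x) = d(x + u / d(x)) - d(x + v / d(x)).

    Offsets u, v are scaled by 1/depth at the probe pixel so the
    feature is depth-invariant; out-of-bounds probes read as BG.
    """
    d = depth[x[0], x[1]]
    def probe(offset):
        p = (int(x[0] + offset[0] / d), int(x[1] + offset[1] / d))
        if 0 <= p[0] < depth.shape[0] and 0 <= p[1] < depth.shape[1]:
            return depth[p]
        return BG
    return probe(u) - probe(v)

# Toy training: random offset pairs define the feature set, one
# feature vector per pixel, labels are body-part ids.
rng = np.random.default_rng(0)
offsets = rng.uniform(-50, 50, size=(20, 2, 2))  # 20 (u, v) pairs
depth = rng.uniform(1.0, 4.0, size=(40, 40))     # fake depth image
labels = rng.integers(0, 3, size=(40, 40))       # fake part labels

X = np.array([[depth_feature(depth, (i, j), u, v) for u, v in offsets]
              for i in range(40) for j in range(40)])
forest = RandomForestClassifier(n_estimators=3, max_depth=8)
forest.fit(X, labels.ravel())
print(forest.predict(X[:5]))  # per-pixel body-part predictions
```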
Learning Local Image Descriptors
- Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2007
"... Abstract—In this paper, we explore methods for learning local image descriptors from training data. We describe a set of building blocks for constructing descriptors which can be combined together and jointly optimized so as to minimize the error of a nearest-neighbor classifier. We consider both li ..."
Cited by 174 (2 self)
Abstract—In this paper, we explore methods for learning local image descriptors from training data. We describe a set of building blocks for constructing descriptors which can be combined together and jointly optimized so as to minimize the error of a nearest-neighbor classifier. We consider both linear and nonlinear transforms with dimensionality reduction, and make use of discriminant learning techniques such as Linear Discriminant Analysis (LDA) and Powell minimization to solve for the parameters. Using these techniques, we obtain descriptors that exceed state-of-the-art performance with low dimensionality. In addition to new experiments and recommendations for descriptor learning, we are also making available a new and realistic ground truth data set based on multiview stereo data. Index Terms—Image descriptors, local features, discriminative learning, SIFT.
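A small sketch of the learning recipe, under invented data: a linear transform with dimensionality reduction (LDA here) is fitted to labeled patch descriptors and scored by the error of a nearest-neighbor classifier, the criterion the abstract names. The dimensions, synthetic patches, and use of scikit-learn are ours, not the paper's.

```python
# Hedged sketch: LDA as the learned linear transform + reduction,
# evaluated by nearest-neighbor classification error.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
n_points, per_point, raw_dim = 30, 8, 128
# Patches of the same 3D point share a label; descriptors are synthetic.
y = np.repeat(np.arange(n_points), per_point)
X = rng.normal(size=(n_points * per_point, raw_dim)) + 0.2 * y[:, None]

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25,
                                      stratify=y, random_state=0)
lda = LinearDiscriminantAnalysis(n_components=25).fit(Xtr, ytr)
# The training criterion: error of a nearest-neighbor classifier,
# here measured in the reduced descriptor space.
nn = KNeighborsClassifier(n_neighbors=1).fit(lda.transform(Xtr), ytr)
print("NN accuracy in 25-D space:", nn.score(lda.transform(Xte), yte))
```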
Associative hierarchical CRFs for object class image segmentation
- In Proc. ICCV, 2009
"... Most methods for object class segmentation are formulated as a labelling problem over a single choice of quantisation of an image space- pixels, segments or group of segments. It is well known that each quantisation has its fair share of pros and cons; and the existence of a common optimal quantisat ..."
Cited by 172 (25 self)
Most methods for object class segmentation are formulated as a labelling problem over a single choice of quantisation of an image space: pixels, segments or groups of segments. It is well known that each quantisation has its fair share of pros and cons, and the existence of a common optimal quantisation level suitable for all object categories is highly unlikely. Motivated by this observation, we propose a hierarchical random field model that allows integration of features computed at different levels of the quantisation hierarchy. MAP inference in this model can be performed efficiently using powerful graph cut based move making algorithms. Our framework generalises much of the previous work based on pixels or segments. We evaluate its efficiency on some of the most challenging data-sets for object class segmentation, and show it obtains state-of-the-art results.
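One can picture the hierarchical model as an energy with terms at more than one quantisation level. Below is a toy version with invented weights and data: pixel unaries, Potts smoothness between neighboring pixels, and a per-segment consistency term; real MAP inference would use graph-cut move-making algorithms rather than evaluating labelings directly.

```python
# Toy two-level associative hierarchical energy: pixel unaries, pixel
# smoothness, and a per-segment term charging pixels that disagree
# with their segment's dominant label. Weights and data are invented.
import numpy as np

def hierarchical_energy(labels, unary, segments, w_pair=1.0, w_seg=0.5):
    """labels: (H, W) ints; unary: (H, W, L) costs;
    segments: (H, W) superpixel ids."""
    H, W, L = unary.shape
    e = unary[np.arange(H)[:, None], np.arange(W)[None, :], labels].sum()
    # Potts smoothness between 4-connected neighbors.
    e += w_pair * (labels[:, 1:] != labels[:, :-1]).sum()
    e += w_pair * (labels[1:, :] != labels[:-1, :]).sum()
    # Segment term: cost for each pixel off the segment's majority label.
    for s in np.unique(segments):
        seg_labels = labels[segments == s]
        majority = np.bincount(seg_labels, minlength=L).argmax()
        e += w_seg * (seg_labels != majority).sum()
    return e

rng = np.random.default_rng(2)
unary = rng.uniform(size=(8, 8, 3))
segments = (np.arange(8)[:, None] // 4) * 2 + np.arange(8)[None, :] // 4
labels = unary.argmin(axis=2)  # unary-only labeling as a baseline
print("energy:", hierarchical_energy(labels, unary, segments))
```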
Auto-context and its Application to High-level Vision Tasks
- In Proc. CVPR
"... The notion of using context information for solving high-level vision and medical image segmentation problems has been increasingly realized in the field. However, how to learn an effective and efficient context model, together with an image appearance model, remains mostly unknown. The current lite ..."
Cited by 156 (6 self)
The notion of using context information for solving high-level vision and medical image segmentation problems has been increasingly realized in the field. However, how to learn an effective and efficient context model, together with an image appearance model, remains mostly unknown. The current literature using Markov Random Fields (MRFs) and Conditional Random Fields (CRFs) often involves specific algorithm design, in which the modeling and computing stages are studied in isolation. In this paper, we propose the auto-context algorithm. Given a set of training images and their corresponding label maps, we first learn a classifier on local image patches. The discriminative probability (or classification confidence) maps created by the learned classifier are then used as context information, in addition to the original image patches, to train a new classifier. The algorithm then iterates until convergence. Auto-context integrates low-level and context information by fusing a large number of low-level appearance features with context and implicit shape information. The resulting discriminative algorithm is general and easy to implement. Under nearly the same parameter settings in training, we apply the algorithm to three challenging vision applications: foreground/background segregation, human body configuration estimation, and scene region labeling. Moreover, context also plays a very important role in medical/brain images where the anatomical structures are mostly constrained to relatively fixed positions. With only some slight changes resulting from using 3D instead of 2D features, the auto-context algorithm applied to brain MRI image segmentation is shown to outperform state-of-the-art algorithms specifically designed for this domain. Furthermore, the scope of the proposed algorithm goes beyond image analysis and it has the potential to be used for a wide variety of problems in multi-variate labeling.
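The core iteration is short enough to sketch. In this hedged toy version, a classifier is trained, its class posteriors are appended to the features, and training repeats; the real algorithm samples posteriors at many positions around each pixel as context, whereas here each pixel sees only its own posterior, and all data is synthetic.

```python
# Minimal auto-context loop (our sketch, not the authors' code):
# train, append the resulting probability maps as context, retrain.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
n_pix, n_feat, n_classes = 500, 10, 3
X_app = rng.normal(size=(n_pix, n_feat))    # appearance features per pixel
y = rng.integers(0, n_classes, size=n_pix)  # ground-truth label map

X = X_app
for it in range(3):  # auto-context iterations
    clf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)
    print(f"iteration {it}: training accuracy {clf.score(X, y):.3f}")
    proba = clf.predict_proba(X)   # discriminative probability maps
    # Context for the next round: appearance plus current posteriors.
    # (The paper samples posteriors at many offsets around each pixel.)
    X = np.hstack([X_app, proba])
```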
Class-specific Hough forests for object detection
- In Proceedings IEEE Conference Computer Vision and Pattern Recognition, 2009
"... We present a method for the detection of instances of an object class, such as cars or pedestrians, in natural images. Similarly to some previous works, this is accomplished via generalized Hough transform, where the detections of individual object parts cast probabilistic votes for possible locatio ..."
Cited by 151 (18 self)
We present a method for the detection of instances of an object class, such as cars or pedestrians, in natural images. Similarly to some previous works, this is accomplished via the generalized Hough transform, where the detections of individual object parts cast probabilistic votes for possible locations of the centroid of the whole object; the detection hypotheses then correspond to the maxima of the Hough image that accumulates the votes from all parts. However, whereas the previous methods detect object parts using generative codebooks of part appearances, we take a more discriminative approach to object part detection. Towards this end, we train a class-specific Hough forest, which is a random forest that directly maps the image patch appearance to the probabilistic vote about the possible location of the object centroid. We demonstrate that Hough forests improve the results of the Hough-transform object detection significantly and achieve state-of-the-art performance for several classes and datasets.
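The voting step can be sketched in a few lines. The toy code below fakes a trained forest: each patch casts a vote at a noisy offset toward the centroid, votes accumulate in a Hough image, and the detection is its maximum. A real pipeline would smooth the vote map (e.g., with a Gaussian) before extracting maxima; all data here is invented.

```python
# Sketch of Hough voting with a faked forest: patches vote for the
# object centroid; detections are maxima of the accumulated vote map.
import numpy as np

H, W = 60, 80
hough = np.zeros((H, W))

rng = np.random.default_rng(4)
true_center = np.array([30, 40])
patches = rng.integers(0, [H, W], size=(200, 2))  # patch positions
for y, x in patches:
    # A real forest predicts offsets from patch appearance; we fake
    # votes that point near the true centroid, plus noise.
    vote = true_center - np.array([y, x]) + rng.normal(0, 2, size=2)
    cy, cx = int(y + vote[0]), int(x + vote[1])
    if 0 <= cy < H and 0 <= cx < W:
        hough[cy, cx] += 1.0  # a leaf-dependent weight would go here

peak = np.unravel_index(hough.argmax(), hough.shape)
print("detected centroid:", peak, "true:", tuple(true_center))
```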
Segmentation and recognition using structure from motion point clouds
- In ECCV, 2008
"... Abstract. We propose an algorithm for semantic segmentation based on 3D point clouds derived from ego-motion. We motivate five simple cues designed to model specific patterns of motion and 3D world structure that vary with object category. We introduce features that project the 3D cues back to the 2 ..."
Cited by 128 (11 self)
Abstract. We propose an algorithm for semantic segmentation based on 3D point clouds derived from ego-motion. We motivate five simple cues designed to model specific patterns of motion and 3D world structure that vary with object category. We introduce features that project the 3D cues back to the 2D image plane while modeling spatial layout and context. A randomized decision forest combines many such features to achieve a coherent 2D segmentation and recognize the object categories present. Our main contribution is to show how semantic segmentation is possible based solely on motion-derived 3D world structure. Our method works well on sparse, noisy point clouds, and unlike existing approaches, does not need appearance-based descriptors. Experiments were performed on a challenging new video database containing sequences filmed from a moving car in daylight and at dusk. The results confirm that indeed, accurate segmentation and recognition are possible using only motion and 3D world structure. Further, we show that the motion-derived information complements an existing state-of-the-art appearance-based method, improving both qualitative and quantitative performance.
[Fig. 1 (caption): The proposed algorithm uses 3D point clouds estimated from videos such as the pictured driving sequence (with ground truth inset). Having trained on point clouds from other driving sequences, our new motion and structure features, based purely on the point cloud, perform 11-class semantic segmentation of each test frame. The colors in the ground truth and inferred segmentation indicate category labels. Panels: input video frame, reconstructed 3D point cloud, automatic segmentation.]
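One way to read "project the 3D cues back to the 2D image plane" is the sketch below: compute a per-point structure cue (height, standing in for the paper's five cues), project each point through a pinhole camera, and average into a per-pixel feature map that a randomized forest could consume. The intrinsics and point cloud are invented.

```python
# Toy projection of a 3D structure cue into a 2D per-pixel feature map.
import numpy as np

rng = np.random.default_rng(5)
pts = rng.uniform([-5, -2, 2], [5, 2, 20], size=(1000, 3))  # x, y, z
f, cx, cy, H, W = 200.0, 160.0, 120.0, 240, 320             # made-up intrinsics

feat = np.zeros((H, W))
count = np.zeros((H, W))
for X, Y, Z in pts:
    u, v = int(f * X / Z + cx), int(f * Y / Z + cy)  # pinhole projection
    if 0 <= v < H and 0 <= u < W:
        feat[v, u] += -Y   # cue: height above the camera (stand-in)
        count[v, u] += 1
# Average the splatted cue per pixel; empty pixels stay zero.
feat = np.where(count > 0, feat / np.maximum(count, 1), 0.0)
print("non-empty pixels:", int((count > 0).sum()))
```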
SuperParsing: Scalable Nonparametric Image Parsing with Superpixels
"... Abstract. This paper presents a simple and effective nonparametric approach to the problem of image parsing, or labeling image regions (in our case, superpixels produced by bottom-up segmentation) with their categories. This approach requires no training, and it can easily scale to datasets with ten ..."
Cited by 128 (4 self)
Abstract. This paper presents a simple and effective nonparametric approach to the problem of image parsing, or labeling image regions (in our case, superpixels produced by bottom-up segmentation) with their categories. This approach requires no training, and it can easily scale to datasets with tens of thousands of images and hundreds of labels. It works by scene-level matching with global image descriptors, followed by superpixel-level matching with local features and efficient Markov random field (MRF) optimization for incorporating neighborhood context. Our MRF setup can also compute a simultaneous labeling of image regions into semantic classes (e.g., tree, building, car) and geometric classes (sky, vertical, ground). Our system outperforms the state-of-the-art nonparametric method based on SIFT Flow on a dataset of 2,688 images and 33 labels. In addition, we report per-pixel rates on a larger dataset of 15,150 images and 170 labels. To our knowledge, this is the first complete evaluation of image parsing on a dataset of this size, and it establishes a new benchmark for the problem. Key words: scene understanding, image parsing, image segmentation
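A compressed sketch of the two matching stages, with invented dimensions and data: global descriptors pick a scene-level retrieval set, then each query superpixel's labels are scored by nearest-neighbor votes among retrieval-set superpixels. The MRF smoothing over neighborhood context is only indicated in a comment.

```python
# Nonparametric parsing sketch: scene retrieval, then superpixel NN voting.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(6)
# Database: a global descriptor per image, plus superpixel features/labels.
db_global = rng.normal(size=(100, 512))
sp_feats = rng.normal(size=(100, 20, 64))   # 20 superpixels per image
sp_labels = rng.integers(0, 5, size=(100, 20))

def parse(query_global, query_sp, k_images=10, k_nn=5):
    # 1. Scene-level matching: k most similar database images.
    idx = np.argsort(np.linalg.norm(db_global - query_global, axis=1))[:k_images]
    pool_f = sp_feats[idx].reshape(-1, 64)
    pool_y = sp_labels[idx].ravel()
    # 2. Superpixel-level matching: vote over nearest neighbors.
    nn = NearestNeighbors(n_neighbors=k_nn).fit(pool_f)
    _, nbrs = nn.kneighbors(query_sp)
    votes = np.array([np.bincount(pool_y[n], minlength=5) for n in nbrs])
    return votes.argmax(axis=1)  # an MRF would smooth these region scores

print(parse(rng.normal(size=512), rng.normal(size=(20, 64))))
```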
Efficient Object Category Recognition Using Classemes
"... Abstract. We introduce a new descriptor for images which allows the construction of efficient and compact classifiers with good accuracy on object category recognition. The descriptor is the output of a large number of weakly trained object category classifiers on the image. The trained categories a ..."
Cited by 122 (9 self)
Abstract. We introduce a new descriptor for images which allows the construction of efficient and compact classifiers with good accuracy on object category recognition. The descriptor is the output of a large number of weakly trained object category classifiers on the image. The trained categories are selected from an ontology of visual concepts, but the intention is not to encode an explicit decomposition of the scene. Rather, we accept that existing object category classifiers often encode not the category per se but ancillary image characteristics; and that these ancillary characteristics can combine to represent visual classes unrelated to the constituent categories' semantic meanings. The advantage of this descriptor is that it allows object-category queries to be made against image databases using efficient classifiers (efficient at test time) such as linear support vector machines, and allows these queries to be for novel categories. Even when the representation is reduced to 200 bytes per image, classification accuracy on object category recognition is comparable with the state of the art (36% versus 42%), but at orders of magnitude lower computational cost.
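The descriptor is easy to mock up. In this sketch the bank of weakly trained category classifiers is replaced by random linear projections, a loud simplification made purely for illustration; what matters is the shape of the pipeline: base-classifier scores become a compact feature vector on which a linear SVM for a novel category is cheap to train and run.

```python
# Sketch: scores of a bank of (here, fake) base-category classifiers
# become the image descriptor; a linear SVM handles novel categories.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(7)
n_base, raw_dim = 200, 1024                 # ~hundreds of base categories
bank = rng.normal(size=(n_base, raw_dim))   # stand-in classifier weights

def describe(image_feat):
    """Descriptor = scores of all base classifiers on the image."""
    return bank @ image_feat

# Novel-category training data: raw image features + binary labels.
X_raw = rng.normal(size=(120, raw_dim))
y = rng.integers(0, 2, size=120)
X_desc = np.array([describe(x) for x in X_raw])  # 200-D descriptors

svm = LinearSVC().fit(X_desc, y)  # efficient at test time
print("train accuracy:", svm.score(X_desc, y))
```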
Learning 3D mesh segmentation and labeling
- ACM Trans. on Graphics, 2010
"... head torso upper arm lower arm hand upper leg lower leg foot ear head torso arm leg tail body fin handle cup top base arm lens bridge antenna head thorax leg abdomen cup handle face hair neck fin stabilizer body wing top leg thumb index middle ring pinky palm big roller medium roller axle handle joi ..."
Cited by 101 (7 self)
[Figure 1 (caption): Labeling and segmentation results from applying our algorithm to one mesh each from every category in the Princeton Segmentation Benchmark [Chen et al. 2009]. For each result, the algorithm was trained on the other meshes in the same class, e.g., the human was labeled after training on the other meshes in the human class.]
This paper presents a data-driven approach to simultaneous segmentation and labeling of parts in 3D meshes. An objective function is formulated as a Conditional Random Field model, with terms assessing the consistency of faces with labels, and terms between labels of neighboring faces. The objective function is learned from a collection of labeled training meshes. The algorithm uses hundreds of geometric and contextual label features and learns different types of segmentations for different tasks, without requiring manual parameter tuning. Our algorithm achieves a significant improvement in results over the state-of-the-art when evaluated on the Princeton Segmentation Benchmark, often producing segmentations and labelings comparable to those produced by humans.
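In toy form, the objective reduces to a unary-plus-pairwise energy over mesh faces. The sketch below invents the faces, adjacency, and costs; the real model learns these terms from labeled meshes and conditions the pairwise penalty on geometry such as dihedral angles.

```python
# Toy CRF energy over mesh faces: unary face-label costs plus a
# pairwise penalty for label changes across adjacent faces.
import numpy as np

def mesh_crf_energy(labels, unary, edges, w_pair=0.8):
    """labels: (F,) face labels; unary: (F, L) costs;
    edges: list of (i, j) adjacent face pairs."""
    e = unary[np.arange(len(labels)), labels].sum()
    for i, j in edges:
        if labels[i] != labels[j]:
            e += w_pair  # a real model weights this by dihedral angle etc.
    return e

rng = np.random.default_rng(8)
unary = rng.uniform(size=(50, 4))          # 50 faces, 4 part labels
edges = [(i, i + 1) for i in range(49)]    # chain as a stand-in mesh
labels = unary.argmin(axis=1)              # unary-only labeling baseline
print("energy:", mesh_crf_energy(labels, unary, edges))
```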