Results 1 - 10
of
389
Segmentation as Selective Search for Object Recognition
"... For object recognition, the current state-of-the-art is based on exhaustive search. However, to enable the use of more expensive features and classifiers and thereby progress beyond the state-of-the-art, a selective search strategy is needed. Therefore, we adapt segmentation as a selective search by ..."
Abstract
-
Cited by 165 (7 self)
- Add to MetaCart
For object recognition, the current state-of-the-art is based on exhaustive search. However, to enable the use of more expensive features and classifiers and thereby progress beyond the state-of-the-art, a selective search strategy is needed. Therefore, we adapt segmentation as a selective search by reconsidering segmentation: We propose to generate many approximate locations over few and precise object delineations because (1) an object whose location is never generated can not be recognised and (2) appearance and immediate nearby context are most effective for object recognition. Our method is class-independent and is shown to cover 96.7 % of all objects in the Pascal VOC 2007 test set using only 1,536 locations per image. Our selective search enables the use of the more expensive bag-of-words method which we use to substantially improve the state-of-the-art by up to 8.5 % for 8 out of 20 classes on the Pascal VOC 2010 detection challenge.
Image segmentation by probabilistic bottom-up aggregation and cue integration
, 2007
"... We present a parameter free approach that utilizes multiple cues for image segmentation. Beginning with an image, we execute a sequence of bottom-up aggregation steps in which pixels are gradually merged to produce larger and larger regions. In each step we consider pairs of adjacent regions and pro ..."
Abstract
-
Cited by 76 (1 self)
- Add to MetaCart
(Show Context)
We present a parameter free approach that utilizes multiple cues for image segmentation. Beginning with an image, we execute a sequence of bottom-up aggregation steps in which pixels are gradually merged to produce larger and larger regions. In each step we consider pairs of adjacent regions and provide a probability measure to assess whether or not they should be included in the same segment. Our probabilistic formulation takes into account intensity and texture distributions in a local area around each region. It further incorporates priors based on the geometry of the regions. Finally, posteriors based on intensity and texture cues are combined using a mixture of experts formulation. This probabilistic approach is integrated into a graph coarsening scheme providing a complete hierarchical segmentation of the image. The algorithm complexity is linear in the number of the image pixels and it requires almost no user-tuned parameters. We test our method on a variety of gray scale images and compare our results to several existing segmentation algorithms.
C.: Structured Forests for Fast Edge Detection
"... Edge detection is a critical component of many vision systems, including object detectors and image segmentation algorithms. Patches of edges exhibit well-known forms of local structure, such as straight lines or T-junctions. In this paper we take advantage of the structure present in local image pa ..."
Abstract
-
Cited by 66 (1 self)
- Add to MetaCart
(Show Context)
Edge detection is a critical component of many vision systems, including object detectors and image segmentation algorithms. Patches of edges exhibit well-known forms of local structure, such as straight lines or T-junctions. In this paper we take advantage of the structure present in local image patches to learn both an accurate and computation-ally efficient edge detector. We formulate the problem of predicting local edge masks in a structured learning frame-work applied to random decision forests. Our novel ap-proach to learning decision trees robustly maps the struc-tured labels to a discrete space on which standard infor-mation gain measures may be evaluated. The result is an approach that obtains realtime performance that is orders of magnitude faster than many competing state-of-the-art approaches, while also achieving state-of-the-art edge de-tection results on the BSDS500 Segmentation dataset and NYU Depth dataset. Finally, we show the potential of our approach as a general purpose edge detector by showing our learned edge models generalize well across datasets. 1.
Multiscale combinatorial grouping
- In: CVPR (2014
"... We propose a unified approach for bottom-up hierarchical image segmentation and object candidate generation for recognition, called Multiscale Combinatorial Grouping (MCG). For this purpose, we first develop a fast normalized cuts algorithm. We then propose a high-performance hierarchical segmenter ..."
Abstract
-
Cited by 56 (11 self)
- Add to MetaCart
(Show Context)
We propose a unified approach for bottom-up hierarchical image segmentation and object candidate generation for recognition, called Multiscale Combinatorial Grouping (MCG). For this purpose, we first develop a fast normalized cuts algorithm. We then propose a high-performance hierarchical segmenter that makes effective use of multiscale information. Finally, we propose a grouping strategy that combines our multiscale regions into highly-accurate object candidates by exploring efficiently their combinatorial space. We conduct extensive experiments on both the BSDS500 and on the PASCAL 2012 segmentation datasets, showing that MCG produces state-of-the-art contours, hierarchical regions and object candidates. 1.
Discovering localized attributes for fine-grained recognition
- In CVPR. IEEE
, 2012
"... red stripes on wings orange stripes on wings Attributes are visual concepts that can be detected by machines, understood by humans, and shared across categories. They are particularly useful for fine-grained domains where categories are closely related to one other (e.g. bird species recognition). I ..."
Abstract
-
Cited by 52 (1 self)
- Add to MetaCart
(Show Context)
red stripes on wings orange stripes on wings Attributes are visual concepts that can be detected by machines, understood by humans, and shared across categories. They are particularly useful for fine-grained domains where categories are closely related to one other (e.g. bird species recognition). In such scenarios, relevant attributes are often local (e.g. “white belly”), but the question of how to choose these local attributes remains largely unexplored. In this paper, we propose an interactive approach that discovers local attributes that are both discriminative and semantically meaningful from image datasets annotated only with fine-grained category labels and object bounding boxes. Our approach uses a latent conditional random field model to discover candidate attributes that are detectable and discriminative, and then employs a recommender system that selects attributes likely to be semantically meaningful. Human interaction is used to provide semantic names for the discovered attributes. We demonstrate our method on two challenging datasets, Caltech-UCSD Birds-200-2011 and Leeds Butterflies, and find that our discovered attributes outperform those generated by traditional approaches. 1.
Scene parsing with multiscale feature learning, purity trees and optimal covers
- in ICML
, 2012
"... Scene parsing consists in labeling each pixel in an image with the category of the object it belongs to. We propose a method that uses a multiscale convolutional network trained from raw pixels to extract dense feature vectors that encode regions of multiple sizes centered on each pixel. The method ..."
Abstract
-
Cited by 49 (8 self)
- Add to MetaCart
(Show Context)
Scene parsing consists in labeling each pixel in an image with the category of the object it belongs to. We propose a method that uses a multiscale convolutional network trained from raw pixels to extract dense feature vectors that encode regions of multiple sizes centered on each pixel. The method alleviates the need for engineered features. In parallel to feature extraction, a tree of segments is computed from a graph of pixel dissimilarities. The feature vectors associated with the segments covered by each node in the tree are aggregated and fed to a classifier which produces an estimate of the distribution of object categories contained in the segment. A subset of tree nodes that cover the image are then selected so as to maximize the average “purity ” of the class distributions, hence maximizing the overall likelihood that each segment will contain a single object. The system yields record accuracies on the the Sift Flow Dataset (33 classes) and the Barcelona Dataset (170 classes) and near-record accuracy on the Stanford Background Dataset (8 classes), while being an order of magnitude faster than competing approaches, producing a 320 × 240 image labeling in less than 1 second, including feature extraction. 1. Overview Scene parsing, or full scene labeling (FSL), is the task of labeling each pixel in a scene with the category of the object to which it belongs. FSL requires to solve the detection, segmentation, recognition and contextual integration problems simultaneously, so as to produce a globally consistent labeling. One of the obstacles to FSL is that the information necessary for the labeling of a given pixel may
Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images
"... We address the problems of contour detection, bottomup grouping and semantic segmentation using RGB-D data. We focus on the challenging setting of cluttered indoor scenes, and evaluate our approach on the recently introduced NYU-Depth V2 (NYUD2) dataset [27]. We propose algorithms for object boundar ..."
Abstract
-
Cited by 48 (3 self)
- Add to MetaCart
(Show Context)
We address the problems of contour detection, bottomup grouping and semantic segmentation using RGB-D data. We focus on the challenging setting of cluttered indoor scenes, and evaluate our approach on the recently introduced NYU-Depth V2 (NYUD2) dataset [27]. We propose algorithms for object boundary detection and hierarchical segmentation that generalize the gP b − ucm approach of [2] by making effective use of depth information. We show that our system can label each contour with its type (depth, normal or albedo). We also propose a generic method for long-range amodal completion of surfaces and show its effectiveness in grouping. We then turn to the problem of semantic segmentation and propose a simple approach that classifies superpixels into the 40 dominant object categories in NYUD2. We use both generic and class-specific features to encode the appearance and geometry of objects. We also show how our approach can be used for scene classification, and how this contextual information in turn improves object recognition. In all of these tasks, we report significant improvements over the state-of-the-art. 1.
Evaluation of Super-Voxel Methods for Early Video Processing
"... Supervoxel segmentation has strong potential to be incorporated into early video analysis as superpixel segmentation has in image analysis. However, there are many plausible supervoxel methods and little understanding as to when and where each is most appropriate. Indeed, we are not aware of a singl ..."
Abstract
-
Cited by 43 (7 self)
- Add to MetaCart
(Show Context)
Supervoxel segmentation has strong potential to be incorporated into early video analysis as superpixel segmentation has in image analysis. However, there are many plausible supervoxel methods and little understanding as to when and where each is most appropriate. Indeed, we are not aware of a single comparative study on supervoxel segmentation. To that end, we study five supervoxel algorithms in the context of what we consider to be a good supervoxel: namely, spatiotemporal uniformity, object/region boundary detection, region compression and parsimony. For the evaluation we propose a comprehensive suite of 3D volumetric quality metrics to measure these desirable supervoxel characteristics. We use three benchmark video data sets with a variety of content-types and varying amounts of human annotations. Our findings have led us to conclusive evidence that the hierarchical graph-based and segmentation by weighted aggregation methods perform best and almost equally-well on nearly all the metrics and are the methods of choice given our proposed assumptions. 1.
Microsoft COCO: Common Objects in Context
"... Abstract. We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understand-ing. This is achieved by gathering images of complex everyday scenes containing common obj ..."
Abstract
-
Cited by 43 (3 self)
- Add to MetaCart
(Show Context)
Abstract. We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understand-ing. This is achieved by gathering images of complex everyday scenes containing common objects in their natural context. Objects are labeled using per-instance segmentations to aid in precise object localization. Our dataset contains photos of 91 objects types that would be easily recognizable by a 4 year old. With a total of 2.5 million labeled in-stances in 328k images, the creation of our dataset drew upon extensive crowd worker involvement via novel user interfaces for category detec-tion, instance spotting and instance segmentation. We present a detailed statistical analysis of the dataset in comparison to PASCAL, ImageNet, and SUN. Finally, we provide baseline performance analysis for bounding box and segmentation detection results using a Deformable Parts Model. 1