Results 1 - 10 of 30
Robust object recognition with cortex-like mechanisms
- IEEE Trans. Pattern Analysis and Machine Intelligence
, 2007
Cited by 389 (47 self)
Abstract—We introduce a new general framework for the recognition of complex visual scenes, which is motivated by biology: We describe a hierarchical system that closely follows the organization of visual cortex and builds an increasingly complex and invariant feature representation by alternating between a template matching and a maximum pooling operation. We demonstrate the strength of the approach on a range of recognition tasks: From invariant single object recognition in clutter to multiclass categorization problems and complex scene understanding tasks that rely on the recognition of both shape-based as well as texture-based objects. Given the biological constraints that the system had to satisfy, the approach performs surprisingly well: It has the capability of learning from only a few training examples and competes with state-of-the-art systems. We also discuss the existence of a universal, redundant dictionary of features that could handle the recognition of most object categories. In addition to its relevance for computer vision, the success of this approach suggests a plausibility proof for a class of feedforward models of object recognition in cortex.
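The alternation this abstract describes (template matching, then maximum pooling over local neighborhoods) can be sketched in a few lines of NumPy. This is an illustrative toy, not the published model: the templates are random patches and the similarity is a simple Gaussian, with no learned feature dictionary.

```python
import numpy as np

def template_match(image, templates):
    """'S'-type layer: Gaussian similarity of every image patch to each stored template."""
    h, w = image.shape
    n, t, _ = templates.shape
    out = np.zeros((n, h - t + 1, w - t + 1))
    for k, tpl in enumerate(templates):
        for i in range(h - t + 1):
            for j in range(w - t + 1):
                patch = image[i:i + t, j:j + t]
                out[k, i, j] = np.exp(-np.sum((patch - tpl) ** 2))
    return out

def max_pool(maps, size):
    """'C'-type layer: max over local neighborhoods, building position invariance."""
    n, h, w = maps.shape
    out = np.zeros((n, h // size, w // size))
    for i in range(h // size):
        for j in range(w // size):
            out[:, i, j] = maps[:, i * size:(i + 1) * size,
                                j * size:(j + 1) * size].max(axis=(1, 2))
    return out

rng = np.random.default_rng(0)
image = rng.random((16, 16))
templates = rng.random((4, 3, 3))     # toy stand-ins for learned patches
s1 = template_match(image, templates)  # shape (4, 14, 14)
c1 = max_pool(s1, 2)                   # shape (4, 7, 7)
```

Stacking further such pairs of layers yields the increasingly complex and invariant representation the abstract refers to.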
Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite
Cited by 174 (10 self)
Today, visual recognition systems are still rarely employed in robotics applications. Perhaps one of the main reasons for this is the lack of demanding benchmarks that mimic such scenarios. In this paper, we take advantage of our autonomous driving platform to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry / SLAM and 3D object detection. Our recording platform is equipped with four high resolution video cameras, a Velodyne laser scanner and a state-of-the-art localization system. Our benchmarks comprise 389 stereo and optical flow image pairs, stereo visual odometry sequences of 39.2 km length, and more than 200k 3D object annotations captured in cluttered scenarios (up to 15 cars and 30 pedestrians are visible per image). Results from state-of-the-art algorithms reveal that methods ranking high on established datasets such as Middlebury perform below average when being moved outside the laboratory to the real world. Our goal is to reduce this bias by providing challenging benchmarks with novel difficulties to the computer vision community. Our benchmarks are available online at: www.cvlibs.net/datasets/kitti 1.
Why is Real-World Visual Object Recognition Hard?
- PLoS Computational Biology
Cited by 125 (8 self)
Progress in understanding the brain mechanisms underlying vision requires the construction of computational models that not only emulate the brain's anatomy and physiology, but ultimately match its performance on visual tasks. In recent years, "natural" images have become popular in the study of vision and have been used to show apparently impressive progress in building such models. Here, we challenge the use of uncontrolled "natural" images in guiding that progress. In particular, we show that a simple V1-like model—a neuroscientist's "null" model, which should perform poorly at real-world visual object recognition tasks—outperforms state-of-the-art object recognition systems (biologically inspired and otherwise) on a standard, ostensibly natural image recognition test. As a counterpoint, we designed a "simpler" recognition test to better span the real-world variation in object pose, position, and scale, and we show that this test correctly exposes the inadequacy of the V1-like model. Taken together, these results demonstrate that tests based on uncontrolled natural images can be seriously misleading, potentially guiding progress in the wrong direction. Instead, we reexamine what it means for images to be natural and argue for a renewed focus on the core problem of object recognition—real-world image variation.
What and where: A Bayesian inference theory of attention
, 2010
Cited by 36 (6 self)
In the theoretical framework described in this thesis, attention is part of the inference process that solves the visual recognition problem of what is where. The theory proposes a computational role for attention and leads to a model that predicts some of its main properties at the level of psychophysics and physiology. In our approach, the main goal of the visual system is to infer the identity and the position of objects in visual scenes: spatial attention emerges as a strategy to reduce the uncertainty in shape information while feature-based attention reduces the uncertainty in spatial information. Featural and spatial attention represent two distinct modes of a computational process solving the problem of recognizing and localizing objects, especially in difficult recognition tasks such as in cluttered natural scenes. We describe a specific computational model and relate it to the known functional anatomy of attention. We show that several well-known attentional phenomena – including bottom-up pop-out effects, multiplicative modulation of neuronal tuning curves and shifts in contrast responses – emerge naturally as predictions of the model.
Supervised label transfer for semantic segmentation of street scenes
, 2010
Cited by 17 (3 self)
Abstract. In this paper, we propose a robust supervised label transfer method for the semantic segmentation of street scenes. Given an input image of a street scene, we first find multiple image sets from the training database of annotated images, each of which can cover all semantic categories in the input image. Then, we establish dense correspondence between the input image and each found image set with a proposed KNN-MRF matching scheme. This is followed by a classification step that tries to reduce the number of semantically incorrect correspondences, using matching correspondence classification models trained for the different categories. With the correspondences classified as semantically correct, we infer the confidence values of each superpixel belonging to different semantic categories, and integrate them with a spatial smoothness constraint in a Markov random field to segment the input image. Experiments on three datasets show our method outperforms traditional learning-based methods and the previous nonparametric label transfer method for the semantic segmentation of street scenes.
Beyond the line of sight: labeling the underlying surfaces
Cited by 13 (1 self)
Abstract. Scene understanding requires reasoning about both what we can see and what is occluded. We offer a simple and general approach to infer labels of occluded background regions. Our approach incorporates estimates of visible surrounding background, detected objects, and shape priors from transferred training regions. We demonstrate the ability to infer the labels of occluded background regions in both the outdoor StreetScenes dataset and an indoor scene dataset using the same approach. Our experiments show that our method outperforms competent baselines.
Context driven focus of attention for object detection
- In WAPCV
, 2007
Cited by 7 (0 self)
Abstract. Context plays an important role in general scene perception. In particular, it can provide cues about an object's location within an image. In computer vision, object detectors typically ignore this information. We tackle this problem by presenting a concept of how to extract and learn contextual information from examples. This context is then used to calculate a focus of attention that represents a prior for object detection. State-of-the-art local appearance-based object detection methods are then applied on selected parts of the image only. We demonstrate the performance of this approach on the task of pedestrian detection in urban scenes using a demanding image database. Results show that context awareness provides complementary information over pure local appearance-based processing. In addition, it cuts down the search complexity and increases the robustness of object detection.
What and where: a Bayesian inference theory of visual attention
- Vision Research
Cited by 5 (1 self)
In the theoretical framework described in this thesis, attention is part of the inference process that solves the visual recognition problem of what is where. The theory proposes a computational role for attention and leads to a model that predicts some of its main properties at the level of psychophysics and physiology. In our approach, the main goal of the visual system is to infer the identity and the position of objects in visual scenes: spatial attention emerges as a strategy to reduce the uncertainty in shape information while feature-based attention reduces the uncertainty in spatial information. Featural and spatial attention represent two distinct modes of a computational process solving the problem of recognizing and localizing objects, especially in difficult recognition tasks such as in cluttered natural scenes. We describe a specific computational model and relate it to the known functional anatomy of attention. We show that several well-known attentional phenomena – including bottom-up pop-out effects, multiplicative modulation of neuronal tuning curves and shifts in contrast responses – emerge naturally as predictions of the model. We also show that the Bayesian model predicts well human eye fixations (considered as a proxy …
Pedestrian detectability: Predicting human perception performance with machine vision
- in Proc. 2011 IEEE Intelligent Vehicles Symposium
, 2011
Cited by 4 (1 self)
Abstract—How likely is it that a driver notices a person standing on the side of the road? In this paper we introduce the concept of pedestrian detectability. It is a measure of how probable it is that a human observer perceives pedestrians in an image. We acquire a dataset of pedestrians with their associated detectabilities in a rapid detection experiment using images of street scenes. On this dataset we learn a regression function that allows us to predict human detectabilities from an optimized set of image and contextual features. We exploit this function to infer the optimal focus of attention for pedestrian detection. With this combination of human perception and machine vision we propose a method we deem useful for the optimization of human-machine interfaces in driver assistance systems.
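As a rough illustration of the kind of regression this abstract mentions, the sketch below fits a linear least-squares predictor from feature vectors to detectability scores. The feature names and data are synthetic stand-ins; the paper's actual feature set and regression model are not reproduced here.

```python
import numpy as np

# Toy stand-in for the paper's setup: each row of X holds image/context
# features for one pedestrian, y holds the measured detectability in [0, 1].
rng = np.random.default_rng(1)
X = rng.random((50, 3))               # e.g. contrast, size, eccentricity (illustrative)
true_w = np.array([0.5, 0.3, -0.2])   # hidden generating weights for the toy data
y = np.clip(X @ true_w + 0.2, 0.0, 1.0)

# Fit a linear regression with a bias term via ordinary least squares.
A = np.hstack([X, np.ones((50, 1))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_detectability(features):
    """Predict how likely a human observer is to spot this pedestrian."""
    return float(np.clip(features @ w[:3] + w[3], 0.0, 1.0))
```

Predicted detectability maps over an image could then serve as the focus-of-attention prior the abstract describes.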
Probabilistic combination of visual context based attention and object detection
- In WAPCV
, 2008
Cited by 2 (0 self)
Abstract. Visual context provides cues about an object's presence, position and size within the observed scene, which are used to increase the performance of object detection techniques. However, state-of-the-art methods for context-aware object detection can decrease the initial performance. We discuss the reasons for failure and propose a concept that overcomes these limitations. To this end, we introduce the prior probability function of an object detector, which maps the detector's output to probabilities. Together with an appropriate contextual weighting, a probabilistic framework is established. In addition, we present an extension to state-of-the-art methods to learn scale-dependent visual context information and show how this increases the initial performance. The standard methods and our proposed extensions are compared on a novel, demanding image data set.
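One plausible reading of a "prior probability function" that maps detector output to probabilities is a Platt-style logistic calibration of the raw score, multiplied by a contextual prior. The sketch below shows that combination with purely illustrative parameter values; the paper's actual mapping and weighting are not reproduced here.

```python
import numpy as np

def calibrate(score, a=1.5, b=-1.0):
    """Map a raw detector score to a detection probability with a logistic
    (Platt-style) function. a and b would normally be fit on held-out
    detections; the values here are illustrative only."""
    return 1.0 / (1.0 + np.exp(-(a * score + b)))

def combine(score, context_prior):
    """Weight the calibrated detection probability by a contextual prior
    over object presence, position and scale."""
    return calibrate(score) * context_prior

scores = np.array([-1.0, 0.5, 2.0])   # raw detector outputs
prior = np.array([0.2, 0.5, 0.9])     # e.g. from a learned scene-context model
posterior = combine(scores, prior)
```

Because both factors lie in [0, 1], a weak contextual prior can only down-weight a detection, which avoids the failure mode the abstract notes where context integration degrades a strong detector.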