StreetScenes: Towards scene understanding in still images (2006)

by S M Bileschi

Results 1 - 10 of 30

Robust object recognition with cortex-like mechanisms

by Thomas Serre, Lior Wolf, Stanley Bileschi, Maximilian Riesenhuber, Tomaso Poggio - IEEE Trans. Pattern Analysis and Machine Intelligence, 2007
Abstract - Cited by 389 (47 self)
Abstract—We introduce a new general framework for the recognition of complex visual scenes, which is motivated by biology: We describe a hierarchical system that closely follows the organization of visual cortex and builds an increasingly complex and invariant feature representation by alternating between a template matching and a maximum pooling operation. We demonstrate the strength of the approach on a range of recognition tasks: from invariant single-object recognition in clutter to multiclass categorization problems and complex scene understanding tasks that rely on the recognition of both shape-based and texture-based objects. Given the biological constraints that the system had to satisfy, the approach performs surprisingly well: it has the capability of learning from only a few training examples and competes with state-of-the-art systems. We also discuss the existence of a universal, redundant dictionary of features that could handle the recognition of most object categories. In addition to its relevance for computer vision, the success of this approach suggests a plausibility proof for a class of feedforward models of object recognition in cortex.
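The alternation the abstract describes, a template-matching (S) layer followed by a max-pooling (C) layer, can be sketched numerically. This is a minimal illustration, not the authors' implementation: the Gaussian tuning width `sigma`, the patch size, the pooling grid, and the random toy inputs are all assumptions.

```python
import numpy as np

def template_match(feature_map, templates, sigma=1.0):
    """S-layer sketch: Gaussian radial-basis response of each local patch
    to each stored template (the template-matching step)."""
    h, w, d = feature_map.shape
    k = templates.shape[1]  # template side length (assumed square)
    out = np.zeros((h - k + 1, w - k + 1, len(templates)))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = feature_map[i:i + k, j:j + k, :].ravel()
            for t, tpl in enumerate(templates):
                out[i, j, t] = np.exp(-np.sum((patch - tpl.ravel()) ** 2)
                                      / (2 * sigma ** 2))
    return out

def max_pool(responses, pool=2):
    """C-layer sketch: local maximum over position, the invariance step."""
    h, w, d = responses.shape
    out = np.zeros((h // pool, w // pool, d))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = responses[i * pool:(i + 1) * pool,
                                  j * pool:(j + 1) * pool].max(axis=(0, 1))
    return out

# Toy data: a random 16x16 feature map with 4 channels and 8 random 3x3 templates.
rng = np.random.default_rng(0)
image_features = rng.random((16, 16, 4))
templates = rng.random((8, 3, 3, 4))

s1 = template_match(image_features, templates, sigma=2.0)  # shape (14, 14, 8)
c1 = max_pool(s1, pool=2)                                  # shape (7, 7, 8)
```

Stacking further S/C pairs on `c1` would give the increasingly complex and invariant representation the abstract refers to.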

Citation Context

...ures (e.g., good-continuity detectors, circularity detectors, and symmetry detectors) within the same framework in addition to the C2 SMFs, Wolf et al. obtained 51.2 ± 1.2 percent correct [48], [49] and recently incorporated some changes with Sharat Chikkerur to get 55.0 ± 0.9 percent (all these results are for 15 training images). At press time, some of the best...

Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite

by Andreas Geiger, Philip Lenz, Raquel Urtasun
Abstract - Cited by 174 (10 self)
Today, visual recognition systems are still rarely employed in robotics applications. Perhaps one of the main reasons for this is the lack of demanding benchmarks that mimic such scenarios. In this paper, we take advantage of our autonomous driving platform to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry / SLAM and 3D object detection. Our recording platform is equipped with four high-resolution video cameras, a Velodyne laser scanner and a state-of-the-art localization system. Our benchmarks comprise 389 stereo and optical flow image pairs, stereo visual odometry sequences of 39.2 km length, and more than 200k 3D object annotations captured in cluttered scenarios (up to 15 cars and 30 pedestrians are visible per image). Results from state-of-the-art algorithms reveal that methods ranking high on established datasets such as Middlebury perform below average when moved outside the laboratory to the real world. Our goal is to reduce this bias by providing challenging benchmarks with novel difficulties to the computer vision community. Our benchmarks are available online at: www.cvlibs.net/datasets/kitti.

Why is Real-World Visual Object Recognition Hard?

by Nicolas Pinto, David D. Cox, James J. Dicarlo - PLoS Computational Biology
Abstract - Cited by 125 (8 self)
Progress in understanding the brain mechanisms underlying vision requires the construction of computational models that not only emulate the brain’s anatomy and physiology, but ultimately match its performance on visual tasks. In recent years, “natural” images have become popular in the study of vision and have been used to show apparently impressive progress in building such models. Here, we challenge the use of uncontrolled “natural” images in guiding that progress. In particular, we show that a simple V1-like model—a neuroscientist’s “null” model, which should perform poorly at real-world visual object recognition tasks—outperforms state-of-the-art object recognition systems (biologically inspired and otherwise) on a standard, ostensibly natural image recognition test. As a counterpoint, we designed a “simpler” recognition test to better span the real-world variation in object pose, position, and scale, and we show that this test correctly exposes the inadequacy of the V1-like model. Taken together, these results demonstrate that tests based on uncontrolled natural images can be seriously misleading, potentially guiding progress in the wrong direction. Instead, we reexamine what it means for images to be natural and argue for a renewed focus on the core problem of object recognition—real-world image variation.

Citation Context

...antage of directly sampling the true problem domain. However, annotating such an image set is extremely labor-intensive (but see the LabelMe project [25], Peekaboom [26], and the StreetScenes dataset [2, 27]). More importantly, a set that truly reflects all real-world variation may be too stringent of an assay to guide improvement in recognition models. That is, if the problem is too hard, it is not easy...

What and where: A Bayesian inference theory of attention

by Sharat Chikkerur , 2010
Abstract - Cited by 36 (6 self)
In the theoretical framework described in this thesis, attention is part of the inference process that solves the visual recognition problem of what is where. The theory proposes a computational role for attention and leads to a model that predicts some of its main properties at the level of psychophysics and physiology. In our approach, the main goal of the visual system is to infer the identity and the position of objects in visual scenes: spatial attention emerges as a strategy to reduce the uncertainty in shape information while feature-based attention reduces the uncertainty in spatial information. Featural and spatial attention represent two distinct modes of a computational process solving the problem of recognizing and localizing objects, especially in difficult recognition tasks such as in cluttered natural scenes. We describe a specific computational model and relate it to the known functional anatomy of attention. We show that several well-known attentional phenomena – including bottom-up pop-out effects, multiplicative modulation of neuronal tuning

Supervised label transfer for semantic segmentation of street scenes

by Honghui Zhang, Jianxiong Xiao, Long Quan , 2010
Abstract - Cited by 17 (3 self)
Abstract. In this paper, we propose a robust supervised label transfer method for the semantic segmentation of street scenes. Given an input image of a street scene, we first find multiple image sets from the training database of annotated images, each of which can cover all semantic categories in the input image. Then, we establish dense correspondence between the input image and each found image set with a proposed KNN-MRF matching scheme. This is followed by a matching-correspondence classification that tries to reduce the number of semantically incorrect correspondences, using classification models trained for the different categories. With the correspondences classified as semantically correct, we infer the confidence values of each superpixel belonging to the different semantic categories, and integrate them with a spatial smoothness constraint in a Markov random field to segment the input image. Experiments on three datasets show our method outperforms traditional learning-based methods and the previous nonparametric label transfer method for the semantic segmentation of street scenes.
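The final step the abstract describes, integrating per-superpixel confidence values with a spatial smoothness constraint in a Markov random field, is commonly written as a Potts-style energy. The notation below is ours, a hedged sketch rather than the paper's exact formulation:

```latex
E(L) \;=\; \sum_{i} -\log P(l_i \mid s_i) \;+\; \lambda \sum_{(i,j) \in \mathcal{N}} \mathbf{1}[\, l_i \neq l_j \,]
```

where $l_i$ is the label assigned to superpixel $s_i$, the unary term $-\log P(l_i \mid s_i)$ comes from the confidence values inferred from the semantically correct correspondences, $\mathcal{N}$ is the set of adjacent superpixel pairs, and $\lambda$ weights the smoothness constraint. Minimizing $E(L)$ over all labelings yields the segmentation.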

Citation Context

... label transfer method, for the semantic segmentation of street scenes. 1 Introduction Semantic segmentation of street scenes is an important and interesting research topic for scene understanding [1, 2] and image-based modeling in cities and urban areas [3–6]. Traditional methods to solve this problem, such as [7–11], typically work with a fixed number of object categories and train generative or dis...

Beyond the line of sight: labeling the underlying surfaces

by Ruiqi Guo, Derek Hoiem
Abstract - Cited by 13 (1 self)
Abstract. Scene understanding requires reasoning about both what we can see and what is occluded. We offer a simple and general approach to infer labels of occluded background regions. Our approach incorporates estimates of visible surrounding background, detected objects, and shape priors from transferred training regions. We demonstrate the ability to infer the labels of occluded background regions in both the outdoor StreetScenes dataset and an indoor scene dataset using the same approach. Our experiments show that our method outperforms competent baselines.

Citation Context

...uctured scene representation in terms of a few polygons (rather than maps of pixel confidences). To demonstrate the generality of our approach, we perform experiments on the CBCL StreetScenes dataset [4], Hedau et al. [2]’s indoor scene dataset as well as the SUN09 dataset. Each dataset has polygonal labels that can be used to evaluate identification of visible surfaces (as is usually done) or labeli...

Context driven focus of attention for object detection

by Aleš Leonardis - In WAPCV, 2007
Abstract - Cited by 7 (0 self)
Abstract. Context plays an important role in general scene perception. In particular, it can provide cues about an object’s location within an image. In computer vision, object detectors typically ignore this information. We tackle this problem by presenting a concept of how to extract and learn contextual information from examples. This context is then used to calculate a focus of attention that represents a prior for object detection. State-of-the-art local appearance-based object detection methods are then applied on selected parts of the image only. We demonstrate the performance of this approach on the task of pedestrian detection in urban scenes using a demanding image database. Results show that context awareness provides complementary information over pure local appearance-based processing. In addition, it cuts down the search complexity and increases the robustness of object detection.

Citation Context

...t al. as a contextual cue (alongside other cues). The main difference is that we learn the configuration of this contextual information directly and not only to calculate a horizon estimate. Bileschi [19] classifies an image into four pre-defined semantic classes. These classes indicate the presence of buildings, roads, skies, and trees, which are identified using their texture properties. These class...

What and where: A Bayesian inference theory of visual attention

by Sharat Chikkerur, Tomaso Poggio - Vision Research
Abstract - Cited by 5 (1 self)
In the theoretical framework described in this thesis, attention is part of the inference process that solves the visual recognition problem of what is where. The theory proposes a computational role for attention and leads to a model that predicts some of its main properties at the level of psychophysics and physiology. In our approach, the main goal of the visual system is to infer the identity and the position of objects in visual scenes: spatial attention emerges as a strategy to reduce the uncertainty in shape information while feature-based attention reduces the uncertainty in spatial information. Featural and spatial attention represent two distinct modes of a computational process solving the problem of recognizing and localizing objects, especially in difficult recognition tasks such as in cluttered natural scenes. We describe a specific computational model and relate it to the known functional anatomy of attention. We show that several well-known attentional phenomena, including bottom-up pop-out effects, multiplicative modulation of neuronal tuning curves and shift in contrast responses, emerge naturally as predictions of the model. We also show that the Bayesian model predicts well human eye fixations (considered as a proxy

Pedestrian detectability: Predicting human perception performance with machine vision

by David Engel - In Proc. 2011 IEEE Intelligent Vehicles Symposium, 2011
Abstract - Cited by 4 (1 self)
Abstract—How likely is it that a driver notices a person standing on the side of the road? In this paper we introduce the concept of pedestrian detectability. It is a measure of how probable it is that a human observer perceives pedestrians in an image. We acquire a dataset of pedestrians with their associated detectabilities in a rapid detection experiment using images of street scenes. On this dataset we learn a regression function that allows us to predict human detectabilities from an optimized set of image and contextual features. We exploit this function to infer the optimal focus of attention for pedestrian detection. With this combination of human perception and machine vision we propose a method we deem useful for the optimization of human-machine interfaces in driver assistance systems.

Citation Context

...ith their associated detectabilities D(Pedestrian)’s. To ensure validity in a relevant setting, we chose a dataset that contains labeled pedestrians in a natural setting. The MIT StreetScenes dataset [6] is well suited for the task since it contains labeled pedestrians in a wide variety of poses and contexts as well as dense labels for other object classes such as cars or sidewalks which might influe...

Probabilistic combination of visual context based attention and object detection

by Christian Wojek, Bernt Schiele, Aleš Leonardis - In WAPCV, 2008
Abstract - Cited by 2 (0 self)
Abstract. Visual context provides cues about an object’s presence, position and size within the observed scene, which are used to increase the performance of object detection techniques. However, state-of-the-art methods for context-aware object detection can decrease the initial performance. We discuss the reasons for failure and propose a concept that overcomes these limitations. To this end, we introduce the prior probability function of an object detector, which maps the detector’s output to probabilities. Together with an appropriate contextual weighting, a probabilistic framework is established. In addition, we present an extension to state-of-the-art methods to learn scale-dependent visual context information and show how this increases the initial performance. The standard methods and our proposed extensions are compared on a novel demanding image data set.
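One common way to realize the kind of score-to-probability mapping the abstract calls a "prior probability function" is a Platt-style logistic fit. The sketch below is an illustration under our own assumptions, not the paper's method: the gradient-descent fit, learning rate, iteration count, and toy scores/labels are all hypothetical.

```python
import numpy as np

def fit_platt(scores, labels, lr=0.1, iters=2000):
    """Fit p(object | score) = sigmoid(a * score + b) by gradient descent
    on the logistic loss over held-out detections (Platt-style calibration)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(a * scores + b)))
        grad = p - labels                    # derivative of logistic loss w.r.t. the logit
        a -= lr * np.mean(grad * scores)
        b -= lr * np.mean(grad)
    return a, b

# Toy validation data: higher raw scores tend to be true detections.
scores = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
labels = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

a, b = fit_platt(scores, labels)
prob = 1.0 / (1.0 + np.exp(-(a * 2.0 + b)))  # calibrated probability for a raw score of 2.0
```

Once detector outputs are on a probability scale, they can be combined with a contextual weight (e.g., multiplied by a context-based location prior) in a consistent probabilistic framework, which is the role the abstract assigns to its contextual weighting.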

Citation Context

...tion and scale within the observed scene or image. This additional information is typically ignored in the object detection task. As in other promising papers on visual context for object detection [9, 7, 10, 8], we define the context as the surrounding, or background, of the current object of interest. This context is used to focus the attention on regions in the image where the objects are likely to occur....


Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University