Hierarchical models of object recognition in cortex. Nat Neurosci (1999)

by M Riesenhuber, T Poggio
Results 1 - 10 of 836

Robust object recognition with cortex-like mechanisms

by Thomas Serre, Lior Wolf, Stanley Bileschi, Maximilian Riesenhuber, Tomaso Poggio - IEEE Trans. Pattern Analysis and Machine Intelligence , 2007
"... Abstract—We introduce a new general framework for the recognition of complex visual scenes, which is motivated by biology: We describe a hierarchical system that closely follows the organization of visual cortex and builds an increasingly complex and invariant feature representation by alternating b ..."
Abstract - Cited by 389 (47 self) - Add to MetaCart
Abstract—We introduce a new general framework for the recognition of complex visual scenes, which is motivated by biology: We describe a hierarchical system that closely follows the organization of visual cortex and builds an increasingly complex and invariant feature representation by alternating between a template matching and a maximum pooling operation. We demonstrate the strength of the approach on a range of recognition tasks: From invariant single object recognition in clutter to multiclass categorization problems and complex scene understanding tasks that rely on the recognition of both shape-based as well as texture-based objects. Given the biological constraints that the system had to satisfy, the approach performs surprisingly well: It has the capability of learning from only a few training examples and competes with state-of-the-art systems. We also discuss the existence of a universal, redundant dictionary of features that could handle the recognition of most object categories. In addition to its relevance for computer vision, the success of this approach suggests a plausibility proof for a class of feedforward models of object recognition in cortex.
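The alternation this abstract describes, a template-matching stage followed by a MAX-pooling stage, can be sketched in a few lines of NumPy; the Gaussian tuning width, template size, and 2x2 pooling grid below are illustrative assumptions, not the parameters of the published system.

```python
import numpy as np

def s_layer(image, templates, sigma=1.0):
    """Template matching: Gaussian tuning of each image patch to each
    stored template (sigma is an illustrative tuning width)."""
    k = templates.shape[1]
    h, w = image.shape
    out = np.empty((len(templates), h - k + 1, w - k + 1))
    for i, t in enumerate(templates):
        for y in range(h - k + 1):
            for x in range(w - k + 1):
                d2 = np.sum((image[y:y + k, x:x + k] - t) ** 2)
                out[i, y, x] = np.exp(-d2 / (2 * sigma ** 2))
    return out

def c_layer(maps, pool=2):
    """MAX pooling over local neighbourhoods: position tolerance
    without losing feature identity."""
    n, h, w = maps.shape
    m = maps[:, :h - h % pool, :w - w % pool]
    return m.reshape(n, h // pool, pool, w // pool, pool).max(axis=(2, 4))

# one S/C alternation on a toy image
rng = np.random.default_rng(0)
image = rng.random((8, 8))
templates = rng.random((2, 3, 3))
c1 = c_layer(s_layer(image, templates))   # shape (2, 3, 3)
```

Stacking further S/C pairs on top of `c1` would build the increasingly complex and invariant representation the abstract refers to.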

Citation Context

... neuroscience [4]–[10] have been tested on at least some natural images, neurobiological models of object recognition in cortex have not yet been extended to deal with real-world image databases [11]–[14]. We present a system that is based on a quantitative theory of the ventral stream of visual cortex [14], [15]. A key element ...

Object recognition with features inspired by visual cortex

by Thomas Serre, Lior Wolf, Tomaso Poggio - CVPR’05 -Volume , 2005
"... We introduce a novel set of features for robust object recognition. Each element of this set is a complex feature obtained by combining position- and scale-tolerant edgedetectors over neighboring positions and multiple orientations. Our system’s architecture is motivated by a quantitative model of v ..."
Abstract - Cited by 291 (17 self) - Add to MetaCart
We introduce a novel set of features for robust object recognition. Each element of this set is a complex feature obtained by combining position- and scale-tolerant edge detectors over neighboring positions and multiple orientations. Our system’s architecture is motivated by a quantitative model of visual cortex. We show that our approach exhibits excellent recognition performance and outperforms several state-of-the-art systems on a variety of image datasets including many different object categories. We also demonstrate that our system is able to learn from very few examples. The performance of the approach constitutes a suggestive plausibility proof for a class of feedforward models of object recognition in cortex.

Citation Context

... has always been inspired and challenged by human vision, it seems to never have advanced past the first stage of processing in the simple cells of primary visual cortex V1. Models of biological vision [5, 13, 16, 1] have not been extended to deal with real-world object recognition tasks (e.g., large scale natural image databases) while computer vision systems that are closer to biology like LeNet [10] are still ...

Contextual guidance of eye movements and attention in real-world scenes: The role of global features in object search

by Antonio Torralba, Aude Oliva, Monica S. Castelhano, John M. Henderson - PSYCHOLOGICAL REVIEW , 2006
"... Many experiments have shown that the human visual system makes extensive use of contextual information for facilitating object search in natural scenes. However, the question of how to formally model contextual influences is still open. On the basis of a Bayesian framework, the authors present an or ..."
Abstract - Cited by 258 (17 self) - Add to MetaCart
Many experiments have shown that the human visual system makes extensive use of contextual information for facilitating object search in natural scenes. However, the question of how to formally model contextual influences is still open. On the basis of a Bayesian framework, the authors present an original approach of attentional guidance by global scene context. The model comprises 2 parallel pathways; one pathway computes local features (saliency) and the other computes global (scene-centered) features. The contextual guidance model of attention combines bottom-up saliency, scene context, and top-down mechanisms at an early stage of visual processing and predicts the image regions likely to be fixated by human observers performing natural search tasks in real-world scenes.
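As a rough illustration of the two-pathway idea (not the authors' exact Bayesian formulation), a local saliency map and a global scene-context prior can be fused into one fixation-prediction map; the geometric-mean combination and the `alpha` parameter are hypothetical choices.

```python
import numpy as np

def attention_map(saliency, context_prior, alpha=0.5, eps=1e-12):
    """Combine bottom-up saliency with a scene-context prior.
    Both inputs are non-negative 2-D maps over image locations."""
    s = saliency / (saliency.sum() + eps)
    p = context_prior / (context_prior.sum() + eps)
    m = (s + eps) ** alpha * (p + eps) ** (1 - alpha)  # weighted geometric mean
    return m / m.sum()

# a salient point outside the context region is down-weighted
saliency = np.zeros((4, 4)); saliency[0, 0] = 1.0; saliency[3, 3] = 1.0
context = np.zeros((4, 4)); context[3, :] = 1.0   # e.g. "targets appear low"
m = attention_map(saliency, context)
```

Here the equally salient point at (3, 3) dominates the map because it also lies in the region favored by the context prior.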

Citation Context

... how a model of the target object influences the allocation of attention. In large part, however, identifying the relevant features of object categories in real-world scenes remains an open issue (Riesenhuber & Poggio, 1999; Torralba, Murphy & 2004b; Ullman et al., 2002). Our claim in this paper is that when the target is very small (the people and the mugs occupy a region that has a size of 1% the size of the image on ...

Building the gist of a scene: the role of global image features in recognition

by Aude Oliva, Antonio Torralba , 2006
"... ..."
Abstract - Cited by 250 (10 self) - Add to MetaCart
Abstract not found

A biologically inspired system for action recognition

by H. Jhuang, T. Serre, L. Wolf, T. Poggio - In ICCV , 2007
"... We present a biologically-motivated system for the recognition of actions from video sequences. The approach builds on recent work on object recognition based on hierarchical feedforward architectures [25, 16, 20] and extends a neurobiological model of motion processing in the visual cortex [10]. Th ..."
Abstract - Cited by 238 (15 self) - Add to MetaCart
We present a biologically-motivated system for the recognition of actions from video sequences. The approach builds on recent work on object recognition based on hierarchical feedforward architectures [25, 16, 20] and extends a neurobiological model of motion processing in the visual cortex [10]. The system consists of a hierarchy of spatio-temporal feature detectors of increasing complexity: an input sequence is first analyzed by an array of motion-direction sensitive units which, through a hierarchy of processing stages, lead to position-invariant spatio-temporal feature detectors. We experiment with different types of motion-direction sensitive units as well as different system architectures. As in [16], we find that sparse features in intermediate stages outperform dense ones and that using a simple feature selection approach leads to an efficient system that performs better with far fewer features. We test the approach on different publicly available action datasets, in all cases achieving the highest results reported to date.

Citation Context

... the work of Fukushima [8] and LeCun et al. [12]. Here we follow the more recent framework using scale and position invariant C2 features [25, 16] that originated with the work of Riesenhuber & Poggio [21]. C2 shape features: In previous work [25, 16], a still gray-value input image is first analyzed by an array of Gabor filters (S1 units) at multiple orientations for all positions and scales. Processing ...
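The S1 stage this context refers to, a bank of Gabor filters at multiple orientations, might be sketched as follows; the filter size, wavelength, and aspect-ratio values are illustrative, not the tuned parameters of the published models.

```python
import numpy as np

def gabor(size=11, wavelength=5.0, theta=0.0, sigma=3.0, gamma=0.3):
    """One oriented Gabor patch (S1-style unit), zero-mean and unit-norm.
    theta is the orientation in radians; gamma the spatial aspect ratio."""
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2)) \
        * np.cos(2 * np.pi * xr / wavelength)
    g -= g.mean()                     # zero DC response
    return g / np.linalg.norm(g)      # unit energy

# a small bank: 4 orientations, as in typical S1 layers
bank = [gabor(theta=t) for t in np.linspace(0, np.pi, 4, endpoint=False)]
```

Convolving an image with each filter in `bank`, at several scales, yields the S1 response maps that the higher stages pool over.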

Multiclass object recognition with sparse, localized features

by Jim Mutch, David G. Lowe - IN: CVPR , 2006
"... We apply a biologically inspired model of visual object recognition to the multiclass object categorization problem. Our model modifies that of Serre, Wolf, and Poggio. As in that work, we first apply Gabor filters at all positions and scales; feature complexity and position/scale invariance are the ..."
Abstract - Cited by 196 (6 self) - Add to MetaCart
We apply a biologically inspired model of visual object recognition to the multiclass object categorization problem. Our model modifies that of Serre, Wolf, and Poggio. As in that work, we first apply Gabor filters at all positions and scales; feature complexity and position/scale invariance are then built up by alternating template matching and max pooling operations. We refine the approach in several biologically plausible ways, using simple versions of sparsification and lateral inhibition. We demonstrate the value of retaining some position and scale information above the intermediate feature level. Using feature selection we arrive at a model that performs better with fewer features. Our final model is tested on the Caltech 101 object categories and the UIUC car localization task, in both cases achieving state-of-the-art performance. The results strengthen the case for using this class of model in computer vision.
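The simple lateral inhibition mentioned in this abstract can be illustrated as suppressing, at each location, any orientation response weaker than a fraction of the locally dominant one; the threshold `h = 0.5` is a hypothetical value, not the one used in the paper.

```python
import numpy as np

def lateral_inhibition(responses, h=0.5):
    """responses: array of shape (n_orientations, height, width).
    Zero out any response below h times the strongest orientation
    response at the same location (h = 0.5 is illustrative)."""
    peak = responses.max(axis=0, keepdims=True)
    out = responses.copy()
    out[out < h * peak] = 0.0
    return out

r = np.array([[[1.0, 0.2]],
              [[0.4, 0.9]]])        # 2 orientations over a 1x2 map
s = lateral_inhibition(r)           # non-dominant responses suppressed
```

After inhibition only the dominant orientation survives at each location, which sparsifies the feature maps passed to the next stage.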

Citation Context

... constraints on feature locations [6, 3], others ignore geometry and use a “bag of features” approach that ignores the locations of individual features [4]. According to models of object recognition in cortex [21], the brain uses a hierarchical approach, in which simple, low-level features having high position and scale specificity are pooled and combined into more complex, higher-level features having greater ...

Building high-level features using large scale unsupervised learning

by Quoc V. Le, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeff Dean, Andrew Y. Ng - In International Conference on Machine Learning , 2012
"... We consider the problem of building highlevel, class-specific feature detectors from only unlabeled data. For example, is it possible to learn a face detector using only unlabeled images? To answer this, we train a 9-layered locally connected sparse autoencoder withpoolingandlocalcontrastnormalizati ..."
Abstract - Cited by 180 (9 self) - Add to MetaCart
We consider the problem of building high-level, class-specific feature detectors from only unlabeled data. For example, is it possible to learn a face detector using only unlabeled images? To answer this, we train a 9-layered locally connected sparse autoencoder with pooling and local contrast normalization on a large dataset of images (the model has 1 billion connections, the dataset has 10 million 200x200 pixel images downloaded from the Internet). We train this network using model parallelism and asynchronous SGD on a cluster with 1,000 machines (16,000 cores) for three days. Contrary to what appears to be a widely-held intuition, our experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not. Control experiments show that this feature detector is robust not only to translation but also to scaling and out-of-plane rotation. We also find that the same network is sensitive to other high-level concepts such as cat faces and human bodies. Starting with these learned features, we trained our network to obtain 15.8% accuracy in recognizing 20,000 object categories from ImageNet, a leap of 70% relative improvement over the previous state-of-the-art.
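A drastically scaled-down sketch of the recipe in this abstract, an autoencoder with a sparsity penalty trained by gradient descent, is given below; the layer sizes, learning rate, and L1 weight are toy values, and the pooling, local contrast normalization, and distributed asynchronous SGD of the actual system are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 64))        # toy "images": 256 samples, 64 dims
n_hidden, lr, lam = 16, 0.1, 1e-3     # toy sizes and hyperparameters

W1 = rng.normal(scale=0.1, size=(64, n_hidden))   # encoder weights
W2 = rng.normal(scale=0.1, size=(n_hidden, 64))   # decoder weights

def step(X, W1, W2):
    """One full-batch gradient step: reconstruction loss + L1 sparsity."""
    H = np.maximum(X @ W1, 0.0)       # sparse code (ReLU encoder)
    R = H @ W2                        # linear reconstruction
    err = R - X
    loss = (err ** 2).mean() + lam * np.abs(H).mean()
    dR = 2.0 * err / err.size         # backprop through the loss
    dW2 = H.T @ dR
    dH = dR @ W2.T + lam * np.sign(H) / H.size
    dH[H <= 0.0] = 0.0                # ReLU gradient gate
    dW1 = X.T @ dH
    return loss, dW1, dW2

losses = []
for _ in range(200):
    loss, dW1, dW2 = step(X, W1, W2)
    W1 -= lr * dW1
    W2 -= lr * dW2
    losses.append(loss)
```

Even this toy version shows the basic dynamic: reconstruction error falls while the L1 term keeps the hidden code sparse.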

Citation Context

... pooling. Our style of stacking a series of uniform modules, switching between selectivity and tolerance layers, is reminiscent of the Neocognitron and HMAX (Fukushima & Miyake, 1982; LeCun et al., 1998; Riesenhuber & Poggio, 1999). It has also been argued to be an architecture employed by the brain (DiCarlo et al., 2012). Although we use local receptive fields, they are not convolutional: the parameters are not shared across ...

Representation learning: A review and new perspectives.

by Yoshua Bengio, Aaron Courville, Pascal Vincent - IEEE Trans. Pattern Analysis and Machine Intelligence , 2013
"... Abstract-The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can b ..."
Abstract - Cited by 173 (4 self) - Add to MetaCart
Abstract—The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.

Accurate visual memory for previously attended objects in natural scenes

by Andrew Hollingworth, John M. Henderson , 2002
"... The nature of the information retained from previously fixated (and hence attended) objects in natural scenes was investigated. In a saccade-contingent change paradigm, participants successfully detected type and token changes (Experiment 1) or token and rotation changes (Experiment 2) to a target o ..."
Abstract - Cited by 159 (30 self) - Add to MetaCart
The nature of the information retained from previously fixated (and hence attended) objects in natural scenes was investigated. In a saccade-contingent change paradigm, participants successfully detected type and token changes (Experiment 1) or token and rotation changes (Experiment 2) to a target object when the object had been previously attended but was no longer within the focus of attention when the change occurred. In addition, participants demonstrated accurate type-, token-, and orientation-discrimination performance on subsequent long-term memory tests (Experiments 1 and 2) and during online perceptual processing of a scene (Experiment 3). These data suggest that relatively detailed visual information is retained in memory from previously attended objects in natural scenes. A model of scene perception and long-term memory is proposed.

Citation Context

... these criteria have been proposed in the object recognition literature, including viewpoint-dependent structural descriptions (Bülthoff, Edelman, & Tarr, 1995) and abstract 2-D-feature representations (Riesenhuber & Poggio, 1999). It is then a possibility that such higher-level visual representations are retained from previously fixated and attended regions and accumulate within a representation of the scene. Research using ...

Action snippets: How many frames does human action recognition require

by Konrad Schindler, Luc Van Gool - In CVPR , 2008
"... Visual recognition of human actions in video clips has been an active field of research in recent years. However, most published methods either analyse an entire video and assign it a single action label, or use relatively large lookahead to classify each frame. Contrary to these strategies, human v ..."
Abstract - Cited by 156 (2 self) - Add to MetaCart
Visual recognition of human actions in video clips has been an active field of research in recent years. However, most published methods either analyse an entire video and assign it a single action label, or use relatively large lookahead to classify each frame. Contrary to these strategies, human vision proves that simple actions can be recognised almost instantaneously. In this paper, we present a system for action recognition from very short sequences (“snippets”) of 1–10 frames, and systematically evaluate it on standard data sets. It turns out that even local shape and optic flow for a single frame are enough to achieve ≈ 90% correct recognitions, and snippets of 5–7 frames (0.3–0.5 seconds of video) are enough to achieve a performance similar to the one obtainable with the entire video sequence.

Citation Context

... robustness to translations and obtain a more compact representation, each orientation map is downsampled by MAX-pooling (sometimes called “winner-takes-all”). This operation was originally introduced in [14, 23], and has been shown to yield better translation invariance and feature preservation than linear averaging [14, 25]. The response at location (x, y) is given by h(x, y) = max_{(i,j) ∈ G(x,y)} g(i, j) ...
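The quoted pooling rule, h(x, y) = max over (i, j) in G(x, y) of g(i, j), amounts to the following when G(x, y) is taken as a non-overlapping 2x2 grid (an illustrative choice):

```python
import numpy as np

def max_pool(g, pool=2):
    """Winner-takes-all pooling: each output cell is the maximum of a
    pool x pool neighbourhood G(x, y) of the input map g."""
    h, w = g.shape
    g = g[:h - h % pool, :w - w % pool]
    return g.reshape(h // pool, pool, w // pool, pool).max(axis=(1, 3))

# a feature shifted within one neighbourhood yields the same pooled map,
# and its full strength survives (linear averaging would dilute it)
a = np.zeros((4, 4)); a[0, 0] = 1.0
b = np.zeros((4, 4)); b[1, 1] = 1.0
same = np.array_equal(max_pool(a), max_pool(b))
```

This small example is the translation-tolerance property the snippet describes: the maximum is insensitive to where in G(x, y) the feature lands, while still reporting its undiminished response.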


Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University