CiteSeerX
What is the Best Multi-Stage Architecture for Object Recognition?

by Kevin Jarrett, Koray Kavukcuoglu, Yann Lecun
Citations: 252 (22 self)

BibTeX

@MISC{Jarrett_whatis,
    author = {Kevin Jarrett and Koray Kavukcuoglu and Yann Lecun},
    title = {What is the Best Multi-Stage Architecture for Object Recognition?},
    year = {}
}


Abstract

In many recent object recognition systems, the feature extraction stage is generally composed of a filter bank, a non-linear transformation, and some sort of feature pooling layer. Most systems use only one stage of feature extraction in which the filters are hard-wired, or two stages in which the filters in one or both stages are learned in supervised or unsupervised mode. This paper addresses three questions: 1. How do the non-linearities that follow the filter banks influence recognition accuracy? 2. Does learning the filter banks in an unsupervised or supervised manner improve performance over random or hard-wired filters? 3. Is there any advantage to using an architecture with two stages of feature extraction rather than one? We show that using non-linearities that include rectification and local contrast normalization is the single most important ingredient for good accuracy on object recognition benchmarks. We show that two stages of feature extraction yield better accuracy than one. Most surprisingly, we show that a two-stage system with random filters can yield almost 63% recognition rate on Caltech-101, provided that the proper non-linearities and pooling layers are used. Finally, we show that with supervised refinement, the system achieves state-of-the-art performance on the NORB dataset (5.6% error), and that unsupervised pre-training followed by supervised refinement produces good accuracy on Caltech-101 (>65%) and the lowest known error rate on the undistorted, unprocessed MNIST dataset (0.53%).
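The pipeline the abstract describes (filter bank, rectifying non-linearity, local contrast normalization, feature pooling) can be sketched for a single stage as follows. This is a minimal NumPy illustration under simplifying assumptions, not the paper's implementation: the filter shape, the across-map contrast normalization, and the non-overlapping average pooling are illustrative choices.

```python
import numpy as np

def feature_stage(image, filters, pool=2, eps=1e-8):
    """One feature-extraction stage in the style the abstract outlines:
    Filter bank -> Rectification -> contrast Normalization -> Pooling.
    A hypothetical sketch; the real system's normalization uses a local
    Gaussian-weighted neighborhood rather than this simplification."""
    k = filters.shape[-1]                       # square filter side length
    h, w = image.shape[0] - k + 1, image.shape[1] - k + 1
    # Filter bank: "valid" correlation of each filter with the image.
    maps = np.empty((len(filters), h, w))
    for i, f in enumerate(filters):
        for y in range(h):
            for x in range(w):
                maps[i, y, x] = np.sum(image[y:y+k, x:x+k] * f)
    # Rectification: absolute-value non-linearity.
    maps = np.abs(maps)
    # Contrast normalization (simplified): subtract the mean across feature
    # maps at each location, then divide by the standard deviation.
    maps = maps - maps.mean(axis=0, keepdims=True)
    std = np.sqrt((maps ** 2).mean(axis=0, keepdims=True))
    maps = maps / np.maximum(std, eps)
    # Average pooling over non-overlapping pool x pool windows.
    h2, w2 = h // pool, w // pool
    maps = maps[:, :h2 * pool, :w2 * pool]
    return maps.reshape(len(filters), h2, pool, w2, pool).mean(axis=(2, 4))

# Demo with random filters, echoing the paper's random-filter finding.
rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))
filters = rng.standard_normal((8, 5, 5))        # 8 random 5x5 filters
out = feature_stage(image, filters, pool=2)
print(out.shape)  # (8, 6, 6)
```

Stacking two such stages, with the second stage's filter bank applied to the pooled maps of the first, gives the two-stage architecture the paper evaluates.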

Keyphrases

multi-stage architecture, filter bank, object recognition, feature extraction, good accuracy, random filter, supervised refinement, two-stage system, unsupervised pre-training, important ingredient, NORB dataset, recognition rate, object recognition benchmark, local contrast normalization, feature extraction stage, many recent object recognition system, unsupervised mode, known error rate, supervised manner, feature extraction yield, state-of-the-art performance, unprocessed MNIST dataset, proper non-linearities, non-linear transformation, recognition accuracy
