Gradient-based learning applied to document recognition
- Proceedings of the IEEE, 1998
Cited by 487 (38 self)
Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient-based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of two-dimensional (2-D) shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTNs), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank check is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal checks. It is deployed commercially and reads several million checks per day.
Relations between the statistics of natural images and the response properties of cortical cells
- J. Opt. Soc. Am. A, 1987
Cited by 475 (12 self)
The relative efficiency of any particular image-coding scheme should be defined only in relation to the class of images that the code is likely to encounter. To understand the representation of images by the mammalian visual system, it might therefore be useful to consider the statistics of images from the natural environment (i.e., images with trees, rocks, bushes, etc.). In this study, various coding schemes are compared in relation to how they represent the information in such natural images. The coefficients of such codes are represented by arrays of mechanisms that respond to local regions of space, spatial frequency, and orientation (Gabor-like transforms). For many classes of image, such codes will not be an efficient means of representing information. However, the results obtained with six natural images suggest that the orientation and the spatial-frequency tuning of mammalian simple cells are well suited for coding the information in such images if the goal of the code is to convert higher-order redundancy (e.g., correlation between the intensities of neighboring pixels) into first-order redundancy (i.e., the response distribution of the coefficients). Such coding produces a relatively high signal-to-noise ratio and permits information to be transmitted with only a subset of the total number of cells. These results support Barlow's theory that the goal of natural vision is to represent the information in the natural environment with minimal redundancy.
Parallel Networks that Learn to Pronounce English Text
- COMPLEX SYSTEMS, 1987
Cited by 413 (5 self)
This paper describes NETtalk, a class of massively-parallel network systems that learn to convert English text to speech. The memory representations for pronunciations are learned by practice and are shared among many processing units. The performance of NETtalk has some similarities with observed human performance. (i) The learning follows a power law. (ii) The more words the network learns, the better it is at generalizing and correctly pronouncing new words. (iii) The performance of the network degrades very slowly as connections in the network are damaged: no single link or processing unit is essential. (iv) Relearning after damage is much faster than learning during the original training. (v) Distributed or spaced practice is more effective for long-term retention than massed practice. Network models can be constructed that have the same performance and learning characteristics on a particular task, but differ completely at the levels of synaptic strengths and single-unit responses. However, hierarchical clustering techniques applied to NETtalk reveal that these different networks have similar internal representations of letter-to-sound correspondences within groups of processing units. This suggests that invariant internal representations may be found in assemblies of neurons intermediate in size between highly localized and completely distributed representations.
Hierarchical Models of Object Recognition in Cortex
- 1999
Cited by 344 (67 self)
The classical model of visual processing in cortex is a hierarchy of increasingly sophisticated representations, extending in a natural way the model of simple to complex cells of Hubel and Wiesel. Somewhat surprisingly, little quantitative modeling has been done in the last 15 years to explore the biological feasibility of this class of models to explain higher-level visual processing, such as object recognition. We describe a new hierarchical model that accounts well for this complex visual task, is consistent with several recent physiological experiments in inferotemporal cortex, and makes testable predictions. The model is based on a novel MAX-like operation on the inputs to certain cortical neurons which may have a general role in cortical function.
Learning Invariance From Transformation Sequences
- 1991
Cited by 179 (2 self)
How can we consistently recognize objects when changes in the viewing angle, eye position, distance, size, orientation, relative position, or deformations of the object itself (e.g., of a newspaper or a gymnast) can change their retinal projections so significantly? The visual system must contain knowledge about such transformations in order to be able to generalize correctly. Part of this knowledge is probably determined genetically, but it is also likely that the visual system learns from its sensory experience, which contains plenty of examples of such transformations. Electrophysiological experiments suggest that the invariance properties of perception may be due to the receptive field characteristics of individual cells in the visual system. Complex cells in the primary visual cortex exhibit approximate invariance to position within a limited range (Hubel and Wiesel 1962), while cells in higher visual areas in the temporal cortex show more complex forms of invariance
A Theory of Cerebellar Function
- 1971
Cited by 130 (5 self)
A comprehensive theory of cerebellar function is presented, which ties together the known anatomy and physiology of the cerebellum into a pattern-recognition data processing system. The cerebellum is postulated to be functionally and structurally equivalent to a modification of the classical Perceptron pattern-classification device. It is suggested that the mossy fiber - granule cell - Golgi cell input network performs an expansion recoding that enhances the pattern-discrimination capacity and learning speed of the cerebellar Purkinje response cells.
Face Recognition: A Convolutional Neural Network Approach
- IEEE Transactions on Neural Networks, 1997
Cited by 127 (0 self)
Faces represent complex, multidimensional, meaningful visual stimuli and developing a computational model for face recognition is difficult [43]. We present a hybrid neural network solution which compares favorably with other methods. The system combines local image sampling, a self-organizing map neural network, and a convolutional neural network. The self-organizing map provides a quantization of the image samples into a topological space where inputs that are nearby in the original space are also nearby in the output space, thereby providing dimensionality reduction and invariance to minor changes in the image sample, and the convolutional neural network provides for partial invariance to translation, rotation, scale, and deformation. The convolutional network extracts successively larger features in a hierarchical set of layers. We present results using the Karhunen-Loeve transform in place of the self-organizing map, and a multi-layer perceptron in place of the convolutional netwo...
Robust object recognition with cortex-like mechanisms
- IEEE Trans. Pattern Analysis and Machine Intelligence, 2007
Cited by 118 (20 self)
We introduce a new general framework for the recognition of complex visual scenes, which is motivated by biology: we describe a hierarchical system that closely follows the organization of visual cortex and builds an increasingly complex and invariant feature representation by alternating between a template matching and a maximum pooling operation. We demonstrate the strength of the approach on a range of recognition tasks: from invariant single object recognition in clutter to multiclass categorization problems and complex scene understanding tasks that rely on the recognition of both shape-based and texture-based objects. Given the biological constraints that the system had to satisfy, the approach performs surprisingly well: it has the capability of learning from only a few training examples and competes with state-of-the-art systems. We also discuss the existence of a universal, redundant dictionary of features that could handle the recognition of most object categories. In addition to its relevance for computer vision, the success of this approach suggests a plausibility proof for a class of feedforward models of object recognition in cortex.
Deformable Kernels for Early Vision
- IEEE Transactions on Pattern Analysis and Machine Intelligence, 1991
Cited by 112 (8 self)
Early vision algorithms often have a first stage of linear filtering that 'extracts' from the image information at multiple scales of resolution and multiple orientations. A common difficulty in the design and implementation of such schemes is that one feels compelled to discretize coarsely the space of scales and orientations in order to reduce computation and storage costs. This discretization produces anisotropies due to a loss of translation-, rotation-, and scaling-invariance that makes early vision algorithms less precise and more difficult to design. This need not be so: one can compute and store efficiently the response of families of linear filters defined on a continuum of orientations and scales. A technique is presented that allows one (1) to compute the best approximation of a given family using linear combinations of a small number of 'basis' functions; (2) to describe all finite-dimensional families, i.e. the families of filters for which a finite dimensional representation is p...

