Gradient-based learning applied to document recognition - Proceedings of the IEEE, 1998
Cited by 487 (38 self)
Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient-based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of two-dimensional (2-D) shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTNs), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training and the flexibility of graph transformer networks. A graph transformer network for reading a bank check is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal checks. It is deployed commercially and reads several million checks per day.
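The abstract attributes the success of convolutional networks to their handling of 2-D shape variability. A minimal sketch of the core operation, assuming a single hand-set kernel rather than learned weights: one small kernel is shared across every position of the image (a "valid" cross-correlation), so a shifted input produces a correspondingly shifted feature map. The name `conv2d_valid` is hypothetical; a full convolutional network would also apply a nonlinearity, interleave subsampling layers, and learn the kernel weights by back-propagation.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix, bias: float = 0.0) -> Matrix:
    """Hypothetical helper: 'valid' 2-D cross-correlation producing one feature map.

    The same kernel weights are reused at every (i, j) position (weight sharing).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    feature_map: Matrix = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = bias
            for u in range(kh):
                for v in range(kw):
                    total += image[i + u][j + v] * kernel[u][v]
            row.append(total)
        feature_map.append(row)
    return feature_map


# A horizontal-gradient kernel responds where intensity rises from left to right.
edge_kernel = [[-1.0, 1.0]]
image = [[0.0, 0.0, 1.0, 1.0]] * 3
shifted = [[0.0, 1.0, 1.0, 1.0]] * 3  # same edge, one pixel further left
```

Applied to `image`, every output row is `[0.0, 1.0, 0.0]`; applied to `shifted`, every row is `[1.0, 0.0, 0.0]`, so the response moves with the edge instead of requiring a separate detector per position.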
Population-Based Incremental Learning: A Method for Integrating Genetic Search Based Function Optimization and Competitive Learning, 1994
Cited by 257 (11 self)
Genetic algorithms (GAs) are biologically motivated adaptive systems which have been used, with varying degrees of success, for function optimization. In this study, an abstraction of the basic genetic algorithm, the Equilibrium Genetic Algorithm (EGA), and the GA in turn, are reconsidered within the framework of competitive learning. This new perspective reveals a number of different possibilities for performance improvements. This paper explores population-based incremental learning (PBIL), a method of combining the mechanisms of a generational genetic algorithm with simple competitive learning. The combination of these two methods reveals a tool which is far simpler than a GA, and which outperforms a GA on a large set of optimization problems in terms of both speed and accuracy. This paper presents an empirical analysis of where the proposed technique will outperform genetic algorithms, and describes a class of problems in which a genetic algorithm may be able to perform better. Extensions to this algorithm are discussed and analyzed. PBIL and extensions are compared with a standard GA on twelve problems, including standard numerical optimization functions, traditional GA test suite problems, and NP-Complete problems.
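The abstract describes PBIL only at a high level. A minimal sketch, assuming the basic form usually associated with the method: a probability vector, initialised to 0.5 per bit, generates each population and is then nudged toward the best sample by a learning rate, much as a competitive-learning unit moves toward its winning input. The population size, learning rate, and OneMax test function are illustrative choices, not the paper's settings, and refinements such as mutating the probability vector are omitted.

```python
import random
from typing import Callable, List, Optional

Bits = List[int]


def pbil(fitness: Callable[[Bits], float], n_bits: int, pop_size: int = 50,
         learning_rate: float = 0.1, generations: int = 200, seed: int = 0) -> Bits:
    """Basic population-based incremental learning (no mutation, no negative update)."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits  # start unbiased: each bit is 1 with probability 0.5
    best_seen: Optional[Bits] = None
    best_fit = float("-inf")
    for _ in range(generations):
        # Sample a fresh population from the current probability vector.
        population = [[1 if rng.random() < p else 0 for p in probs]
                      for _ in range(pop_size)]
        winner = max(population, key=fitness)
        if fitness(winner) > best_fit:
            best_seen, best_fit = winner, fitness(winner)
        # Competitive-learning style update: move the vector toward the winner.
        probs = [p * (1.0 - learning_rate) + bit * learning_rate
                 for p, bit in zip(probs, winner)]
    assert best_seen is not None
    return best_seen


def onemax(bits: Bits) -> float:
    """Toy fitness: number of 1 bits (maximum = len(bits))."""
    return float(sum(bits))


solution = pbil(onemax, n_bits=20)
```

Unlike a GA, no population is carried from one generation to the next; the probability vector is the only state the search keeps.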
Deterministic Annealing for Clustering, Compression, Classification, Regression, and Related Optimization Problems - Proceedings of the IEEE, 1998
Cited by 193 (4 self)
... this paper. Let us place it within the neural network perspective, and particularly that of learning. The area of neural networks has greatly benefited from its unique position at the crossroads of several diverse scientific and engineering disciplines including statistics and probability theory, physics, biology, control and signal processing, information theory, complexity theory, and psychology (see [45]). Neural networks have provided a fertile soil for the infusion (and occasionally confusion) of ideas, as well as a meeting ground for comparing viewpoints, sharing tools, and renovating approaches. It is within the ill-defined boundaries of the field of neural networks that researchers in traditionally distant fields have come to the realization that they have been attacking fundamentally similar optimization problems.
A Review and Empirical Evaluation of Feature Weighting Methods for a Class of Lazy Learning Algorithms - Artificial Intelligence Review, 1997
Cited by 94 (0 self)
Many lazy learning algorithms are derivatives of the k-nearest neighbor (k-NN) classifier, which uses a distance function to generate predictions from stored instances. Several studies have shown that k-NN's performance is highly sensitive to the definition of its distance function. Many k-NN variants have been proposed to reduce this sensitivity by parameterizing the distance function with feature weights. However, these variants have been neither categorized nor empirically compared. This paper reviews a class of weight-setting methods for lazy learning algorithms. We introduce a framework for distinguishing these methods and empirically compare them. We observed four trends from our experiments and conducted further studies to highlight them. Our results suggest that methods that use performance feedback to assign weight settings demonstrate three advantages over other methods: they require less pre-processing, perform better in the presence of interacting features, and generally require less training data to learn good settings. We also found that continuous weighting methods tend to outperform feature selection algorithms for tasks where some features are useful but less important than others.
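The abstract's central point is that k-NN is sensitive to its distance function and that per-feature weights can reduce that sensitivity. A minimal sketch, assuming a weighted Euclidean distance and majority voting; the weights here are set by hand, whereas the methods the paper reviews learn them (for instance from performance feedback). The function names and toy data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]


def weighted_distance(x: Sequence[float], y: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Euclidean distance with a non-negative weight per feature."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def knn_predict(train: List[Example], query: Sequence[float],
                weights: Sequence[float], k: int = 3) -> str:
    """Majority vote among the k training examples nearest to `query`."""
    nearest = sorted(train, key=lambda ex: weighted_distance(ex[0], query, weights))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Feature 0 determines the class; feature 1 is large-magnitude noise.
train: List[Example] = [
    ((0.0, -10.0), "a"), ((0.1, -9.0), "a"), ((0.2, -10.0), "a"),
    ((1.0, 10.0), "b"), ((1.1, 9.0), "b"), ((0.9, 10.0), "b"),
]
query = (0.1, 10.0)
print(knn_predict(train, query, weights=(1.0, 1.0)))  # prints b (noise feature dominates)
print(knn_predict(train, query, weights=(1.0, 0.0)))  # prints a (noise feature ignored)
```

A weight of 0 amounts to feature selection; intermediate values let a useful but less important feature still contribute, which is the case where the abstract reports continuous weighting doing better.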
Image classification by a two-dimensional hidden Markov model - IEEE Trans. Signal Processing, 2000
Cited by 52 (6 self)
For block-based classification, an image is divided into blocks and a feature vector is formed for each block by grouping statistics extracted from the block. Conventional block-based classification algorithms decide the class of a block by examining only the feature vector of this block and ignoring context information. In order to improve classification by context, an algorithm is proposed which models images by two-dimensional hidden Markov models (HMMs). The HMM considers feature vectors statistically dependent through an underlying state process assumed to be a Markov mesh, which has transition probabilities conditioned on the states of neighboring blocks from both horizontal and vertical directions. Thus, the dependency in two dimensions is reflected simultaneously. The HMM parameters are estimated by the EM algorithm. To classify an image, the classes with maximum a posteriori probability are searched jointly for all the blocks. Applications of the HMM algorithm to document and aerial image segmentation show that the algorithm outperforms CART™, LVQ, and Bayes VQ.
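The key modelling assumption is the Markov mesh: the state of block (i, j) depends only on the states of the block above, (i-1, j), and the block to its left, (i, j-1). A minimal sketch of the resulting prior over a grid of block states, assuming that blocks on the top row or left column condition on a missing neighbour encoded as `None`; that boundary convention, the table layout, and the function name are hypothetical. The feature-vector emission densities, EM estimation, and the joint MAP search described in the abstract are not shown.

```python
import math
from typing import Dict, List, Optional, Tuple

# (state above, state to the left) -> distribution over the current block's state.
Neighbours = Tuple[Optional[int], Optional[int]]


def markov_mesh_log_prior(states: List[List[int]],
                          trans: Dict[Neighbours, List[float]]) -> float:
    """log P(states) under a Markov mesh, scanning blocks row by row."""
    total = 0.0
    for i, row in enumerate(states):
        for j, s in enumerate(row):
            above = states[i - 1][j] if i > 0 else None
            left = row[j - 1] if j > 0 else None
            total += math.log(trans[(above, left)][s])
    return total


# Two classes that tend to form contiguous regions (hypothetical numbers).
STAY, MOVE = 0.9, 0.1
trans: Dict[Neighbours, List[float]] = {(None, None): [0.5, 0.5]}
for s in (0, 1):
    same = [STAY if t == s else MOVE for t in (0, 1)]
    trans[(None, s)] = same  # top row: only a left neighbour
    trans[(s, None)] = same  # left column: only an upper neighbour
for a in (0, 1):
    for b in (0, 1):
        if a == b:  # neighbours agree: favour the same class
            trans[(a, b)] = [STAY if t == a else MOVE for t in (0, 1)]
        else:       # neighbours disagree: no preference
            trans[(a, b)] = [0.5, 0.5]

uniform = [[0, 0], [0, 0]]
checker = [[0, 1], [1, 0]]
```

Under these tables the uniform grid scores `log 0.5 + 3 log 0.9`, while the checkerboard scores `log 0.5 + 3 log 0.1`, so context from both directions favours spatially coherent labellings.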
Recognizing Teleoperated Manipulations - Proceedings of the IEEE International Conference on Robotics and Automation, 1993
Cited by 46 (0 self)
The many degrees of freedom and distributed sensing capability of dextrous robot hands permit the use of control programs that rely on qualitative changes in sensor feedback rather than precise positioning and force information. One way of designing such a control program is to have the robot learn the qualitative control characteristics from examples. A convenient way of providing these examples is via teleoperation. To this end, this paper presents results for recognizing and segmenting manipulation primitives from a teleoperated task by analysis of features in sensor feedback. k-nearest quantized pattern vectors determine potential classifications. A hidden Markov model provides task context for the final segmentation. The illustrative task is picking up a plastic egg with a spatula.
1 Introduction
Dextrous hands can be controlled qualitatively using strategies that servo on significant changes in sensor feedback. Using the many degrees of freedom and distributed sensing capabili...
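The abstract leaves the HMM decoding step implicit. A common way for an HMM to turn noisy per-frame classifications into a segmentation is Viterbi decoding, sketched below under the assumption that each frame's k-nearest-neighbour vote fractions serve as emission scores and that primitives are "sticky" (high self-transition probability). The numbers and names are hypothetical, not the paper's.

```python
import math
from typing import List, Sequence


def viterbi(log_start: Sequence[float], log_trans: Sequence[Sequence[float]],
            log_emit: Sequence[Sequence[float]]) -> List[int]:
    """Most likely state sequence; log_emit[t][s] scores frame t under state s."""
    n = len(log_start)
    score = [log_start[s] + log_emit[0][s] for s in range(n)]
    backpointers: List[List[int]] = []
    for frame in log_emit[1:]:
        # For each state, the best predecessor given the scores so far.
        best_prev = [max(range(n), key=lambda r: score[r] + log_trans[r][s])
                     for s in range(n)]
        score = [score[best_prev[s]] + log_trans[best_prev[s]][s] + frame[s]
                 for s in range(n)]
        backpointers.append(best_prev)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for best_prev in reversed(backpointers):
        state = best_prev[state]
        path.append(state)
    return path[::-1]


# Per-frame vote fractions for two hypothetical manipulation primitives.
vote_fractions = [[0.8, 0.2], [0.8, 0.2], [0.4, 0.6], [0.8, 0.2],
                  [0.2, 0.8], [0.2, 0.8], [0.2, 0.8]]
path = viterbi([math.log(0.5)] * 2,
               [[math.log(0.9), math.log(0.1)], [math.log(0.1), math.log(0.9)]],
               [[math.log(p) for p in row] for row in vote_fractions])
```

Frame 2 on its own favours primitive 1, but the sticky transitions absorb that isolated vote, and the decoded path is `[0, 0, 0, 0, 1, 1, 1]`: a single segment boundary where the evidence changes consistently.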
Connectionist Probability Estimation in HMM Speech Recognition - IEEE Transactions on Speech and Audio Processing, 1992
Cited by 45 (9 self)
This report is concerned with integrating connectionist networks into a hidden Markov model (HMM) speech recognition system. This is achieved through a statistical understanding of connectionist networks as probability estimators, first elucidated by Hervé Bourlard. We review the basis of HMM speech recognition and point out the possible benefits of incorporating connectionist networks. We discuss some issues necessary to the construction of a connectionist HMM recognition system, and describe the performance of such a system, including evaluations on the DARPA database, in collaboration with Mike Cohen and Horacio Franco of SRI International. In conclusion, we show that a connectionist component improves a state-of-the-art HMM system.
Part I: Introduction
Over the past few years, connectionist models have been widely proposed as a potentially powerful approach to speech recognition (e.g. Makino et al. (1983), Huang et al. (1988) and Waibel et al. (1989)). However, whilst connec...
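The probabilistic link the abstract attributes to Bourlard fits in one line: a network trained as a classifier estimates state posteriors P(q | x), and Bayes' rule gives p(x | q) / p(x) = P(q | x) / P(q). Because p(x) is the same for every state at a given frame, dividing the network outputs by the state priors yields likelihoods scaled by a per-frame constant, which an HMM decoder can use in place of p(x | q). A minimal sketch, assuming the priors are supplied by the caller (typically estimated from training data); the function name and the probability floor are hypothetical.

```python
import math
from typing import List, Sequence


def scaled_log_likelihoods(posteriors: Sequence[float], priors: Sequence[float],
                           floor: float = 1e-8) -> List[float]:
    """log P(q|x) - log P(q), i.e. log p(x|q) - log p(x), for each HMM state q.

    The floor is a hypothetical safeguard against log(0) when the network
    assigns a state zero posterior probability.
    """
    return [math.log(max(post, floor)) - math.log(prior)
            for post, prior in zip(posteriors, priors)]


# Equal posteriors, unequal priors: the rarer state gets the larger scaled likelihood.
scores = scaled_log_likelihoods([0.5, 0.5], [0.25, 0.75])
```

Here `scores` is `[log 2, log(2/3)]`: the network output alone does not distinguish the states, but removing the prior shows the frame is relatively more typical of the rarer state.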
Modular Neural Networks for Learning Context-Dependent Game Strategies - Master’s thesis, Computer Speech and Language Processing, 1992
Cited by 31 (3 self)
The method of temporal differences (TD) is a learning technique which specialises in predicting the likely outcome of a sequence over time. Examples of such sequences include speech frame vectors, whose outcome is a phoneme or word decision, and positions in a board game, whose outcome is a win/loss decision. Recent results by Tesauro in the domain of backgammon indicate that a neural network, trained by TD methods to evaluate positions generated by self-play, can reach an advanced level of backgammon skill. For my summer thesis project, I first implemented the TD/neural network learning algorithms and confirmed Tesauro's results, using the domains of tic-tac-toe and backgammon. Then, motivated by Waibel's success with modular neural networks for phoneme recognition, I experimented with using two modular architectures (DDD and Meta-Pi) in place of the monolithic networks. I found that using the modular networks significantly enhanced the ability of the backgammon evaluator to change it...
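TD methods predict a sequence's eventual outcome by updating each prediction toward the next prediction rather than waiting for the final result. A minimal tabular sketch of the TD(0) update `V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))`, assuming a lookup table instead of the neural-network evaluator the thesis (following Tesauro) used, and lambda = 0 rather than general TD(lambda). The function name and toy episode are hypothetical.

```python
from typing import Dict, Hashable, List, Optional, Tuple

# (state, reward received on leaving it, next state or None if the episode ends)
Transition = Tuple[Hashable, float, Optional[Hashable]]


def td0_episode(values: Dict[Hashable, float], episode: List[Transition],
                alpha: float = 0.1, gamma: float = 1.0) -> None:
    """Apply the TD(0) update to every transition of one episode, in order."""
    for state, reward, next_state in episode:
        # Bootstrap from the current estimate of the next state (0 at the end).
        bootstrap = 0.0 if next_state is None else gamma * values.get(next_state, 0.0)
        current = values.get(state, 0.0)
        values[state] = current + alpha * (reward + bootstrap - current)


# A three-position "game" that always ends in a win (reward 1).
episode: List[Transition] = [("opening", 0.0, "middle"),
                             ("middle", 0.0, "ending"),
                             ("ending", 1.0, None)]
values: Dict[Hashable, float] = {}
for _ in range(500):
    td0_episode(values, episode)
```

The win signal reaches earlier positions only through the estimates of later ones, so the final position converges first and the opening last; after 500 episodes all three values are essentially 1.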
Bayes Risk Weighted Vector Quantization With Posterior Estimation for Image Compression and Classification - IEEE Transactions on Image Processing, 1996
Cited by 29 (9 self)
Classification and compression play important roles in communicating digital information. Their combination is useful in many applications, including the detection of abnormalities in compressed medical images. In view of the similarities of compression and low-level classification, it is not surprising that there are many similar methods for their design. Because some of these methods are useful for designing vector quantizers, it seems natural that vector quantization (VQ) be explored for the combined goal. We investigate several VQ-based algorithms that seek to minimize both the distortion of compressed images and errors in classifying their pixel blocks. These algorithms are investigated with both full search and tree-structured codes. We emphasize a nonparametric technique that minimizes both error measures simultaneously by incorporating a Bayes risk component into the distortion measure used for design and encoding. We introduce a tree-structured posterior estimator to produce t...
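The abstract's central idea is an encoding distortion that adds a Bayes-risk term to the usual compression error. A minimal sketch, assuming squared error plus `lam` times the expected misclassification cost of each codeword's class label under estimated posteriors; that exact form, the names, and the toy numbers are assumptions, and the paper's tree-structured posterior estimator is replaced here by posteriors passed in directly.

```python
from typing import List, Sequence


def brvq_encode(x: Sequence[float], codebook: List[Sequence[float]],
                codeword_classes: List[int], posteriors: Sequence[float],
                cost: List[List[float]], lam: float) -> int:
    """Index of the codeword minimising squared error + lam * Bayes risk.

    cost[k][c] is the (assumed) cost of labelling a block of true class k as class c.
    """
    def distortion(i: int) -> float:
        squared_error = sum((a - b) ** 2 for a, b in zip(x, codebook[i]))
        label = codeword_classes[i]
        # Expected cost of the codeword's label under the estimated posteriors.
        bayes_risk = sum(cost[k][label] * p for k, p in enumerate(posteriors))
        return squared_error + lam * bayes_risk

    return min(range(len(codebook)), key=distortion)


codebook = [[0.0], [1.0]]
classes = [0, 1]                     # class label attached to each codeword
zero_one = [[0.0, 1.0], [1.0, 0.0]]  # 0-1 misclassification cost
posteriors = [0.1, 0.9]              # estimator believes the block is class 1
```

For `x = [0.4]`, `lam = 0` selects codeword 0 (plain nearest-neighbour VQ, squared errors 0.16 vs 0.36), while `lam = 1` selects codeword 1 (totals 1.06 vs 0.46); raising `lam` trades compression distortion for classification accuracy.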
Performance evaluation of pattern classifiers for handwritten character recognition - International Journal on Document Analysis and Recognition, 2002
Cited by 26 (3 self)
This paper describes a performance evaluation study in which some efficient classifiers are tested on handwritten digit recognition. The evaluated classifiers include a statistical classifier (modified quadratic discriminant function, MQDF), three neural classifiers, and an LVQ (learning vector quantization) classifier. They are efficient in that high accuracies can be achieved at moderate memory space and computation cost. The performance is measured in terms of classification accuracy, sensitivity to training sample size, ambiguity rejection, and outlier resistance. The outlier resistance of neural classifiers is enhanced by training with synthesized outlier data. The classifiers are tested on a large data set extracted from NIST SD19. As a result, the test accuracies of the evaluated classifiers are comparable to or higher than those of the nearest neighbor (1-NN) rule and regularized discriminant analysis (RDA). It is shown that neural classifiers are more susceptible to small sample sizes than MQDF, although they yield higher accuracies with large sample sizes. As a neural classifier, the polynomial classifier (PC) gives the highest accuracy and performs best in ambiguity rejection. On the other hand, MQDF is superior in outlier rejection even though it is not trained with outlier data. The results indicate that pattern classifiers have complementary advantages and should be appropriately combined to achieve higher performance.
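MQDF is the classifier the abstract singles out for outlier rejection. A minimal sketch, assuming the widely cited MQDF2 form of Kimura et al.: each class keeps its mean, its k largest covariance eigenvalues and eigenvectors, and a single constant `delta` in place of the remaining d - k eigenvalues, which is what keeps memory and computation moderate. The abstract does not give the paper's k or delta; eigenpairs are assumed precomputed and orthonormal, and the function name is hypothetical. A sample is assigned to the class with the smallest value.

```python
import math
from typing import Sequence


def mqdf2(x: Sequence[float], mean: Sequence[float], eigvals: Sequence[float],
          eigvecs: Sequence[Sequence[float]], delta: float) -> float:
    """Modified quadratic discriminant value for one class (smaller = more likely)."""
    diff = [a - m for a, m in zip(x, mean)]
    residual = sum(t * t for t in diff)  # squared distance to the class mean
    value = 0.0
    for lam, phi in zip(eigvals, eigvecs):
        proj = sum(p * t for p, t in zip(phi, diff))  # component along a principal axis
        value += proj * proj / lam
        residual -= proj * proj          # what remains lies in the minor subspace
    value += residual / delta            # minor eigenvalues all replaced by delta
    value += sum(math.log(lam) for lam in eigvals) + (len(x) - len(eigvals)) * math.log(delta)
    return value


# One class in 2-D with a single principal axis along the first coordinate.
along_axis = mqdf2([2.0, 0.0], [0.0, 0.0], [4.0], [[1.0, 0.0]], delta=1.0)
across_axis = mqdf2([0.0, 2.0], [0.0, 0.0], [4.0], [[1.0, 0.0]], delta=1.0)
```

Both test points are at the same Euclidean distance from the mean, but the deviation along the high-variance axis costs `1 + log 4` while the deviation across it costs `4 + log 4`, so MQDF, unlike plain nearest-mean matching, respects the class's shape.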

