Solving multiclass learning problems via error-correcting output codes
- Journal of Artificial Intelligence Research
, 1995
Cited by 448 (9 self)
Multiclass learning problems involve finding a definition for an unknown function f(x) whose range is a discrete set containing k > 2 values (i.e., k "classes"). The definition is acquired by studying collections of training examples of the form $\langle x_i, f(x_i) \rangle$. Existing approaches to multiclass learning problems include direct application of multiclass algorithms such as the decision-tree algorithms C4.5 and CART, application of binary concept learning algorithms to learn individual binary functions for each of the k classes, and application of binary concept learning algorithms with distributed output representations. This paper compares these three approaches to a new technique in which error-correcting codes are employed as a distributed output representation. We show that these output representations improve the generalization performance of both C4.5 and backpropagation on a wide range of multiclass learning tasks. We also demonstrate that this approach is robust with respect to changes in the size of the training sample, the assignment of distributed representations to particular classes, and the application of overfitting avoidance techniques such as decision-tree pruning. Finally, we show that, like the other methods, the error-correcting code technique can provide reliable class probability estimates. Taken together, these results demonstrate that error-correcting output codes provide a general-purpose method for improving the performance of inductive learning programs on multiclass problems.
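The decoding idea behind error-correcting output codes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the four class names and the 7-bit code matrix are hypothetical, and the per-bit binary learners are assumed to exist already, so only the final decoding step is shown. Each class is assigned a codeword, and a predicted bit string is mapped to the class whose codeword is nearest in Hamming distance.

```python
# Hypothetical 7-bit code matrix for 4 classes (illustrative, not from the paper).
# Every pair of codewords differs in 4 positions, so one wrong bit can be corrected.
CODEWORDS = {
    "c1": (1, 1, 1, 1, 1, 1, 1),
    "c2": (0, 0, 0, 0, 1, 1, 1),
    "c3": (0, 0, 1, 1, 0, 0, 1),
    "c4": (0, 1, 0, 1, 0, 1, 0),
}


def hamming(u, v):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(a != b for a, b in zip(u, v))


def decode(bits):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(CODEWORDS, key=lambda label: hamming(CODEWORDS[label], bits))


def min_distance():
    """Smallest Hamming distance between any two distinct codewords."""
    labels = list(CODEWORDS)
    return min(
        hamming(CODEWORDS[a], CODEWORDS[b])
        for i, a in enumerate(labels)
        for b in labels[i + 1:]
    )
```

In this sketch, a bit string that matches the "c3" codeword except in one position still decodes to "c3", which is the sense in which the output representation corrects errors made by individual binary learners.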
Evolving Networks: Using the Genetic Algorithm with Connectionist Learning
- In
, 1990
Cited by 171 (2 self)
It is appealing to consider hybrids of neural-network learning algorithms with evolutionary search procedures, simply because Nature has so successfully done so. In fact, computational models of learning and evolution offer theoretical biology new tools for addressing questions about Nature that have dogged that field since Darwin [Belew, 1990]. The concern of this paper, however, is strictly artificial: Can hybrids of connectionist learning algorithms and genetic algorithms produce more efficient and effective algorithms than either technique applied in isolation? The paper begins with a survey of recent work (by us and others) that combines Holland's Genetic Algorithm (GA) with connectionist techniques and delineates some of the basic design problems these hybrids share. This analysis suggests the dangers of overly literal representations of the network on the genome (e.g., encoding each weight explicitly). A preliminary set of experiments that use the GA to find unusual but successf...
Knowledge-Based Artificial Neural Networks
, 1994
Cited by 133 (13 self)
Hybrid learning methods use theoretical knowledge of a domain and a set of classified examples to develop a method for accurately classifying examples not seen during training. The challenge of hybrid learning systems is to use the information provided by one source of information to offset information missing from the other source. By so doing, a hybrid learning system should learn more effectively than systems that use only one of the information sources. KBANN (Knowledge-Based Artificial Neural Networks) is a hybrid learning system built on top of connectionist learning techniques. It maps problem-specific "domain theories", represented in propositional logic, into neural networks and then refines this reformulated knowledge using backpropagation. KBANN is evaluated by extensive empirical tests on two problems from molecular biology. Among other results, these tests show that the networks created by KBANN generalize better than a wide variety of learning systems, as well as several t...
First and Second-Order Methods for Learning: between Steepest Descent and Newton's Method
- Neural Computation
, 1992
Cited by 108 (6 self)
On-line first order backpropagation is sufficiently fast and effective for many large-scale classification problems but for very high precision mappings, batch processing may be the method of choice. This paper reviews first- and second-order optimization methods for learning in feedforward neural networks. The viewpoint is that of optimization: many methods can be cast in the language of optimization techniques, allowing the transfer to neural nets of detailed results about computational complexity and safety procedures to ensure convergence and to avoid numerical problems. The review is not intended to deliver detailed prescriptions for the most appropriate methods in specific applications, but to illustrate the main characteristics of the different methods and their mutual relations.
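The contrast between the two ends of the spectrum named in the title can be sketched on a toy problem. This is an illustration under stated assumptions, not taken from the review: the objective is a hypothetical two-dimensional quadratic f(w) = 0.5 w^T A w - b^T w, and the matrix, vector, step size, and step count are arbitrary choices. Steepest descent uses only the gradient, while a Newton step also uses the (here constant) Hessian A.

```python
# Hypothetical ill-conditioned quadratic: f(w) = 0.5 * w^T A w - b^T w.
# Its minimum is at w = (1, 1). All values are illustrative assumptions.
A = [[10.0, 0.0], [0.0, 1.0]]
B = [10.0, 1.0]


def grad(w, a, b):
    """Gradient of the quadratic: A w - b."""
    return [sum(a[i][j] * w[j] for j in range(2)) - b[i] for i in range(2)]


def steepest_descent(w, a, b, lr, steps):
    """First-order method: repeatedly step against the gradient."""
    for _ in range(steps):
        g = grad(w, a, b)
        w = [w[i] - lr * g[i] for i in range(2)]
    return w


def newton_step(w, a, b):
    """Second-order method: scale the gradient by the inverse Hessian (A)."""
    g = grad(w, a, b)
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    inv = [[a[1][1] / det, -a[0][1] / det],
           [-a[1][0] / det, a[0][0] / det]]
    return [w[i] - sum(inv[i][j] * g[j] for j in range(2)) for i in range(2)]


W_GD = steepest_descent([0.0, 0.0], A, B, lr=0.1, steps=10)
W_NEWTON = newton_step([0.0, 0.0], A, B)
```

On this quadratic a single Newton step lands on the minimum, while ten steepest-descent steps are still noticeably short of it along the low-curvature direction. For real networks the Hessian is neither constant nor cheap to invert, which is the trade-off such a review weighs.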
Symbolic and neural learning algorithms: an experimental comparison
- Machine Learning
, 1991
Cited by 95 (7 self)
Despite the fact that many symbolic and neural network (connectionist) learning algorithms address the same problem of learning from classified examples, very little is known regarding their comparative strengths and weaknesses. Experiments comparing the ID3 symbolic learning algorithm with the perceptron and backpropagation neural learning algorithms have been performed using five large, real-world data sets. Overall, backpropagation performs slightly better than the other two algorithms in terms of classification accuracy on new examples, but takes much longer to train. Experimental results suggest that backpropagation can work significantly better on data sets containing numerical data. Also analyzed empirically are the effects of (1) the amount of training data, (2) imperfect training examples, and (3) the encoding of the desired outputs. Backpropagation occasionally outperforms the other two systems when given relatively small amounts of training data. It is slightly more accurate than ID3 when examples are noisy or incompletely specified. Finally, backpropagation more effectively utilizes a "distributed" output encoding.
Efficient Back Prop
, 1996
Cited by 93 (16 self)
[Slide text; figure labels: input $X_0, X_1, \ldots, X_p$; output $Y_0, Y_1, \ldots, Y_p$; desired output $D_0, D_1, \ldots, D_p$; error $E_0, E_1, \ldots, E_p$; parameters $w$; cost function.] Computing the gradient with backpropagation: for a module $A$ with output $O = A(I_1, I_2)$, the input gradients are $dI_1 = dO \, \partial A / \partial I_1$ and $dI_2 = dO \, \partial A / \partial I_2$. The learning machine is composed of modules (e.g. layers). Each module can do two things: (1) compute its outputs from its inputs (FPROP); (2) compute gradient vectors at its inputs from gradient vectors at its outputs (BPROP). An interesting special case: multilayer networks, in which weight matrices alternate with sigmoids and biases, the parameters $w$ are the weights and biases, and the cost is the mean square error $\frac{1}{2} \lVert D - Y \rVert^2$.
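The FPROP/BPROP module interface described in this entry can be sketched as follows. This is a minimal scalar illustration, not code from the source: the class names `Scale` and `Sigmoid` and the helpers `forward` and `backward` are hypothetical, and each module handles a single number rather than a vector.

```python
import math


class Scale:
    """Hypothetical one-weight module computing y = w * x."""

    def __init__(self, w):
        self.w = w
        self.dw = 0.0

    def fprop(self, x):
        self.x = x  # cache the input for the backward pass
        return self.w * x

    def bprop(self, dy):
        self.dw = dy * self.x  # gradient with respect to the weight
        return dy * self.w     # gradient with respect to the input


class Sigmoid:
    """Logistic sigmoid module."""

    def fprop(self, x):
        self.y = 1.0 / (1.0 + math.exp(-x))
        return self.y

    def bprop(self, dy):
        return dy * self.y * (1.0 - self.y)


def forward(modules, x):
    """Run FPROP through the modules in order."""
    for m in modules:
        x = m.fprop(x)
    return x


def backward(modules, dout):
    """Run BPROP through the modules in reverse order."""
    for m in reversed(modules):
        dout = m.bprop(dout)
    return dout
```

With a squared-error cost 0.5 * (y - d)^2, the gradient fed into `backward` is simply y - d. Because every module exposes the same two methods, a new layer type can be added without changing the training loop.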
Error-Correcting Output Codes: A General Method for Improving Multiclass Inductive Learning Programs
- IN PROCEEDINGS OF AAAI-91
, 1991
Cited by 81 (7 self)
Multiclass learning problems involve finding a definition for an unknown function f(x) whose range is a discrete set containing k > 2 values (i.e., k "classes"). The definition is acquired by studying large collections of training examples of the form $\langle x_i, f(x_i) \rangle$. Existing approaches to this problem include (a) direct application of multiclass algorithms such as the decision-tree algorithms ID3 and CART, (b) application of binary concept learning algorithms to learn individual binary functions for each of the k classes, and (c) application of binary concept learning algorithms with distributed output codes such as those employed by Sejnowski and Rosenberg in the NETtalk system. This paper compares these three approaches to a new technique in which BCH error-correcting codes are employed as a distributed output representation. We show that these output representations improve the performance of ID3 on the NETtalk task and of backpropagation on an isolated-letter speech-recognition t...
Adaptive Load Migration Systems for PVM
, 1994
Cited by 49 (2 self)
Adaptive load distribution is necessary for parallel applications to co-exist effectively with other jobs in a network of shared heterogeneous workstations. We present three methods that provide such support for PVM applications. Two of these methods, MPVM and UPVM, adapt to changes in the workstation environment by transparently migrating the virtual processors (VPs) of the parallel application. A VP in MPVM is a Unix process, while UPVM defines light-weight, process-like VPs. The third method, ADM, is a programming methodology for writing programs that perform adaptive load distribution through data movement. These methods are discussed and compared in terms of effectiveness, usability, and performance. 1.0 Introduction Message-passing systems such as PVM [1-3] allow a heterogeneous network of parallel and serial computers to be programmed as a single computational resource. This resource appears to the application programmer as a d...
Direct Transfer of Learned Information Among Neural Networks
- Proceedings of AAAI-91
, 1991
Cited by 45 (5 self)
A touted advantage of symbolic representations is the ease of transferring learned information from one intelligent agent to another. This paper investigates an analogous problem: how to use information from one neural network to help a second network learn a related task. Rather than translate such information into symbolic form (in which it may not be readily expressible), we investigate the direct transfer of information encoded as weights. Here, we focus on how transfer can be used to address the important problem of improving neural network learning speed. First we present an exploratory study of the somewhat surprising effects of pre-setting network weights on subsequent learning. Guided by hypotheses from this study, we sped up back-propagation learning for two speech recognition tasks. By transferring weights from smaller networks trained on subtasks, we achieved speedups of up to an order of magnitude compared with training starting with random weights, ...
Improving the Performance of Radial Basis Function Networks by Learning Center Locations
- In
, 1992
Cited by 38 (3 self)
Three methods for improving the performance of (Gaussian) radial basis function (RBF) networks were tested on the NETtalk task. In RBF, a new example is classified by computing its Euclidean distance to a set of centers chosen by unsupervised methods. The application of supervised learning to learn a non-Euclidean distance metric was found to reduce the error rate of RBF networks, while supervised learning of each center's variance resulted in inferior performance. The best improvement in accuracy was achieved by networks called generalized radial basis function (GRBF) networks. In GRBF, the center locations are determined by supervised learning. After training on 1000 words, RBF classifies 56.5% of letters correctly, while GRBF classifies 73.4% of letters correctly (on a separate test set). From these and other experiments, we conclude that supervised learning of center locations can be very important for radial basis function learning. 1 Introduction Radial basis function (RBF) networks are ...
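The basic RBF classification step mentioned in this abstract (distance from a new example to a set of centers) can be sketched as follows. This is an illustration under assumptions, not the paper's networks: the two centers, the shared Gaussian width, and the per-class output weights are hypothetical fixed values, whereas in the paper the centers come from unsupervised methods (RBF) or supervised learning (GRBF).

```python
import math


def rbf_activations(x, centers, width):
    """Gaussian response of each center to x, based on Euclidean distance."""
    acts = []
    for c in centers:
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        acts.append(math.exp(-sq_dist / (2.0 * width ** 2)))
    return acts


def classify(x, centers, width, weights):
    """Pick the class with the largest weighted sum of center activations."""
    phi = rbf_activations(x, centers, width)
    scores = {
        label: sum(w * p for w, p in zip(ws, phi))
        for label, ws in weights.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy setup: two centers, each voting for one class.
CENTERS = [(0.0, 0.0), (4.0, 4.0)]
WEIGHTS = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
```

For example, `classify((0.5, 0.2), CENTERS, 1.0, WEIGHTS)` selects "A" because the point lies much closer to the first center. Moving the centers changes the decision boundary, which is why learning their locations matters.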

