S.E. Fahlman, Christian Lebiere, The cascade-correlation learning architecture, Advances in Neural Information Processing Systems 2 (1990) 524--532.
....of pattern presentations necessary to achieve a satisfying performance. Fritzke says that GCS needs 180 epochs to achieve the above mentioned result and reports comparisons with some earlier methods: Backpropagation [54] (20000 epochs), Cross Entropy BP [54] (10000 epochs), Cascade Correlation [55] (1700 epochs), and he says that the number of epochs required by GCS is about one or two orders of magnitude less than the other techniques reported. We can see that FACS needs about 15 iterations to obtain the same result as the GCS, i.e. a twelfth of the epochs required by the GCS. However, ....
S. E. Fahlman and C. Lebiere, "The cascade-correlation learning architecture," in Advances in Neural Information Processing Systems 2 (D. S. Touretzky, ed.), pp. 524-532, San Mateo: Morgan Kaufmann, 1990.
....N C = 144 presentations necessary to achieve a satisfying performance. Fritzke says that GCS needs 180 epochs to achieve the above mentioned result and reports comparisons with some earlier methods: Backpropagation [93] (20000 epochs), Cross Entropy BP [93] (10000 epochs), Cascade Correlation [94] (1700 epochs), and he says that the number of epochs required by GCS is about one or two orders of magnitude less than the other techniques reported. We can see that FACS needs about 15 iterations to obtain the same result as the GCS, i.e. a twelfth of the epochs required by the GCS. However, ....
S. E. Fahlman and C. Lebiere, "The cascade-correlation learning architecture," in Advances in Neural Information Processing Systems 2 (D. S. Touretzky, ed.), pp. 524-532, San Mateo: Morgan Kaufmann, 1990.
....approaches to overcome the limitation of being confined to a single functional model. On reducing the model size, an idea generally called pruning [6] [11] has been pursued. There are also methods that grow the network to gain the required ability while training. Cascade Correlation Network [12] installs units one by one, connected in a cascade manner to form a deep structure. Increasing the number of basis function units in Radial Basis Function (RBF) nets is tried for clustering problems in [13]. However, simple growing of networks can sometimes result in excessively large structures. ....
S. E. Fahlman and C. Lebiere, The Cascade-Correlation Learning Architecture, Advances in Neural Information Processing Systems, vol. 2, pp. 524-532, Morgan Kaufmann, 1990.
....layer, only one neuron of output) the generalization of their structure can be made in a dynamic way very easily. Some research studies were already undertaken in order to build neural networks dynamically. Those include the dynamic creation of the nodes [1], the cascade-correlation algorithm [4], the tiling algorithm [8], the self-organizing algorithm [13], and the upstart algorithm [5]. These algorithms are used to eliminate the need to determine in advance (before the training of a network) the number of neurons of the hidden layer. This is very useful because a simpler network ....
S. Fahlman and C. Lebiere. The cascade-correlation learning architecture, Advances in Neural Information Processing Systems I, D. Touretzky (ed.), Morgan Kaufmann, San Mateo, CA, 1989.
....optimization of Neural Networks (NNs) consists essentially of two problems. These are the optimization of the network topology and the adjustment of the weights. For optimizing the network topology many investigations have been done. The well known cascade correlation algorithm proposed by Fahlman [FL90] starts with the minimal network topology and iteratively expands the NN. There are several pruning algorithms (e.g. [CFP97, Set97]) which iteratively reduce the network's complexity. [Footnote: This work was supported by the Thuringian Ministry for Science, Research, and Arts (Project ITHERA).] ....
S. E. Fahlman and C. Lebiere. The cascade-correlation learning architecture. Advances in Neural Information Processing Systems, 2:524-532, 1990.
....optimization of Neural Networks (NNs) consists essentially of two problems. These are the optimization of the network topology and the adjustment of the weights. For optimizing the network topology many investigations have been done. The well known cascade correlation algorithm proposed by Fahlman [FL90] starts with the minimal network topology and iteratively expands the NN. There are several pruning algorithms (e.g. [CFP97, Set97]) which iteratively reduce the network's complexity. [Footnote: This work was supported by the Thuringian Ministry for Science, Research, and Arts (Project ITHERA).] Evolutionary ....
S. E. Fahlman and C. Lebiere. The cascade-correlation learning architecture. Advances in Neural Information Processing Systems, 2:524--532, 1990.
....with the optimal number of parameters. The minimal number of parameters which makes the model capable of approximating the underlying process can be considered as the optimal number of parameters. A neural network model with the optimal number of parameters can be obtained by the network growing [2 4] or the network pruning techniques [5]. While the network growing techniques start with a small network and gradually increase its size, adding new neurons one by one during the course of training, the network pruning techniques prune the redundant and insignificant parameters and hidden ....
Fahlman, S. E., Lebiere, C.: The Cascade-Correlation Learning Architecture. Advances in Neural Information Processing Systems 2 (1990), p. 524-532.
....task. Considering the small size of the given examples and the fact that it is already difficult to design a suitable network architecture to solve them, it should be obvious that it is almost impossible to design network architectures for large problems by hand. That is why several authors (e.g. [14, 15, 16, 27, 29, 30, 32, 33]) have designed learning methods that automatically generate an architecture as part of the learning algorithm. For small problems, these methods give good results, but for large problems they have their limitations. We will describe these later in this chapter. Very little theoretical knowledge ....
....for on line architecture adaptation can be classified into two categories: constructive algorithms which add complexity to the network starting from a very simple architecture until the network is able to learn the task. When the remaining error is sufficiently low, this process is halted (e.g. [14, 15, 29, 30]); destructive algorithms which start with large architectures and remove complexity, usually to improve generalization, by decreasing the number of free variables of the network. Nodes or edges are removed until the network is no ....
[Article contains additional citation context not shown here]
S. E. Fahlman and C. Lebiere. The Cascaded-correlation learning architecture. Advances in Neural Information Processing Systems, 2:524--532, 1990.
.... units of fixed nodal activations (e.g. sigmoids) within the single hidden layer during the course of training [2, 18]. Motivated by the necessity of creating high order feature detectors that combine the information from outputs of existing units, the cascaded correlation learning network (CCLN) [5] adds a new candidate unit that receives weighted connections not only from the network's inputs but also from any hidden unit already present in the network. These cascaded connections enable the new candidate unit to approximate the complex mapping of residual error which cannot be accomplished ....
.... proposed for training perceptrons [11]. For example, the Newton method [19], the Quasi-Newton method [25], the conjugate gradient method [23], as well as the Gauss-Newton method [15]. Instead of training all the weights associated with all hidden units simultaneously, as is done in most BPLNs, a CCLN [5] incrementally trains the weights associated with each new candidate unit by freezing those of existing hidden units. More specifically, only a single layer of weights (either hidden or output) associated with each candidate hidden unit is being trained at any given time based on a maximum correlation ....
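Editorial illustration (not part of the cited excerpt): the "maximum correlation" criterion mentioned above can be sketched as follows. This is a minimal sketch, assuming the score defined in the Fahlman and Lebiere paper, i.e. the sum over output units of the magnitude of the covariance between a candidate unit's output and the residual error. The function names `candidate_score` and `pick_best_candidate` are hypothetical, and the training of the candidate's input weights (with existing hidden units frozen) is omitted.

```python
def candidate_score(values, errors):
    """Score one candidate unit.

    values: candidate output V_p for each training pattern p.
    errors: residual errors E[p][o] for each pattern p and output unit o.
    Returns sum_o | sum_p (V_p - mean(V)) * (E[p][o] - mean_p(E[.][o])) |.
    """
    n = len(values)
    v_mean = sum(values) / n
    n_out = len(errors[0])
    score = 0.0
    for o in range(n_out):
        e_mean = sum(errors[p][o] for p in range(n)) / n
        cov = sum((values[p] - v_mean) * (errors[p][o] - e_mean) for p in range(n))
        score += abs(cov)
    return score


def pick_best_candidate(pool, errors):
    """Return the index of the candidate whose output covaries most with the error.

    pool: list of candidates, each a list of outputs over the training patterns.
    """
    return max(range(len(pool)), key=lambda i: candidate_score(pool[i], errors))
```

In this sketch, a candidate with constant output scores zero, while one that tracks the residual error scores high and would be the unit installed in the network.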
[Article contains additional citation context not shown here]
S. E. Fahlman and C. Lebiere. The cascaded-correlation learning architecture. Advances in Neural Information Processing Systems II, edited by D. S. Touretzky, pp. 524-532, Morgan Kaufmann Publishers Inc., CA, 1990.
....simultaneously for the entire network via the error back propagation algorithm [48]. This procedure can be computationally expensive since the number of nodes is determined by training separate networks with different numbers of nodes. Hierarchical training methods such as cascade correlation [49] have also been developed for faster node-by-node training. Projection pursuit regression. Projection pursuit regression (PPR) is a nonlinear multivariate statistical modeling technique developed by Friedman and Stuetzle [50] for analyzing high dimensional data. The PPR model is similar to BPN, ....
S.E. Fahlman, C. Lebiere, Advances in neural information processing systems, Morgan Kaufmann 2 (1990) 524.
....user is generally obliged to run a lot of trial and error. This problem gave rise to a number of procedures for automatic design of MLP architectures. These procedures fall in 3 main categories: genetic algorithms [6], destructive or pruning methods [4] [7] [12], and constructive or growth methods [3] [5] [9] [10] [11]. The algorithm we propose in this paper belongs to the group of constructive methods. These methods start with a minimal configuration and dynamically grow the network until a well suited structure is obtained. In order for a constructive method to be well defined, the following ....
S.E. Fahlman and C. Lebiere. The cascade-correlation learning architecture. Advances in Neural Information Processing Systems II, ed. D.S. Touretzky, 2:524--532, 1989.
....form of transfer involves the direct or indirect assignment of known task representation (weight values) to a new task. We consider this to be an explicit form of knowledge transfer from a source task to a target task. Since 1990 numerous authors have discussed methods of representational transfer [15, 25, 26, 28, 29, 32, 36] which often results in substantially reduced training time with no loss in generalization performance. In contrast to representational transfer is a form we define as functional. Functional transfer does not involve the explicit assignment of prior task representation to a new task, rather it ....
S.E. Fahlman and C. Lebiere. The cascade-correlation learning architecture. Advances in Neural Information Processing Systems 2, 2:524--532, 1990. ed. D.S. Touretzky.
....restriction on the learning rule, a network may need more nodes when trained using this learning rule than when trained using traditional backpropagation. Some systematic way is needed to obtain a network containing the right number of hidden nodes. Existing algorithms developed for this purpose [4, 6, 7, 8, 9] modify the backpropagation learning rule or the architecture of feedforward neural networks, or are more complicated than needed. In the proposed Weight Restricted Training Algorithm (WRTA) we start with a small size network, then add new nodes to the network for ....
S.E. Fahlman and C. Lebiere, The Cascade-Correlation Learning Architecture, Advances in Neural Information Processing Systems II, D.S. Touretzky, Ed. Morgan Kaufmann, 1990, 524-532.
....using backpropagation. It was suspected that this may have been due to the Soft Weight Sharing problem being inherently ill conditioned. This was later confirmed by the author (Nowlan, personal communication, 1992). Another technique that is claimed to improve generalisation is Cascade Correlation [13]. In this technique, not only the connection weights, but the entire architecture of the hidden layer(s) of the network is determined by the training data. In this sense, Cascade Correlation can be seen as the antithesis of the MBNN approach. Tangent Prop [9] is in the same spirit as the MBNN ....
S.E. Fahlman and C. Lebiere. The cascade-correlation learning architecture. Advances in Neural Information Processing Systems, 2, 1990.
....obtained with the MLP network, previously reported in [15] and against equivalent experiments attained by RBF networks also implemented in this work. 2. Constructive Methods. Many authors have contributed to the development of neural networks based on the idea of dynamic network construction [3, 5, 8, 14, 7]. Mézard and Nadal [14] proposed the tiling algorithm which is used to generate a constructive multilayer feedforward network capable of learning any Boolean function of N Boolean units. The algorithm is based on a divide and conquer strategy that grows the network by adding new layers and ....
....gradually leads to error minimization towards the output layer. This network was shown by Frean to have convergence guaranteed in the case of the classification of binary patterns and experiments also showed that it generally produced fewer units than the tiling algorithm. Fahlman and Lebiere [5] proposed the cascade-correlation learning architecture with the goal of overcoming certain problems and limitations of the backpropagation algorithm. In [3], Burgess showed how a constructive algorithm can be generated by the combination of a cascade-correlation architecture with perceptron like ....
[Article contains additional citation context not shown here]
S. Fahlman and C. Lebiere. The cascade-correlation learning architecture. Advances in Neural Information Processing Systems II, 1990.
....storage capacity. Boltzmann network [Ackley 85]: [Balzer 91] 6 8 2 Coarse quantization of the weights during learning; [Alspector 92] 5 2 Coarse weight quantization for Boltzmann and mean field learning. Neocognitron [Fukushima 80]: [White 92] 3 1 Uses power of two weights. Cascade topology [Fahlman 90]: [Hoehfeld 92] 12 2 1 Coarse weight quantization in the cascade correlation algorithm; [Hoehfeld 92] 6 2 1 Cascade correlation with probabilistic rounding and variable gain; [Campbell 95] 1 2 1 A constructive algorithm for Heaviside cascade networks. Table 4: Weight discretization in other neural ....
S. E. Fahlman and C. Lebiere. The Cascade-Correlation Learning Architecture. Advances in Neural Information Processing Systems (NIPS89), vol. 2, pp. 524--532, Morgan Kaufmann, San Mateo CA, 1990.
[Figure 1: Description of the adding procedure for 2 [S,RBF]NN.] ....number of methods for automatic design of NN architectures including pruning methods [5] [11] and constructive methods [4] [6] [9] [10]. The two last questions represent the topic of this paper in which we propose our incremental algorithm for two layers NN with linear output units and a mixture of sigmoids and RBFs in the hidden layer (2 [S,RBF]NN). This algorithm is inspired from the original one we developed in ....
S.E. Fahlman and C. Lebiere. The cascade-correlation learning architecture. Advances in Neural Information Processing Systems II, ed. D.S. Touretzky, 2:524--532, 1989.
....more hidden units. However, overfitting problems may arise by adding more parameters and it may affect the generalization ability. Two main approaches are tried to find an optimal size of the neural network. The first starts with a minimal neural network and adds new hidden units one by one [3]. The second starts from a large number of redundant weights of a multi layered neural network and deletes weights selectively through second derivative information [5]. For increasing learning accuracy, multiple learned classifiers have been attractive recently [1, 2, 10]. Our approach is a kind of ....
S. E. Fahlman and C. Lebiere. The cascade-correlation learning architecture. Advances in Neural Information Processing Systems, 2:524--532, 1990.
....domains where the tasks were very closely related. In general, we define this form of transfer as being representational since it involves the direct or indirect assignment of known task representation to a new task. Since 1990 numerous authors have discussed methods of representational transfer [Prat93a, Prat94, Shar92, Shav90, Towe90, Fahl90, Sing92, Ring93]. This form of transfer often results in substantially reduced training time with no loss in generalization performance. In contrast to representational transfer is a form we define as functional. Functional transfer does not involve the explicit assignment of prior task representation to a new ....
S.E. Fahlman and C. Lebiere, The cascade-correlation learning architecture, Advances in Neural Information Processing Systems 2, Morgan Kaufmann, Vol. 2, pp. 524--532, San Mateo, CA, 1990.
....ANN topologies that are well suited to the task they are intended to learn. KBANN does this by using a knowledge base of approximately correct, domain specific rules to determine the ANN's structure and initial weights. This provides an alternative to techniques that either shrink [3] or grow [5] networks to the right size. Our experiments on splice junctions, and previously on bacterial promoters [12], demonstrate that the KBANN approach can substantially reduce the number of training examples needed to reach a given level of accuracy on future examples. This research was partially ....
S. Fahlman and C. Lebiere. The cascade-correlation learning architecture. Advances in Neural Information Processing Systems 2, pages 524--532, 1990.
No context found.
S.E. Fahlman, Christian Lebiere, The cascade-correlation learning architecture, Advances in Neural Information Processing Systems 2 (1990) 524--532.
No context found.
Fahlman, S.E. The Recurrent Cascade-correlation Architecture. Advances in Neural Information Processing Systems, 3:190--196, 1991.
No context found.
Fahlman, S.E. & Lebiere, C. The cascade correlation learning architecture. Advances in Neural Information Processing Systems 2, 1990.