Pratt, L. Y., Mostow, J., and Kamm, C. A., Direct Transfer of Learned Information Among Neural Networks, in: Proceedings of the Ninth National Conference on Artificial Intelligence, Anaheim, CA, 584-589, 1991.
....overview of the bases of ANN knowledge extraction under the form of logical functions. ANN design relations are established. 1. INTRODUCTION During the last decade considerable effort has been dedicated to write and read symbolic information into and from artificial neural networks (ANNs) [1, 2, 4, 10-14, 16-17, 22-23, 26, 29, 30, 32, 36-38]. The motivation has been multifold. Primarily, ANNs have shown a very good ability to represent empirical knowledge, as the one contained in a set of examples, but the information is expressed in a sub-symbolic form, i.e. in the structure, weights and biases of a trained ANN, not directly ....
Pratt, L. Y. and Kamm, C. A., "Direct transfer of learned information among neural networks", in Proceedings of the Ninth National Conference on Artificial Intelligence, Anaheim, CA, 1991, pp. 584-589
....The continual learning framework is conceptually similar to learning structures presented by other researchers, although the temporal continuity of their systems is often not stressed as highly. Pratt, for example, has given attention to the problem of knowledge transfer among different tasks [65, 66, 67]. Here the goal is to apply knowledge acquired in one learning task to another learning task in order to speed learning and possibly increase final accuracy. Pratt specifically investigates transfer among neural network structures by preserving decision hyperplanes across tasks. The arrangement of ....
L. Pratt, J. Mostow, and C. Kamm. Direct transfer of learned information among neural networks. In Proceedings of the Ninth National Conference on Artificial Intelligence, Anaheim, CA, 1991. MIT Press.
....common to engineering design in general. Other possible reasons include the incorporation of prior knowledge (Bennani, 1995). The incorporation of prior knowledge usually takes the form of suggesting an appropriate decomposition of the global task. A modular approach can also reduce training times (Pratt, Mostow and Kamm, 1991) and make subsequent modification and extension easier (Gallinari, 1995). An interesting example of the variety of capabilities afforded by a modular approach is given by Baxt (1990), where the use of two differently trained artificial neural networks made it possible to increase the detection ....
Pratt, L.Y., Mostow, J. and Kamm, C.A. (1991) Direct transfer of learned information among neural networks. In Proceedings of the Ninth National Conference on Artificial Intelligence (AAAI-91), 584-589, Anaheim, CA.
....tasks are becoming well understood, there is growing research interest directed to dynamic and adaptable systems. Studies have examined issues such as concept drift [55], learning bias from multiple tasks [7], continual learning [53, 57], multitask learning [9], knowledge transfer between tasks [46, 44], and lifelong learning [69]. While progress has been made, this branch of study still contains many unresolved issues. Methods are needed to reliably detect when the underlying concept being modeled has changed since training began and to adapt the current domain model to the new conditions. It ....
....techniques that automatically discard training instances by age, for example, integrate the two phases with an assumption of drift. In general, the process of adapting current models to new information has been studied under the rubrics of multitask learning [9], knowledge transfer between tasks [46, 44], and lifelong learning [69]. Chapter 2 Issues and Related Work In this chapter we will discuss the goals of the anomaly detection domain, related background work, and the issues raised by the proposed research. Although we make an effort to divide the issues into those most nearly learning ....
[Article contains additional citation context not shown here]
L. Pratt, J. Mostow, and C. Kamm. Direct transfer of learned information among neural networks. In Proceedings of the Ninth National Conference on Artificial Intelligence, Anaheim, CA, 1991. MIT Press.
....to explain why they exhibit that systematicity. Here we will only give a simple explanation, involving the representations generated by the representation generator and used by the TN. The interested reader can also look at the explanations possible by the use of hyperplane analysis (e.g. Pratt and Kamm, 1991; Pratt, Mostow and Kamm, 1991; Sharkey and Jackson, 1994). The design and training regime of the representation generator results in representations that are systematically positioned in the space so that the representation for s occupies the space in between the known constituents: Fig. 3 ....
....exhibit that systematicity. Here we will only give a simple explanation, involving the representations generated by the representation generator and used by the TN. The interested reader can also look at the explanations possible by the use of hyperplane analysis (e.g. Pratt and Kamm, 1991; Pratt, Mostow and Kamm, 1991; Sharkey and Jackson, 1994). The design and training regime of the representation generator results in representations that are systematically positioned in the space so that the representation for s occupies the space in between the known constituents: Fig. 3 Representations Formed by the ....
Pratt L.Y., Mostow J. and Kamm C. A. 1991: Direct Transfer of Learned Information Among Neural Networks, Proceedings of the Ninth National Conference on Artificial Intelligence (AAAI91), 584-589.
....such an ability, neural networks cannot be used as the basis of a rule refinement system. Furthermore, it is hard to be confident in the reliability of a network that addresses a real-world problem. Also, the fruits of training neural networks are difficult to transfer to other neural networks (Pratt & Kamm, 1991), much less non-neural learning systems. Hence, a trained neural network is something like Pierre de Fermat and his (in)famous last theorem. Like Fermat, the network tells you that it has discovered something wonderful, but then does not tell you exactly what that something is. This paper ....
Pratt, L. Y. & Kamm, C. A. (1991). Direct transfer of learned information among neural networks. In Proceedings of the Ninth National Conference on Artificial Intelligence, (pp. 584--589), Anaheim, CA.
....Transfer of Weights Pratt has extensively studied reusing the internal system states of MLP-type feed-forward neural networks to enhance the construction of a similar classifier. Often, a neural network's weights are randomly initialized at the beginning of the learning process. In [63] and [64], information about the weights from previously trained neural networks was used to initialize the weights of other networks. This should start network learning on the right foot by beginning the iterative learning process in a state that is closer to optimal than would be expected using random ....
Lorien Y. Pratt, Jack Mostow, and Candace A. Kamm. Direct transfer of learned information among neural networks. In Proceedings of Ninth National Conference on Artificial Intelligence, pages 584--589, 1991.
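The weight-reuse idea described in the context above can be illustrated with a minimal sketch. It assumes the simplest possible scheme: hidden-unit weight vectors from a trained source network are copied into the target network, and any remaining target units get small random weights. The function names (`random_init`, `transfer_init`), the weight layout, and the `scale` parameter are hypothetical and chosen for illustration; this is not a reproduction of the procedure in the cited paper.

```python
import random


def random_init(n_in, n_hidden, scale=0.1, seed=0):
    """Conventional starting point: small random input-to-hidden weights."""
    rng = random.Random(seed)
    return [[rng.uniform(-scale, scale) for _ in range(n_in)]
            for _ in range(n_hidden)]


def transfer_init(source_weights, n_in, n_hidden, scale=0.1, seed=0):
    """Start the target from a trained source network's hidden-unit weights.

    Each row of source_weights is one hidden unit's weight vector.
    Rows are copied into the target; target units without a source
    counterpart keep their small random initialization (an assumption
    made for this sketch).
    """
    target = random_init(n_in, n_hidden, scale, seed)
    for j, row in enumerate(source_weights[:n_hidden]):
        if len(row) != n_in:
            raise ValueError("source and target input sizes differ")
        target[j] = list(row)  # copy, so later training leaves the source intact
    return target


# Illustrative values only: two trained source units, four target units.
source = [[0.9, -1.2, 0.3], [-0.5, 0.7, 1.1]]
target = transfer_init(source, n_in=3, n_hidden=4)
```

Training of the target network would then proceed as usual from `target` instead of from a purely random starting point.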
....as indicated by the results from the classification networks. 3.4 Explaining the presence of structure with decision hyperplanes Each receptive unit in a network implements a decision hyperplane in terms of the weights that connect to it. For detailed descriptions see (Hanson and Burr, 1990; Pratt et al., 1991; Sharkey and Jackson, 1994). Ideally, the output function is a Heaviside function which discretely discriminates between "on" and "off". This condition can be relaxed. To ensure that only one switch between "on" and "off" occurs, the output function needs to be monotonically increasing (or ....
Pratt, L. Y., Mostow, J., and Kamm, C. A. (1991). Direct transfer of learned information among neural networks. In Proceedings of AAAI 91.
....three layers, and the weights between the first two layers are then used to initialize a network that learns a second task. If the two tasks have some points in common, positive transfer is observed. Other studies of transfer in connectionist networks can be found in [Pratt et al. 91] and [Thrun Mitchell 94]. All of this work has an obvious link with incremental learning, since the latter relies on positive transfer from one subset to the next. III.1.2.c Discussion This state of the art clearly shows an imbalance in favor of active learning ....
Lorien Y. Pratt, Jack Mostow & Candace A. Kamm. Direct transfer of learned information among neural networks. Proc. of the Ninth National Conference on Artificial Intelligence, vol. II, p. 584-589, Menlo Park: MIT Press, 1991.
....After Kbann nets have refined, they can be used as highly accurate classifiers (Towell et al. 1990). However, trained Kbann nets provide no explanation of how an answer was derived. Nor can the results of their learning be shared with humans or transferred to related problems. (The work of Pratt et al. 1991 partially ameliorates this problem.) The extraction of symbolic rules directly addresses both of these problems. It makes the information learned by the Kbann net accessible for human review and justification of answers. Moreover, the modified rules can be used as a part of knowledge bases for ....
Pratt, L. Y., Mostow, J., and Kamm, C. A., "Direct transfer of learned information among neural networks", Proceedings of the Ninth National Conference on Artificial Intelligence, pp. 584--589, Anaheim, CA, 1991.
....useful auxiliary context. The additional task of recognizing road stripes was able to improve empirically the performance of a system learning to steer a car to follow the road [8]. Other examples where multitask learning has successfully been applied to real-world problems appear in [37, 33, 30, 14]. Importantly, this empirical phenomenon of such context sensitivity in machine learning [28] is also supported, for example, by mathematical existence theorems for this phenomenon (and variants) in [1]. More technical theoretical work appears in [23, 26]. The theoretical papers [1, 23] ....
L. Pratt, J. Mostow, and C. Kamm. Direct transfer of learned information among neural networks. In Proceedings of the 9th National Conference on Artificial Intelligence (AAAI91), 1991.
....useful auxiliary context. The additional task of recognizing road stripes was able to improve empirically the performance of a system learning to steer a car to follow the road [7]. Other examples where multitask learning has successfully been applied to real-world problems appear in [9, 20, 23, 27]. Importantly, this empirical phenomenon of such context sensitivity in machine learning [19] is also supported, for example, by mathematical existence theorems for this phenomenon (and variants) in the inductive inference framework [1]. More technical theoretical work appears in [15, 17]. For a ....
L. Pratt, J. Mostow, and C. Kamm. Direct transfer of learned information among neural networks. In Proceedings of the 9th National Conference on Artificial Intelligence (AAAI-91), 1991.
....the main digit tasks. By breaking out the separate other tasks into separate MTL tasks, the net has a better chance of learning to discriminate digits from non-digits. LeCun, 1997 (private communication) 4.10 Sequential Transfer MTL is parallel transfer. It might seem that sequential transfer [Pratt & Mostow 1991; Pratt 1992; Sharkey & Sharkey 1992; Thrun & Mitchell 1994; Thrun 1995] should be easier. This may not be the case. Some of the advantages of parallel transfer are: • The full detail of what is being learned for all tasks is available to all tasks because all tasks are learned at the same ....
....of related learning tasks. Such cross-task learning may well be the key to powerful human-level learning abilities. [Sutton 1992] 8.3 Serial Transfer Transferring learned structure between related tasks is not new. The early work on sequential transfer of learned structure between neural nets [Pratt et al. 1991; Pratt 1992; Sharkey & Sharkey 1992] clearly demonstrates that what is learned for one task can be used as a bias for other tasks. Unfortunately, this work failed to find improvements in generalization performance; the main benefit was speeding up learning. More recently, Mitchell and Thrun ....
Pratt, L. Y., Mostow, J., and Kamm, C. A., "Direct Transfer of Learned Information Among Neural Networks," Proceedings of AAAI-91, 1991.
.... also opens the door to a number of other interesting possibilities (see below). As suggested above, there is also a certain relation between our meta-learning scenario and the notion of transfer of knowledge between learning tasks, as mentioned, though in rather different settings, by, e.g., Pratt et al. (1991), Ourston and Mooney (1991), Caruana (1993), and Thrun and Mitchell (1995). While these authors study the effect of cross-category transfer, MetaL(B) can be interpreted as performing cross-context transfer. The effect can be most clearly seen in the Schubert experiment above. In terms of the ....
Pratt, L.Y., Mostow, J., and Kamm, C.A. (1991). Direct Transfer of Learned Information Among Neural Networks. In Proceedings of the 9th National Conference on Artificial Intelligence (AAAI-91), Anaheim, CA.
....daunting task. This is a significant shortcoming, for without the ability to produce understandable decisions, it is hard to be confident in the reliability of networks that address real-world problems. Also, the fruits of training neural networks are difficult to transfer to other neural networks (Pratt et al. (1991) ameliorate this problem) and all but impossible to directly transfer to non-neural learning systems. Hence, a trained neural network is analogous to Pierre de Fermat's comment about his last theorem. Like Fermat, the network tells you that it has discovered something wonderful, but then ....
Pratt, L. Y., Mostow, J., & Kamm, C. A. (1991). Direct transfer of learned information among neural networks. Proceedings of the Ninth National Conference on Artificial Intelligence (pp. 584--589). Anaheim, CA: AAAI Press.
....learning and generalising of similar domains. Some research has examined how literal transfer between networks trained on related tasks can affect generalisation (Sharkey and Sharkey, 1993). There has also been work on how the decomposition of a learning problem can lead to better generalisation (Pratt et al. 1991). Rules can be very useful as a starting point for learning from examples. Giles and Omlin (Giles and Omlin, 1993) evaluate the implications when inserting grammatical rules by pre-setting weights in a recursive network. The network can, after encoding locally defined hints into weights and ....
Pratt, L. Y., Mostow, J., and Kamm, C. A. (1991). Direct transfer of learned information among neural networks. In Proceedings of AAAI 91.
....speed and performance of AP-net, a neural network designed to map acoustic spectra to phoneme classes. Problem decomposition (also called modular decomposition) has previously produced both performance improvement [Waibel et al. 1989] and significant learning speed improvement [Pratt et al. 1991] on less complex tasks than the one attempted by AP-net. The motivation for problem decomposition is that training time can be significantly decreased by breaking a combinatorial search problem into several sub-problems. For example, if there are ten possible values for each of six neural ....
....Tuning: All network weights are allowed to change during further training. The performance obtained by [Waibel et al. 1989] for a decomposed network was slightly better than performance of a monolithic (i.e. non-decomposed) network. In addition, there was a marked decrease in learning time. [Pratt et al. 1991] validated this study on a vowel recognition problem with a different network topology and a different stopping criterion and showed that the learning speed for several decomposed networks was significantly faster than the learning speed of monolithic (non-decomposed) networks. The Decomposed ....
[Article contains additional citation context not shown here]
Lorien Y. Pratt, Candace A. Kamm, and Jack Mostow. Direct Transfer of Learned Information among Neural Networks, 1991. (To Appear), Proceedings for the American Association for Artificial Intelligence (AAAI-91).
....Since it's been shown that networks can give very different behavior dependent on initial conditions (cf. [Kolen and Pollack, 1990]), it is important to determine whether the improvement observed was due to that effect or to problem decomposition. This work was extended [Pratt and Kamm, 1991, Pratt et al. 1991, Pratt, 1993]. These studies did test for the statistical significance of performance and learning speed results over multiple initial conditions, and found, in support of [Waibel et al. 1989], that there was no significant performance improvement, but that learning speed was substantially ....
Lorien Y. Pratt, Jack Mostow, and Candace A. Kamm. Direct transfer of learned information among neural networks. In Proceedings of the Ninth National Conference on Artificial Intelligence (AAAI-91), pages 584--589, Anaheim, CA, 1991.
....same larger distribution (English speakers), they may be related in a way that can be exploited to speed up learning on the British network, compared to when weights are randomly initialized. We have previously introduced the question of how trained neural networks can be recycled in this way [Pratt et al. 1991]; we've called this the transfer problem. The idea of transfer has strong roots in psychology (as discussed in [Sharkey and Sharkey, 1992]) and is a standard paradigm in neurobiology, where synapses almost always come "pre-wired". There are many ways to formulate the transfer problem. ....
Lorien Y. Pratt, Jack Mostow, and Candace A. Kamm. Direct transfer of learned information among neural networks. In Proceedings of the Ninth National Conference on Artificial Intelligence (AAAI-91), pages 584--589, Anaheim, CA, 1991.
....values of trained weights) to use as the initial conditions of the target learner. We'll call this literal transfer. Previously, literal transfer from multiple source neural networks, each of which form a subproblem of a target problem, has been explored [Waibel et al. 1989, Pratt and Kamm, 1991, Pratt et al. 1991, Pratt, 1993]. In this paradigm, source networks are responsible for subsets of the output classes, and may receive specialized input features for the class subset. Such a decomposition of training can lead to substantially increased learning speed. Transfer between populations. Another important ....
....by the training data shown here. Because of this shift, two of the source network hyperplanes are helpful in separating class 0 from class 1 data in the target task, and two are not. $y_j (1 - y_j)$, which reduces $\Delta w_{ij}$, which makes the hyperplane partially determined by $w_{ij}$ move slower (see [Pratt et al. 1991] for experimental evidence of this effect). Although these input-to-hidden equation factors aren't the only ones affecting the rate of hyperplane movement (error from the output layer and input activations are also factors), we do expect that, all other things being equal, large weight magnitudes ....
Lorien Y. Pratt, Jack Mostow, and Candace A. Kamm. Direct transfer of learned information among neural networks. In Proceedings of the Ninth National Conference on Artificial Intelligence (AAAI-91), pages 584--589, Anaheim, CA, 1991.
....neural networks, as well as their subsequent extraction [Fu, 1991, Towell et al. 1990, Towell and Shavlik, 1992]. By using these methods for extraction followed by insertion into a target network, they can be viewed as indirect transfer. This approach can be contrasted with direct transfer [Pratt et al. 1991], which does not require a shift in representation to a logical formalism between source and target learners. Direct transfer is more generally applicable because some knowledge is not conveniently expressed logically. Also, some knowledge is not readily extracted. The transferred information is, ....
....relatively slower. Furthermore, high-magnitude IH weights tend to lead to high-magnitude inputs to hidden neurons. This leads to $y_j$ being close to 0 or 1, which lowers $y_j (1 - y_j)$, which reduces $\Delta w_{ij}$, which makes the hyperplane partially determined by $w_{ij}$ move slower (see [Pratt et al. 1991] for experimental evidence of this effect). [Figure: hyperplanes of hidden units HU 1 and HU 2 plotted over Feature 1 and Feature 2 at Epoch 1 and Epoch 100; panels (a) Literal, (b) DBT] ....
Lorien Y. Pratt, Jack Mostow, and Candace A. Kamm. Direct transfer of learned information among neural networks. In Proceedings of the Ninth National Conference on Artificial Intelligence (AAAI-91), pages 584--589, Anaheim, CA, 1991.
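The saturation argument in the context above (large weights push a hidden unit's output toward 0 or 1, which shrinks the factor y_j(1 - y_j) that scales the weight update) can be checked numerically. The sketch below assumes a logistic activation; the input values, the weights, and the scaling factor of 20 are illustrative assumptions, not figures from the cited work.

```python
import math


def sigmoid(x):
    """Logistic activation, y = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_slope(net):
    """Factor y * (1 - y) that multiplies the backpropagated error."""
    y = sigmoid(net)
    return y * (1.0 - y)


def net_input(weights, inputs, bias=0.0):
    """Weighted sum arriving at one hidden unit."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


# Illustrative values (assumptions): one input pattern, and the same
# hyperplane orientation expressed with small and with large weights.
inputs = [1.0, 0.2]
small_weights = [0.4, -0.3]
large_weights = [20.0 * w for w in small_weights]

slope_small = sigmoid_slope(net_input(small_weights, inputs))
slope_large = sigmoid_slope(net_input(large_weights, inputs))

print(f"y(1-y) with small weights: {slope_small:.4f}")
print(f"y(1-y) with large weights: {slope_large:.4f}")
```

With these values the large-weight unit is close to saturation, so its update factor is far smaller than that of the small-weight unit, in line with the slower hyperplane movement described in the quoted passage.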
No context found.
Pratt, L. Y., Mostow, J., and Kamm C. A., Direct Transfer of Learned Information Among Neural Networks, in: Proceedings of the Ninth National Conference on Artificial Intelligence, Anaheim, CA, 584-589, 1991.
No context found.
Pratt, L.Y., J.A. Mostow & C.A. Kamm, 1991. Direct Transfer of Learned Information Among Neural Networks, Proceedings of the Ninth National Conference on Artificial Intelligence (AAAI-91), 584--589.
No context found.
Pratt, L.Y., Mostow, J., and Kamm, C.A. (1991). Direct Transfer of Learned Information among Neural Networks. Proceedings of the 9th National Conference on Artificial Intelligence, AAAI-91 (pp. 584-589), Anaheim, CA.