Tom M. Mitchell and Sebastian B. Thrun. Explanation-Based Neural Network Learning for Robot Control. In Advances in Neural Information Processing Systems 5, J. E. Moody, S. J. Hanson, and R. P. Lippmann, Eds., Morgan Kaufmann, 1993.
.... [3] and the more popular description of such systems by Braitenberg [2]; see also [4] for a review of the use of behaviour-based systems for robot control, and [1] for a thorough textbook). There have been numerous examples of how to use artificial neural networks as robot controllers (e.g. [8, 19]), evolutionary computation for development of the robot controllers (e.g. [5, 6, 20]), and different kinds of behaviour-based systems for control (e.g. [1, 3]). Most of these systems are supposed to adapt to the environment, and often one can consider an adaptation to the ecological niche (see ....
T.M. Mitchell and S.B. Thrun. Explanation-Based Neural Network Learning for Robot Control. In Hanson, Cowan, and Giles (eds.), Advances in Neural Information Processing Systems 5, Morgan Kaufmann Press, 1993.
....a smooth interpolation between points with explicit training derivatives. Explanation-based neural networks use previously trained neural networks as initial domain theories, and compute training derivatives from each observed training sample that describe the relevance of each input feature [28]. They are then trained using the TangentProp learning algorithm, which minimizes the network output error and the error in network derivatives. 2.2 Applications in Medicine. For an overview of neural network applications in medicine, see e.g. [5, 36]. For a brief summary of neural network methods ....
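The passage above describes training that penalizes both the output error and the error in the network's derivatives. A minimal sketch of such a combined objective follows. It is not the authors' implementation: the one-hidden-unit tanh network, the function names, and the trade-off weight `mu` are all assumptions made for illustration.

```python
import math

def net(x, w1, b1, w2, b2):
    """Hypothetical one-hidden-unit tanh network.

    Returns the output h(x) and its analytic input derivative dh/dx.
    """
    a = math.tanh(w1 * x + b1)
    h = w2 * a + b2
    dh_dx = w2 * (1.0 - a * a) * w1  # chain rule through tanh
    return h, dh_dx

def combined_loss(params, samples, mu=1.0):
    """Squared value error plus mu times squared slope error.

    samples: list of (x, target_value, target_slope) triples.
    mu: assumed trade-off weight between the two error terms.
    """
    total = 0.0
    for x, y, slope in samples:
        h, dh_dx = net(x, *params)
        total += (h - y) ** 2 + mu * (dh_dx - slope) ** 2
    return total
```

A gradient-based optimizer would minimize `combined_loss` over `params`; setting `mu` to 0 recovers ordinary value-only fitting.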
T. Mitchell and S. Thrun, "Explanation-based neural network learning for robot control," in Advances in Neural Information Processing Systems 5 (J. C. S. Hanson and C. Giles, eds.), pp. 287-294, Morgan Kaufmann Press, 1993.
....interpolation between points with explicit training derivatives. Explanation-based neural networks (EBNNs) use previously trained neural networks as initial domain theories, and compute training derivatives from each observed training sample that describe the relevance of each input feature [6]. They are then trained using the TangentProp learning algorithm, which minimizes the network output error and the error in network derivatives. The initialization of feedforward networks with Horn clauses has been the predominant paradigm for prior knowledge in the neural networks community. More ....
T. Mitchell and S. Thrun, "Explanation-based neural network learning for robot control," in Advances in Neural Information Processing Systems 5 (J. C. S. Hanson and C. Giles, eds.), pp. 287-294, Morgan Kaufmann Press, 1993.
....constraints. A merely technical matter is the way knowledge from other tasks is incorporated. Some approaches use previously learned knowledge as an initial point for the parameter search (see e.g. [45]), whereas others incorporate this knowledge as a constraint during the search (see e.g. [36, 67]). In a recent study, O'Sullivan has compared both methodologies empirically and characterized the key advantages of each of them [42]. • Performance tasks. In some scenarios, the performance for all tasks is important, whereas others contain a designated performance task which is the only task ....
T. M. Mitchell and S. Thrun. Explanation-based neural network learning for robot control. In S. J. Hanson, J. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems 5, pages 287--294, San Mateo, CA, 1993. Morgan Kaufmann.
....function values. Based on this assumption, various function approximators have been used in conjunction with DP, in spite of the loss of the convergence guarantees that can be found in DP theory. The function approximators often met in Adaptive DP are CMAC [14], [27], [28], Neural Networks [11], [29], [30], [31], [32], and FIS [10], [11], [33], [34]. FACL and FQL use FIS to approximate the different functions: the critic and the actor for the former, and the Q function for the latter. Thus the state space coding is realized by the input variable fuzzy sets. The state representation is based on ....
T.M. Mitchell and S. Thrun, "Explanation-based neural network learning for robot control," in Advances in Neural Information Processing Systems, San Mateo, 1993, vol. 5, Morgan Kaufmann.
....method for learning real-valued, discrete-valued, and vector-valued functions from examples. ANN learning is robust to errors in the training data and has been successfully applied to problems such as interpreting visual scenes, speech recognition, and learning robot control strategies [Mitchell and Thrun, 1993]. ANNs are among the most effective learning methods currently known. Decision Trees [Quinlan, 1990, Breiman et al. 1984] are one of the most widely used and practical methods for inductive inference. Decision trees involve approximating discrete-valued target functions in which the learned ....
Mitchell, T. and Thrun, S. (1993). Explanation-based neural network learning for robot control. Advances in neural information processing systems, pages 287-294.
....behavior. Most explanation based learning techniques [Mit90] cannot be easily applied to our learning task because they require complete and correct models of parameter adaption in state of the art robot control systems, which can rarely be provided. Explanation based neural net learning [MT93, TS89] does not depend on such correct and complete domain theories and is therefore more promising for parameterization learning. MTTL can be regarded as a form of explanation based learning that can use symbolically represented domain knowledge that is not required to be complete and correct. ....
....form of explanation-based learning that can use symbolically represented domain knowledge that is not required to be complete and correct. Inductive concept learning [MTK98] can be used in combination with other learning techniques like reinforcement learning [SR97] or explanation-based learning [MT93]. In our own research we plan to apply inductive concept learning methods to learn part of the models used by MTTL. Goel et al. [GSCR97] introduce a framework that is, in some aspects, similar to ours: A model-based method monitors the behavior generated by a reactive robot control system, ....
T. Mitchell and S. Thrun. Explanation-based neural network learning for robot control. In C. L. Giles, S. J. Hanson, and J. D. Cowan, editors, Advances in Neural Information Processing Systems 5, Proceedings of the IEEE Conference in Denver (to appear), San Mateo, CA, 1993. Morgan Kaufmann.
....interpolation between points with explicit training derivatives. Explanation-based neural networks (EBNNs) use previously trained neural networks as initial domain theories, and compute training derivatives from each observed training sample that describe the relevance of each input feature [10]. They are then trained using the TangentProp learning algorithm, which minimizes the network output error and the error in network derivatives. The initialization of feedforward networks with Horn clauses has been the predominant paradigm for prior knowledge in the neural networks community. ....
....We chose learning rate $\alpha$ = 0.1 and momentum $\beta$ = 0.1 5 , and trained networks until one of the three following stopping criteria was satisfied: • on 99% of the training examples, the activation of every output unit was within 0.25 of correct, or • a network has been trained for 5,000 and 10,000 epochs for the two benchmark tests, respectively, or • a network classified at least 90% of the training examples correctly, but has not improved its ability to classify the training examples for five epochs. In general, networks trained on the second problem stopped training on the ....
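The three stopping criteria quoted above can be sketched as a small predicate. This is an illustrative reading, not the cited authors' code: it assumes the bare figures 99 and 90 are percentages, treats "within 0.25" as an inclusive bound, and uses hypothetical function and argument names.

```python
def fraction_within(outputs, targets, tol=0.25):
    """Fraction of examples whose every output unit is within tol of its target."""
    ok = sum(
        all(abs(o - t) <= tol for o, t in zip(out, tgt))
        for out, tgt in zip(outputs, targets)
    )
    return ok / len(outputs)

def should_stop(frac_within_tol, frac_correct, epoch, max_epochs,
                epochs_since_improvement):
    """Return True when any one of the three stopping criteria holds."""
    # Criterion 1: 99% of examples have all outputs within tolerance.
    if frac_within_tol >= 0.99:
        return True
    # Criterion 2: the epoch budget (5,000 or 10,000 in the source) is used up.
    if epoch >= max_epochs:
        return True
    # Criterion 3: at least 90% correct, but no improvement for five epochs.
    if frac_correct >= 0.90 and epochs_since_improvement >= 5:
        return True
    return False
```

The caller is expected to track `epochs_since_improvement` itself, resetting it to zero whenever training-set classification accuracy improves.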
[Article contains additional citation context not shown here]
T. Mitchell and S. Thrun, "Explanation-based neural network learning for robot control," in Advances in Neural Information Processing Systems 5 (J. C. S. Hanson and C. Giles, eds.), pp. 287-294, Morgan Kaufmann Press, 1993.
.... training examples [1, 33] the parallel learning of related tasks constrained to use a common internal representation [6, 9] or the use of historical training information (most commonly the learning rate or gradient of the error surface) to augment the standard weight update equations [20, 23, 34]. These pressures serve to reduce the effective hypothesis space in which the learning system performs its search. This form of transfer has its greatest value from the perspective of increased generalization performance. Certain methods of functional transfer have also been found to reduce ....
Tom Mitchell and Sebastian Thrun. Explanation based neural network learning for robot control. Advances in Neural Information Processing Systems 5, 5:287--294, 1993. ed. C. L. Giles and S. J. Hanson and J.D. Cowan.
....that fit this description. Some of these are problems of perception and action that humans accomplish effortlessly, yet we cannot articulate how we do so. Others are more abstract problems of interest to science and medicine. All of them have been the subject of machine learning research (e.g. [7, 25, 35, 37, 40, 44, 45, 46]). This is not to say we should require our machines to learn absolutely everything from scratch. We should certainly take advantage of existing domain knowledge, both low- and high-level, to the extent we can afford it. There is no reason to learn logical inference rules from first principles when ....
....like connectionism. Or do they? Are huge rule bases of the scale needed to simulate human-level intelligence any more comprehensible than artificial neural networks (ANNs)? On the other hand, why can not connectionist representations be made as understandable as rules? Mitchell and Thrun [25] develop ANNs which model various primitive robot actions and then treat these networks as if they were rules. Others have developed methods that allow the extraction of symbolic rules from trained neural networks [10, 13, 14, 42, 48, 49], so the two representation styles are not as irreconcilable ....
T.M. Mitchell and S.B. Thrun. Explanation-based neural network learning for robot control. In Advances in Neural Information Processing Systems, volume 5, Denver, CO, 1993. Morgan Kaufmann.
....RL architectures can easily incorporate many different kinds of domain knowledge. Indeed, a significant proportion of the current research on RL is about incorporating domain knowledge into RL architectures to alleviate some of their problems (Singh [99], Yee et al. [129], Mitchell and Thrun [75], Whitehead [125], Lin [66], Clouse and Utgoff [28]). Despite the fact that under certain conditions RL algorithms may be the best available methods, conventional RL architectures are slow enough to make them impractical for many real-world problems. While some researchers are looking for faster ....
....After every training episode, a form of explanation-based generalization (Mitchell et al. [76]) is used to determine a set of predecessor states that should have the same value. Any errors in generalization are handled via a mechanism for storing exceptions to concepts. Mitchell and Thrun [75] extended this approach to situations where a symbolic domain theory may be unavailable. They use online learning experiences to estimate a neural-network-based environment model. Network inversion techniques are used to determine the slope of the value function in a local region around the ....
T.M. Mitchell and S.B. Thrun. Explanation-based neural network learning for robot control. In S.J. Hanson, J.D. Cowan, and C.L. Giles, editors, Advances in Neural Information Processing Systems 5, pages 287--294. Morgan-Kaufmann, 1992.
.... of continuous dynamic programming (DP) and uses the linear quadratic regulator as a didactic example; section 3 shows it working using a radial-basis-function-based function approximator on both regulator and non-regulator problems; section 4 discusses the results and shows the relationship with Mitchell and Thrun's (1993) Explanation-Based Q-learning method. 2 Continuous DP, Advantages and Curl. Consider the problem of controlling a deterministic system to minimise $V(x_0) = \min_{u(t)} \int_0^\infty r(y(t), u(t))\,dt$, where $y(t) \in \mathbb{R}^n$ is the state at time $t$, $u(t) \in \mathbb{R}^m$ is the control, $y(0) = x_0$, and $\dot{y}(t) = f$ ....
....graphs might contain no cycles at all. In the continuous case, if the updates are sufficiently smooth, this is not possible. For stochastic problems, the consistency condition equivalent to equation 3 will involve an integral, which, if doable, would permit the application of our method. Mitchell and Thrun (1993) showed how ideas from explanation-based learning could be used to improve Q-learning. Under their scheme (called EBNN) the agent learns a conventional (absolute) Q function for a continuous state space, and also learns a model. This model is used as in explanation-based learning to generalise ....
Mitchell, TM & Thrun, SB (1993). Explanation-based neural network learning for robot control. In NIPS 5.
No context found.
Tom M. Mitchell and Sebastian B. Thrun. Explanation-Based Neural Network Learning for Robot Control. In Advances in Neural Information Processing Systems 5, J. E. Moody, S. J. Hanson, and R. P. Lippmann, Eds., Morgan Kaufmann, 1993.
....A second extension, also used widely, is to discount reward over time. If actions are to be chosen such that the number of actions is minimal, reward is typically discounted with a discount factor $\gamma \le 1$. The resulting control policy consequently prefers sooner reward to more distant reward. See [Mitchell and Thrun, 1993b] or [Thrun and Mitchell, 1993] for a more detailed description of these issues. Figure 4: Fitting slopes: Let $f$ be a target function for which three examples $\langle x_1, f(x_1) \rangle$, $\langle x_2, f(x_2) \rangle$, and $\langle x_3, f(x_3) \rangle$ are known. Based on these points the learner might generate the hypothesis $g$. If the output input ....
....to discount reward over time. If actions are to be chosen such that the number of actions is minimal, reward is typically discounted with a discount factor $\gamma \le 1$. The resulting control policy consequently prefers sooner reward to more distant reward. See [Mitchell and Thrun, 1993b] or [Thrun and Mitchell, 1993] for a more detailed description of these issues. Figure 4: Fitting slopes: Let $f$ be a target function for which three examples $\langle x_1, f(x_1) \rangle$, $\langle x_2, f(x_2) \rangle$, and $\langle x_3, f(x_3) \rangle$ are known. Based on these points the learner might generate the hypothesis $g$. If the output input derivatives are also known, the ....
[Article contains additional citation context not shown here]
Tom M. Mitchell and Sebastian Thrun. Explanation- based neural network learning for robot control. In S. J. Hanson, J. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems 5, pages 287-294, San Mateo, CA, 1993. Morgan Kaufmann.
....target slopes o according to the following formula. Here $\delta_{\max}$ denotes the maximum prediction error, which is used for normalization. This weighting scheme attempts to give accurate slopes a large weight in training, while ignoring inaccurate slopes. This heuristic weighting scheme, called LOB [Mitchell and Thrun, 1993], is based on the heuristic assumption that the accuracy of the explanation's slopes is correlated with the accuracy of the explanation's predictions. This completes the description of the EBNN learning mechanism. To summarize, EBNN refines the target network using a combined inductive analytical ....
....inductive learner as the accuracy of the domain theory decreases. The graceful degradation of EBNN with decreasing accuracy of the domain theory is due to the fact that misleading slopes are identified and their influence weakened (cf. Eq. 2) In other experiments reported elsewhere [Thrun and Mitchell, 1993] it was demonstrated that EBNN will fail to learn control if the domain theory is poor and o is kept fixed. These results also indicate that in cases where the domain theory is poor a pure analytical learner would be hopelessly lost. In the experiments reported here EBNN recovered from poor domain ....
Tom M. Mitchell and Sebastian B. Thrun. Explanation-based neural network learning for robot control. In S. J. Hanson, J. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems 5, pages 287-294, San Mateo, CA, 1993. Morgan Kaufmann.
....of the target slopes according to the following formula. Here $\delta_{\max}$ denotes the maximum prediction error, which is used for normalization. This weighting scheme attempts to give accurate slopes a large weight in training, while ignoring inaccurate slopes. This heuristic weighting scheme, called LOB [Mitchell and Thrun, 1993], is based on the heuristic assumption that the accuracy of the explanation's slopes is correlated with the accuracy of the explanation's predictions. This completes the description of the EBNN learning mechanism. To summarize, EBNN refines the target network using a combined inductive analytical ....
....instances. The dashed lines indicate average performance. In this experiment, the agent used well trained predictive action models as its domain theory. the fact that misleading slopes are identified and their influence weakened (cf. Eq. 2) In other experiments reported elsewhere [Thrun and Mitchell, 1993] it was demonstrated that EBNN will fail to learn control if the domain theory is poor and c is kept fixed. These results also indicate that in cases where the domain theory is poor a pure analytical learner would be hopelessly lost. In the experiments reported here EBNN recovered from poor domain ....
Tom M. Mitchell and Sebastian B. Thrun. Explanation-based neural network learning for robot control. In S. J. Hanson, J. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems 5, pages 287-294, San Mateo, CA, 1993. Morgan Kaufmann.
....c according to the following formula: $c = 1 - \frac{\delta}{\delta_{\max}}$ (2). Here $\delta_{\max}$ denotes the maximum prediction error, which is used for normalization. This weighting scheme attempts to give accurate slopes a large weight in training, while ignoring inaccurate slopes. This heuristic weighting scheme, called LOB [Mitchell and Thrun, 1993], is based on the heuristic assumption that the accuracy of the explanation's slopes is correlated with the accuracy of the explanation's predictions. Figure 4: Fitting values and slopes in EBNN: Let $f$ be the target function for which three examples $\langle x_1, f(x_1) \rangle$, $\langle x_2, f(x_2) \rangle$, and $\langle x_3, f(x_3) \rangle$ are ....
....instances. The dashed lines indicate average performance. In this experiment, the agent used well trained predictive action models as its domain theory. the fact that misleading slopes are identified and their influence weakened (cf Eq. 2) In other experiments reported elsewhere [Thrun and Mitchell, 1993] it was demonstrated that EBNN will fail to learn control if the domain theory is poor and o is kept fixed. These results also indicate that in cases where the domain theory is poor a pure analytical learner would be hopelessly lost. In the experiments reported here EBNN recovered from poor domain ....
Tom M. Mitchell and Sebastian B. Thrun. Explanation-based neural network learning for robot control. In S. J. Hanson, J. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems 5, pages 287-294, San Mateo, CA, 1993. Morgan Kaufmann.
....of training examples required for successful learning, and hence to scale machine learning technology to more complex robot scenarios. In this paper we will present one particular candidate approach to the lifelong learning problem: The explanation based neural network learning algorithm (EBNN) [13, 26, 27]. EBNN uses a hybrid learning strategy to generalize training data. On the one hand, it allows to learn functions inductively from scratch, just like neural network Backpropagation [17] On the other hand, EBNN also allows to learn task independent domain knowledge, which applies to multiple ....
....knowledge for scaling machine learning to more complex domains [27] From a lifelong learning perspective, much of the work presented in this paper is preliminary. While we have not yet studied robot control in the context of multiple tasks in practice, in experiments described here and elsewhere [13, 26, 27] we consistently found that EBNN outperforms pure inductive neural network learning, which does not employ background knowledge and hence learns from scratch. In a related paper we have illustrated superior generalization due to EBNN in a robot perception task [14] Learning mechanisms that allows ....
Tom M. Mitchell and Sebastian B. Thrun. Explanation-based neural network learning for robot control. In S. J. Hanson, J. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems 5, pages 287-294, San Mateo, CA, 1993. Morgan Kaufmann.
.... used, for example, in [Simard et al., 1992] to constrain a character recognition network to produce outputs that are invariant to certain transformations of the inputs (e.g. rotation of the input character). It has also been used in the Explanation-Based Neural Network (EBNN) algorithm [Mitchell and Thrun, 1993], which uses prior knowledge encoded in previously learned networks to derive slope constraints that must be satisfied by the function to be learned. The goal of this paper is to explore the use of slope information for constraining neural network backpropagation learning. We describe tangent ....
....cost. Two related algorithms have been proposed for using slopes to constrain neural network learning. One is tangent prop, proposed by Simard, Victorri, Le Cun, and Denker [Simard et al., 1992], and the other is part of the EBNN learning procedure proposed by Mitchell and Thrun [Mitchell and Thrun, 1993]. Although they use different error functions, the essential part of these two algorithms is mathematically identical. We call both procedures tangent prop. For experimental analyses, we use error functions used in the EBNN learning procedure along with tangent prop. We call this the EBNN ....
Thomas M. Mitchell and Sebastian Thrun. Explanation-based neural network learning for robot control. In Stephen J. Hanson, Jack D. Cowan, and C. Lee Giles, editors, Advances in Neural Information Processing Systems 5, pages 287--294. Morgan Kaufmann, San Mateo, CA, 1993.
....true target value and the prediction by the domain theory for training example $i$. $\delta_{\text{threshold}}$ denotes the prediction error threshold for all the examples, and is used for normalization. Negative values of $\alpha_i$, that is, values for which $\delta_i > \delta_{\text{threshold}}$, are set to 0. This heuristic, LOB [15], for setting $\alpha_i$ is based on the assumption that the accuracy of the explanation's derivatives is correlated with the accuracy of the explanation's predictions. 3.5. Experimental Results. The experiments here address the following two questions, which are central to EBNN and the role of prior ....
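The weighting heuristic described above can be sketched in a few lines. The passage states only that the threshold normalizes the prediction error and that negative weights are set to 0; the sketch assumes the weight takes the form 1 - error / threshold, as in the LOB formula quoted in other contexts on this page. The function name is hypothetical.

```python
def lob_weight(delta, delta_threshold):
    """Slope weight for one training example.

    delta: prediction error of the domain theory on this example.
    delta_threshold: normalizing error threshold (must be positive).
    Assumed form: 1 - delta / delta_threshold, clipped below at 0.
    """
    if delta_threshold <= 0:
        raise ValueError("delta_threshold must be positive")
    return max(0.0, 1.0 - delta / delta_threshold)
```

Under this reading, an example whose domain-theory prediction is exact gets weight 1, and one whose error reaches or exceeds the threshold contributes no slope information at all.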
....therefore its cost cannot be amortized in this way. It is possible to construct an alternative domain theory in which the target function is itself used as a component of the domain theory, thereby eliminating the cost of gathering data to learn an additional object specific network. In other work [15] this type of structure was found to be successful for learning to control a simulated robot. Research is needed to understand the conditions under which using the target function as part of the domain theory can be successful, and to explore alternative approaches to reducing the per task costs ....
Tom M. Mitchell and Sebastian B. Thrun. Explanation-Based Neural Network Learning for Robot Control. In J. E. Moody, S. J. Hanson, and R. P. Lippmann, editors, Advances in Neural Information Processing Systems 5. Morgan Kaufmann, December 1993.
....resulting in it choosing increasingly effective actions. 6 Notice that for each control learning problem $\langle S, A, W_i, R_i \rangle$, 5 For simplification of the notation, we assume that reward will only be received at the end of an episode. EBNN can be applied to arbitrary reward functions. See [Mitchell and Thrun, 1993b] for more details. 6 Note that more sophisticated learning schemes for learning evaluation functions have been developed. In his dissertation, Watkins [Watkins, 1989] describes Q-Learning, a scheme for learning evaluation function $Q_i(s_k, a_k)$ recursively. In Q-Learning training patterns are ....
....to discount reward over time. If actions are to be chosen such that the number of actions is minimal, reward is typically discounted with a discount factor $\gamma \le 1$. The resulting control policy consequently prefers sooner reward to more distant reward. See [Mitchell and Thrun, 1993b] or [Thrun and Mitchell, 1993] for a more detailed description of these issues. the agent must learn a distinct $Q_i$, since the reward differs for different tasks. B. The Explanation-Based Neural Network Learning Algorithm. How can the agent use its previously learned knowledge, namely the neural network action models, to ....
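The effect of discounting described above, that sooner reward is preferred to more distant reward, can be illustrated with a short sketch. The function name and the reward sequences are made up for illustration and do not come from the cited papers.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards[k] * gamma**k, computed back to front."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# Reward of 1 received only at the end of an episode: with gamma below 1,
# the shorter action sequence is worth more.
short_episode = discounted_return([0, 1], 0.9)
long_episode = discounted_return([0, 0, 1], 0.9)
```

With a discount factor of exactly 1 the two episodes would be valued equally, so the preference for fewer actions appears only when the factor is strictly below 1.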
[Article contains additional citation context not shown here]
Tom M. Mitchell and Sebastian B. Thrun. Explanation-based neural network learning for robot control. In J. E. Moody, S. J. Hanson, and R. P. Lippmann, editors, Advances in Neural Information Processing Systems 5, San Mateo, CA, 1993. Morgan Kaufmann. (to appear).
....Slopes to Guide Generalization. The remainder of this section describes a hybrid neural network learning algorithm for learning $f$. This algorithm is a special case of both the Tangent Prop algorithm [Simard et al. 1992] and the explanation-based neural network learning (EBNN) algorithm [Mitchell and Thrun, 1993]. Here we will refer to it as EBNN. Suppose we are given a training set $X$, and an invariance network $\sigma$ that has been trained using a collection of support sets $Y$. We are now interested in learning $f$. One could, of course, ignore the invariance network and the support sets altogether and ....
....slopes and values simultaneously, errors in this bias (incorrect slopes due to approximations in the learned invariance network) can be overturned by the observed training example values in $X$. The robustness of EBNN to errors in estimated slopes has been verified empirically in robot navigation [Mitchell and Thrun, 1993] and robot perception [O'Sullivan et al. 1995] domains. 3 Example. 3.1 The Domain: Object Recognition. To illustrate the transfer of knowledge via the invariance network, we collected a database of 700 color camera images of seven different objects (100 images per object) as depicted in Fig. 2 ....
T.M. Mitchell and S. Thrun. Explanation-based neural network learning for robot control. In S. J. Hanson, J. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems 5, pages 287--294, San Mateo, CA, 1993. Morgan Kaufmann.
....sets provide additional training examples for the internal representation. 3.3 Explanation-Based Neural Network Learning. The last method described here uses the explanation-based neural network learning algorithm (EBNN), which was originally proposed in the context of reinforcement learning [8, 17]. EBNN trains an artificial neural network, denoted by $h : I \to [0, 1]$, just like Back-Propagation. However, in addition to the target values given by the training set $X$, EBNN estimates the slopes (tangents) of the target function $f_n$ for each example in $X$. More specifically, training ....
T. M. Mitchell and S. Thrun. Explanation-based neural network learning for robot control. In S. J. Hanson, J. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems 5, pages 287--294, San Mateo, CA, 1993. Morgan Kaufmann.
....following formula: $\alpha = 1 - \frac{\delta}{\delta_{\max}}$ (2). Here $\delta_{\max}$ denotes the maximum prediction error, which is used for normalization. This weighting scheme attempts to give accurate slopes a large weight in training, while ignoring inaccurate slopes. This heuristic weighting scheme, called LOB [Mitchell and Thrun, 1993], is based on the heuristic assumption that the accuracy of the explanation's slopes is correlated with the accuracy of the explanation's predictions. This completes the description of the EBNN learning mechanism. To summarize, EBNN refines the target network using a combined inductive analytical ....
....instances. The dashed lines indicate average performance. In this experiment, the agent used well trained predictive action models as its domain theory. the fact that misleading slopes are identified and their influence weakened (cf. Eq. 2). In other experiments reported elsewhere [Thrun and Mitchell, 1993] it was demonstrated that EBNN will fail to learn control if the domain theory is poor and $\alpha$ is kept fixed. These results also indicate that in cases where the domain theory is poor a pure analytical learner would be hopelessly lost. In the experiments reported here EBNN recovered from poor ....
Tom M. Mitchell and Sebastian B. Thrun. Explanation-based neural network learning for robot control. In S. J. Hanson, J. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems 5, pages 287--294, San Mateo, CA, 1993. Morgan Kaufmann.
No context found.
Thomas M. Mitchell and Sebastian Thrun. Explanation based neural network learning for robot control. In Stephen J. Hanson, Jack Cowan, and Lee Giles, editors, Advances in Neural Information Processing Systems 5. Morgan Kaufmann, San Mateo, CA, to appear.