Maclin, R., Shavlik, J.W.: Incorporating advice into agents that learn from reinforcements. In: Proc. of 12th National Conference on Artificial Intelligence. (1994) 694--699
.... advice from an external source is not new though; McCarthy made the suggestion nearly 45 years ago [9]. However, the number of systems that have done just this are limited and usually specific to a particular problem, e.g. integrating advice in Q-learning to increase an agent's overall reward [7]. There has also been work that attempts to address how to handle conflicting advice [5]. This is one of the very same issues soccer agents must address when accepting advice from other agents. [5] offers four criteria to determine how to resolve conflicts: Specificity: more constraining rules ....
R. Maclin and J.W. Shavlik. Incorporating advice into agents that learn from reinforcements. In Proc. of AAAI-1994, pages 694-699, 1994.
.... advice from an external source is not new though; McCarthy made the suggestion nearly 45 years ago [9]. However, the number of systems that have done just this are limited and usually specific to a particular problem, e.g. integrating advice in Q-learning to increase an agent's overall reward [7]. There has also been work that attempts to address how to handle conflicting advice [5]. This is one of the very same issues soccer agents must address when accepting advice from other agents. [5] offers four criteria to determine how to resolve conflicts: Specificity: more constraining ....
R. Maclin and J.W. Shavlik. Incorporating advice into agents that learn from reinforcements. In Proc. of AAAI-1994, pages 694--699, 1994.
.... advice from an external source is not new though; McCarthy made the suggestion nearly 45 years ago [9]. However, the number of systems that have done just this are limited and usually specific to a particular problem, e.g. integrating advice in Q-learning to increase an agent's overall reward [7]. There has been work done, however, that attempts to address how to handle conflicting advice [5]. This is one of the very same issues soccer agents must address when accepting advice from other agents. [5] offers four criteria to determine how to resolve conflicts: Specificity: more constraining ....
R. Maclin and J.W. Shavlik. Incorporating advice into agents that learn from reinforcements. In Proc. of AAAI-1994, pages 694-699, 1994.
....must be explored sufficiently. On the other hand, experience gained during the learning process also has to be considered for action selection in order to minimize costs of learning. Empirical studies have proven the acceleration of the learning process using advice from an external observer, e.g. [9]. Integrating expert knowledge into the exploration step of the reinforcement learning algorithm, the following enhancement of Watkins' [14] Q-learning algorithm is proposed: Step 1: For each s, a, initialize the table entry Q(s, a) to zero. Step 2: Observe the current state s. Step 3: Do ....
Maclin, R., Shavlik, J.W.: Incorporating Advice into Agents that Learn from Reinforcements, in: Proceedings of the Twelfth National Conference on Artificial Intelligence, Vol. 1, Seattle, USA, July--August, 1994, pp. 694-699
....to Q table and its initial rewards of each actions are set as the following way: $Q(s_t, a_i) \leftarrow \begin{cases} r_t & (i = t) \\ 0 & (i \neq t) \end{cases}$ 11) If $r_t = 0$, no processes are done and quit. 4 Simulation and Evaluation. In this chapter, we empirically evaluate FCQL in an artificial environment (Figure 3) [4], which was chosen for the following reasons: 1. It is a discrete Markov environment. 2. It has DNF features. [Fig. 3. Test Environments. Legend: Learner, Enemy, Obstacle, Empty, Target.] The maze is a grid world whose size is 7 × 7 blocks and there are 5 types of objects in the maze: an object ....
Maclin, R. and Shavlik, J. W.: Incorporating Advice into Agents that Learn from Reinforcements, Proc. of AAAI-1994 , pp. 694-699 (1994).
....may begin again. Weight compilation approaches of this kind have been used to encode various forms of sentential rules [McMillan et al. 1991; Towell and Shavlik, 1994], the transitions of finite state automata [Giles and Omlin, 1993], and advice for an agent in an artificial environment [Maclin and Shavlik, 1994]. Unfortunately, none of these models provide a connectionist explanation for how instructions are compiled into the network. Also missing is a connectionist mechanism for the rule extraction process which is needed to reset the semantics of weight values. These hybrid models depend on ....
Richard Maclin and Jude W. Shavlik. Incorporating advice into agents that learn from reinforcements. In Proceedings of the 12th National Conference on Artificial Intelligence (AAAI-94), pages 694--699, Menlo Park, 1994. AAAI Press.
.... successfully used to encode propositional logic rules (Towell and Shavlik, 1994), fuzzy classification rules (Tresp et al. 1993), simple mapping rules (McMillan et al. 1991), the transitions of finite state automata (Giles and Omlin, 1993), and advice for an agent in an artificial environment (Maclin and Shavlik, 1994). Unfortunately, none of these models provide a connectionist explanation for how instructions are compiled into the network. Also missing is a connectionist mechanism for the rule extraction process which is needed to reset the semantics of weight values. Both of these processes, compilation ....
Maclin, R. and Shavlik, J. W. (1994). Incorporating advice into agents that learn from reinforcements. In Proceedings of the 12th National Conference on Artificial Intelligence (AAAI-94), pages 694--699, Menlo Park. AAAI Press.
.... form of instruction does the training agent provide? Related work has examined training agents that provide the learner with: actions that the learner should perform (Clouse & Utgoff, 1992; Lin, 1992), the exact state values (Utgoff & Clouse, 1991), and if-then rules (Gordon & Subramanian, 1994; Maclin & Shavlik, 1994). In our approach, the automated trainer instructs the learning agent by giving actions that the learner should perform. We choose this form of advice for two main reasons. Firstly, individual actions are an inherent form of instruction for an automated training agent to provide: by building an ....
....perform the task at a moderate level of expertise, so that the instruction it provides to the learner is informative. • When does the trainer provide instruction? Many trainer-augmented systems allow instruction to be provided at the whim of a human trainer (Lin, 1991; Clouse & Utgoff, 1992; Maclin & Shavlik, 1994). Although these systems exhibit good performance, they do not address the issue of when the instruction should be provided, because they allow a human to guide the system in an unprincipled manner. In order to evaluate policies for deciding when to instruct the learner, one needs to examine ....
[Article contains additional citation context not shown here]
Maclin, R., & Shavlik, J. W. (1994). Incorporating advice into agents that learn from reinforcements.
....representation, to facilitate correspondence with the bottom level and to encourage uniformity and integration, we chose to use a localist connectionist model instead. Basically, we connect the nodes representing the conditions of a rule to the node representing the conclusion (Sun 1992, Fu 1991, Maclin and Shavlik 1994, Towell and Shavlik 1993). That is, we directly translate the structure of a rule set to that of a network. For more complex rule forms including predicate rules and variable binding, see Sun (1992). We devised a novel rule learning algorithm based on neural reinforcement learning. The basic idea ....
R. Maclin and J. Shavlik, (1994). Incorporating advice into agents that learn from reinforcements. Proc. of AAAI-94.
....payoff, is only weakly informative; certainly not as instructive as supervised training. In recognition of this deficiency of reinforcement learning, several researchers have recently studied systems in which a training agent is added to the learning scenario (Clouse & Utgoff, 1992; Lin, 1992; Maclin & Shavlik, 1994; Gordon & Subramanian, 1994; Clouse, 1995). In such systems, rather than relying solely on the simple signals provided, the learning agent also has access to supervised instruction. While it is not surprising that a learning agent employing a reinforcement learning method will be aided by ....
[Figure 1. Trainer-augmented reinforcement learning (labels: State, Reinforcement, Instruction).] .... as if it itself had performed the training sequences. In another system (Clouse & Utgoff, 1992), a human training agent interacts with the learner, occasionally providing actions that the learner then performs. Maclin and Shavlik (1994) employ if-then rules, compiling the trainer's instruction directly into the learner's policy via knowledge-based neural network techniques. Similarly, Gordon and Subramanian (1994) rely on if-then rules, which are converted into operational rules that become part of a population that is refined ....
[Article contains additional citation context not shown here]
Maclin, R., & Shavlik, J. W. (1994). Incorporating advice into agents that learn from reinforcements.
....each other) which is unlike any of the existing models. In addition, most of the existing learning models explore mainly top-down learning (including advice taking) in which externally given declarative knowledge is turned into procedural knowledge through practice, such as Gelfand et al. (1989), Maclin and Shavlik (1994), and Anderson (1983). Instead, Clarion explores bottom-up learning, to demonstrate how conceptual symbolic knowledge can emerge in interacting with the world through the mediation of subconceptual procedural knowledge. Let us compare Clarion also with connectionist rule extraction algorithms. Fu ....
R. Maclin and J. Shavlik, (1994). Incorporating advice into agents that learn from reinforcements.
....each other) which is unlike any of the existing models. In addition, most of the existing learning models explore mainly top-down learning (including advice taking) in which externally given declarative knowledge is turned into procedural knowledge through practice, such as Gelfand et al. (1989), Maclin and Shavlik (1994), and Anderson (1983). Instead, Clarion explores bottom-up learning, to demonstrate how conceptual symbolic knowledge can emerge in interacting with the world through the mediation of subconceptual subsymbolic procedural knowledge. Rule Extraction. Let us compare Clarion also with connectionist ....
.... and declarative knowledge; Sun et al. 1996, 1997). In addition, these models used some domain-specific techniques (thus not directly applicable to minefield navigation). Clearly, there are a variety of techniques for speeding up or otherwise helping Q-learning, such as Sutton (1990), Lin (1992), Maclin and Shavlik (1994), McCallum (1996), Moore and Atkeson (1994), Singh (1994), Zhang and Dietterich (1995), and Singh and Sutton (1996). Such techniques include reuse of previous experience (Sutton 1990, Lin 1992), gradual enhancement of state spaces (McCallum 1996, Moore and Atkeson 1994), hierarchical learning ....
[Article contains additional citation context not shown here]
R. Maclin and J. Shavlik, (1994). Incorporating advice into agents that learn from reinforcements. Proc. of AAAI-94. Morgan Kaufmann, San Mateo, CA.
....the minefield: avoids mines and goes in the general direction of the target. The estimated success rate (the lower bound) of the plan was 62%, as opposed to the success rate of the original Q-learner of 81%. IV. Discussions. Compared with Kushmerick et al. (1995), Dearden and Boutilier (1996), Maclin and Shavlik (1994), and other work of planning with DP/RL, the difference is obvious: our approach is bottom-up (turning Q values from DP/RL into explicit planning rules) while their approaches were top-down (i.e. using given rules and turning them into MDP for DP/RL; see section 1). Top-down approaches require much ....
R. Maclin and J. Shavlik, (1994). Incorporating advice into agents that learn from reinforcements. Proc. of AAAI-94.
.... learning models explore mainly top-down learning (including advice taking) in which externally given declarative knowledge is turned into procedural knowledge through practice, such as Gelfand et al. (1989), Anderson (1983, 1993), Anderson and Lebiere (1998), Oliver and Schneider (1989), and Maclin and Shavlik (1994). Instead, Clarion explores bottom-up learning, to demonstrate how conceptual symbolic knowledge can emerge in interacting with the world through the mediation of subconceptual procedural knowledge. 4.5 Cognitive Modeling. The major difference between Clarion and other cognitive architectures is ....
R. Maclin and J. Shavlik, (1994). Incorporating advice into agents that learn from reinforcements.
No context found.
Maclin, R. and Shavlik, J. (AAAI-94). Incorporating advice into agents that learn from reinforcements. In Twelfth National Conference on Artificial Intelligence (American Association of Artificial Intelligence).
....from their need for large numbers of training episodes. Several methods for speeding up reinforcement learning have been proposed; one promising approach is to design a learner that can also accept advice from an external observer (Clouse & Utgoff, 1992; Gordon & Subramanian, 1994; Lin, 1992; Maclin & Shavlik, 1994). Figure 1 shows the general structure of a reinforcement learner, augmented (in bold) with an observer that provides advice. We present and evaluate a connectionist approach in which agents learn from both experience and instruction. Our approach produces agents that significantly outperform ....
....a decade ago, Mostow (1982) developed a program that accepted and operationalized high-level advice about how to better play the card game Hearts. Recently, after a decade-long lull, there has been a growing amount of research on advice taking (Gordon & Subramanian, 1994; Huffman & Laird, 1993; Maclin & Shavlik, 1994; Noelle & Cottrell, 1994). For example, Gordon and Subramanian (1994) created a system that deductively compiles high-level advice into concrete actions, which are then refined using genetic algorithms. Several characteristics of our approach to providing advice are particularly interesting. One, ....
[Article contains additional citation context not shown here]
Maclin, R., & Shavlik, J. (1994). Incorporating advice into agents that learn from reinforcements.
....provide to an agent learning to play a video game. We will use it to illustrate the process of integrating advice into a neural network. The left column contains advice in our programming language, and the right shows the effects of the advice. A grammar for our advice language appears elsewhere (Maclin & Shavlik 1994). We have made three extensions to the standard KBANN algorithm: i) we allow advice that contains multi-step plans; ii) advice can contain loops; iii) advice can refer to previously defined terms. In all three cases incorporating advice involves adding hidden units representing the advice ....
....advice. The key to translating this construct is that there are two ways to invoke the two-step plan. The plan executes when the when condition .... [Footnote 1: A unit recognizing this concept, enemy near and west, is created using a technique similar to that in Berenji and Khedkar (1992); for more details see Maclin and Shavlik (1994).] [Figure 4: Translation of the second piece of advice. Dotted lines show negative weights. As with all translations, the units shown are added to the existing network. Unit labels: Surrounded, MoveEast, PushEast, OKtoPushEast, Enemy Near, Other Inputs, S1, S2.] ....
[Article contains additional citation context not shown here]
Maclin, R., & Shavlik, J. 1994. Incorporating advice into agents that learn from reinforcements. Technical Report 1227, CS Dept., Univ. of Wisconsin-Madison.
....it useful to create a set of knowledge-based networks whose outputs are averaged (weighted) when categorizing new examples [13]; we search for a set of highly accurate networks whose errors are not highly correlated. Recently, we extended our approach to the reinforcement learning paradigm [5, 7, 8]. The approach we developed allows a connectionist reinforcement learner to accept advice, provided at any time during the learning process, by an external observer. In this approach, the advice giver watches the learner and occasionally makes suggestions, expressed as instructions in a simple ....
R. Maclin and J. Shavlik, "Incorporating advice into agents that learn from reinforcements," in Proceedings of the Twelfth National Conference on Artificial Intelligence, (Seattle, WA), pp. 694--699, AAAI/MIT Press, 1994.
No context found.
Maclin, R., Shavlik, J.W.: Incorporating advice into agents that learn from reinforcements. In: Proc. of 12th National Conference on Artificial Intelligence. (1994) 694--699
No context found.
R. Maclin and J. Shavlik. Incorporating advice into agents that learn from reinforcements. In Proc. of the Twelfth National Conf. on Artificial Intelligence (AAAI-94). AAAI Press, 1994.
No context found.
R. Maclin and J. Shavlik. Incorporating advice into agents that learn from reinforcements. In Proc. of the Twelfth National Conf. on Artificial Intelligence (AAAI-94). AAAI Press, 1994.
No context found.
R. Maclin and J. Shavlik, (1994). Incorporating advice into agents that learn from reinforcements. Proc. of the National Conference on Artificial Intelligence (AAAI-94). Morgan Kaufmann, San Mateo, CA.
No context found.
R. Maclin and J. Shavlik, (1994). Incorporating advice into agents that learn from reinforcements. Proc. of AAAI-94. Morgan Kaufmann, San Mateo, CA.
No context found.
R. Maclin and J. Shavlik, (1994). Incorporating advice into agents that learn from reinforcements. Proc. of AAAI-94.
No context found.
R. Maclin and J. Shavlik, (1994). Incorporating advice into agents that learn from reinforcements. Proc. of AAAI-94. Morgan Kaufmann, San Mateo, CA.