| S. J. Russell. Execution architectures and compilation. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, Detroit, Michigan, 1989. |
....its actions have on the environment. PAGODA builds an explicit, predictive world model that the planner uses to construct deliberate plans, but in general, the learner may construct any sort of knowledge usable by a planner, such as condition-action rules (reactive strategies) or a neural network. Russell [1989] characterizes the forms of knowledge that an agent may have about the world along a continuum from declarative (e.g. predictive rules) to compiled (e.g. rules specifying the best action to take in a given situation). PAGODA's world model would be classified by this model as purely declarative. ....
....more than just deciding whether to plan or act. For example, the metareasoner may control search so that only the most promising action sequences are explored, or it may decide to cache plan knowledge by compiling the learned world model into situation-action rules to be applied in the future [Russell, 1989]. 9.5 Conclusions We have provided a model of learning in autonomous domains that integrates solutions to the problems of deciding what to learn, selecting learning biases, representing and learning probabilistic theories, and planning with learned probabilistic knowledge. The interactions among ....
Stuart J. Russell. Execution architectures and compilation. In IJCAI, pages 15--20, 1989.
....agent must surely, then, include the ability to adapt to novel situations and to learn new behaviours. Learning requires that an agent be able to make changes to its internal structure so as to improve some metric on its long-term future performance according to some fixed performance criterion [Rus89]. In this particular sense TouringMachines can at present be considered pre-intelligent: in response to environmental change, TouringMachines can, by changing their intentions, dynamically alter their internal control structure; however, they do so without regard to any metric on their long term ....
....in agents can also be classified as performance learning. For example, chunking to compile impasse resolution procedures in SOAR [LR90], derivational analogy to acquire domain-specific control rules in PRODIGY [CEG 91], as well as caching of stimulus-response rules in Theo [Mit90]. Russell [Rus89] also proposes a decision-theoretic framework for knowledge compilation within which each of the above approaches can be formally described. Besides the potential for making agents more reactive and thus more run-time efficient, performance learning can help to overcome the restriction ....
Stuart J. Russell. Execution architectures and compilation. In Proceedings International Joint Conference on Artificial Intelligence, pages 15--20, 1989.
....per-transition costs, to per-transition outcomes, called rewards. Rewards can be either desirable (positive) or undesirable (negative), where the latter are essentially the same as costs. The agent's objective is to choose actions so as to maximize the total reward received from the world (cf. Russell, 1989). For example, a robot meant to find and collect soda cans might be given a positive reward for depositing a can in the recycling bin, a negative reward for colliding with anything or if its battery runs too low, and zero reward at all other times. In a navigation problem, positive reward might be ....
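The reward scheme in the excerpt above can be sketched in a few lines of Python. This is only an illustration: the event names and the numeric reward values are assumptions, since the excerpt specifies only the signs of the rewards.

```python
# Hypothetical reward values; the excerpt gives only their signs.
REWARDS = {
    "deposit_can": 1.0,   # positive: a can is deposited in the recycling bin
    "collision": -1.0,    # negative: the robot collides with something
    "battery_low": -1.0,  # negative: the battery runs too low
}


def reward(event):
    """Reward for a single transition; zero for every unlisted event."""
    return REWARDS.get(event, 0.0)


def total_reward(events):
    """The quantity the agent tries to maximize: the sum of all rewards."""
    return sum(reward(e) for e in events)


# Hypothetical episode: one collision and two deposited cans.
episode = ["move", "collision", "move", "deposit_can", "deposit_can"]
print(total_reward(episode))
```

A negative entry in such a table plays the role of a cost, which matches the excerpt's remark that undesirable rewards are essentially the same as costs.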
Russell, S. J. (1989) Execution architectures and compilation.
....To reduce the complexity and time needed for decision making in time-constrained situations, we compile the results of deliberative decision making into a set of reactive condition-action rules with numerous machine learning algorithms. An autonomous agent can use the compiled knowledge [Russell, 1989; Zilberstein, 1995] and either eliminate the deliberative decision making altogether, or constrain the number of alternative actions considered by excluding the ones that are likely to be suboptimal. We propose an adaptive and deliberative agent (ADA) architecture, as consisting of compiled ....
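A minimal sketch of the compilation idea in the excerpt above, assuming a tiny hand-written outcome model. The situations, actions, probabilities, and utilities are all hypothetical, and a plain lookup table stands in for the machine learning algorithms the excerpt mentions.

```python
# Hypothetical outcome model:
# situation -> action -> list of (probability, utility) pairs.
MODEL = {
    "obstacle_ahead": {
        "stop": [(1.0, 0.0)],
        "go": [(0.5, 1.0), (0.5, -4.0)],
    },
    "clear_path": {
        "stop": [(1.0, 0.0)],
        "go": [(0.9, 1.0), (0.1, -4.0)],
    },
}


def expected_utility(outcomes):
    """Expected utility of one action from its (probability, utility) pairs."""
    return sum(p * u for p, u in outcomes)


def deliberate(situation):
    """Deliberative decision: evaluate every action and return the best one."""
    actions = MODEL[situation]
    return max(actions, key=lambda a: expected_utility(actions[a]))


def compile_rules(situations):
    """Cache the deliberative choice as a condition-action rule per situation."""
    return {s: deliberate(s) for s in situations}


# Compile once; afterwards acting is a single table lookup.
RULES = compile_rules(MODEL)


def act(situation):
    """Reactive decision: apply the compiled rule without deliberating."""
    return RULES[situation]


print(act("obstacle_ahead"), act("clear_path"))
```

In this sketch the compiled rules replace deliberation entirely. The alternative mentioned in the excerpt, using compiled knowledge only to prune unpromising actions before deliberating, would keep `deliberate` but restrict the set of actions it evaluates.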
S. J. Russell. Execution architectures and compilation. In Proceedings of the 11th International Joint Conference on Artificial Intelligence, pages 15--20, Detroit, Michigan, August 1989.
....and new situation. The agent's objective is to choose actions so as to maximize the total reward it receives in the long term. [Footnote 1: Most systems actually slightly discount delayed reward relative to immediate reward.] This problem formulation has been used in studies of reinforcement learning for many years and is also being used in studies of planning and reactive systems (e.g. Russell, 1989). Although somewhat unfamiliar, the reward maximization problem is easily mapped onto most problems of interest. [Figure 1: The Problem Formulation Used in Dyna. Labels: Agent, World, Reward, Situation, State, Action.] The ....
Russell, S. J. (1989) Execution architectures and compilation.
....between the possible inefficiencies resulting from using approximate values of expected utilities and the value of time needed for a more accurate model. A very attractive method of dealing with the complexities of an all-encompassing formalism is compilation. It has been advocated before in [12, 15, 13] that decision theory can be made to justify heuristic rules of behavior under uncertainty and be compiled in a number of different ways into condition-action rules, action-utility rules, and so on. These suggestions are quite applicable to the formalisms of multiagent interaction and ....
S. J. Russell. Execution architectures and compilation. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, pages 15--20, Detroit, Michigan, August 1989.
....detectable features of their environment. The ultimate feasibility of this approach is an empirical question, but there are many researchers, including me, who are skeptical and who believe that knowledge compilation techniques will provide only one piece of the intelligent behavior puzzle [14, 20, 25, 43, 57, 60]. The reasons for skepticism are wide-ranging. There are practical, engineering concerns: obviating planning puts an enormously heavy burden on a system designer. There are philosophical concerns: accounts of rationality and responsibility depend upon conceiving of behavior as resulting from ....
S. J. Russell. Execution architectures and compilation. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, pages 15--20, Detroit, MI, 1989.
....The network does not need long paths from inputs (sensors) to outputs (actuators). Any computation that is capable of being done is done in a very short time span. There have been other approaches which address a similar time-bounded computation issue, namely the bounded rationality approach [Russell 89]. Those approaches try to squeeze a traditional Artificial Intelligence system into a bounded amount of computation. With the new approach we tend to come from the other direction: we start with very little computation and build up the amount, while staying 24 The tasks carried out by this ....
"Execution Architectures and Compilation", Stuart J. Russell, Proceedings IJCAI-89, Detroit, MI, 1989, 15--20.
....between the desire to tackle realistic tasks, utilizing integrated agent architectures, and the desire to maintain reasonable progress in machine learning. A solution pursued by some [e.g. Al Badr and Hanks, 1991, Carbonell and Hood, 1986, Philips et al. 1991, Pollack and Ringuette, 1990, Russell, 1989, Vere and Bickmore, 1990] is agent testbeds that simulate robotic environments with varying degrees of fidelity. Simulated environments have the advantage of providing relatively low-cost arenas for research. However, even simulated environments can be difficult to build [Carbonell and Hood, ....
Russell, Stuart 1989. Execution architectures and compilation. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence.
....the world. The world represents the task to be solved; prototypically it is the robot's external environment. The world receives actions from the policy and produces a next-state output and a reward output. The overall task is defined as maximizing the long-term average reward per time step (cf. Russell, 1989). The architecture also includes an explicit world model. The world model is intended to mimic the one-step input-output behavior of the real world. Finally, the Dyna-PI architecture includes an evaluation function that rapidly maps states to values, much as the policy rapidly maps states to ....
Russell, S. J. (1989) Execution architectures and compilation.
....the best external action, the system may be considered a reactive agent. The control structure of a production system becomes less reactive and more deliberative as matching cost grows, multiple rule matching is allowed, and certain policies are used to resolve conflicts between rules. Russell [1989] presents a uniform view of agent deliberation that identifies six types of knowledge that agents can use. These classes of knowledge range from fully declarative to fully compiled representations. Figure 3.2 [Russell, 1989] shows the six types of knowledge, denoted by: # and # . Type rules ....
....allowed, and certain policies are used to resolve conflicts between rules. Russell [1989] presents a uniform view of agent deliberation that identifies six types of knowledge that agents can use. These classes of knowledge range from fully declarative to fully compiled representations. Figure 3.2 [Russell, 1989] shows the six types of knowledge, denoted by: # and # . Type rules specify information that can be deduced about the current state. Type # rules specify information about the results of actions and their effect on the current state. Type rules specify information regarding the utility ....
S. J. Russell. Execution architectures and compilation. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, Detroit, Michigan, 1989.
....situations. For the above reasons, among others, two important topics in the RALPH project are inductive learning of meta-level policies and compilation of reasoning. Meta-level learning is discussed briefly in Section 5.4.1, and at greater length in [50, 51]. On compilation of decision making see [41]. We now turn to the topic of how an agent can select its computations optimally without knowing their outcome: the topic of rational metareasoning. 3.1 Rational metareasoning The construction of a system capable of rational metareasoning rests on two basic principles: 1. Computations are to ....
....are isomorphic to those for computations rather than ordinary external actions. In the SOAR system [34], however, the basic deliberation mode is goal-directed search. We intend to construct a problem-solving architecture in which decision-theoretic deliberation and its various possible compilations [41] are the basic modes of computation, and in which metareasoning is carried out in the principled fashion outlined above, rather than through hand-generated condition-action rules. One can also consider the possibility of applying these ideas to control search in theorem provers. However, in order ....
Russell, S. J. (1989) Execution architectures and compilation. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, Detroit, MI: Morgan Kaufmann.