Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
Artificial Intelligence, 1999
Cited by 342 (22 self)
Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key, longstanding challenges for AI. In this paper we consider how these challenges can be addressed within the mathematical framework of reinforcement learning and Markov decision processes (MDPs). We extend the usual notion of action in this framework to include options: closed-loop policies for taking action over a period of time. Examples of options include picking up an object, going to lunch, and traveling to a distant city, as well as primitive actions such as muscle twitches and joint torques. Overall, we show that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way. In particular, we show that options may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning.
Integrating Planning and Learning: The PRODIGY Architecture
Journal of Experimental and Theoretical Artificial Intelligence, 1995
Cited by 208 (75 self)
are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements,
How To Do the Right Thing
Connection Science Journal, 1989
Cited by 179 (0 self)
This paper presents a novel approach to the problem of action selection for an autonomous agent. An agent is viewed as a collection of competence modules. Action selection is modeled as an emergent property of an activation/inhibition dynamics among these modules. A concrete action selection algorithm is presented and a detailed account of the results is given. This algorithm combines characteristics of both traditional planners and reactive systems: it produces fast and robust activity in a tight interaction loop with the environment, while at the same time allowing for some prediction and planning to take place. It provides global parameters, which one can use to tune the action selection behavior to the characteristics of the task environment. As such one can smoothly trade off goal-orientedness for situation-orientedness, bias towards ongoing plans (inertia) for adaptivity, thoughtfulness for speed, and adjust its sensitivity to goal conflicts.
Automatically Generating Abstractions for Planning
Artificial Intelligence, 1994
Cited by 156 (3 self)
This article presents a completely automated approach to generating abstractions for planning. The abstractions are generated using a tractable, domain-independent algorithm whose only input is the definition of a problem to be solved and whose output is an abstraction hierarchy that is tailored to the particular problem. The algorithm generates abstraction hierarchies by dropping literals from the original problem definition. It forms abstractions that satisfy the ordered monotonicity property, which guarantees that the structure of an abstract solution is not changed in the process of refining it. The algorithm for generating abstractions is implemented in a system called alpine, which generates abstractions for a hierarchical version of the prodigy problem solver. The abstractions generated by alpine are tested in multiple domains on large problem sets and are shown to produce shorter solutions with significantly less search than planning without using abstraction.
prodigy/analogy: Analogical Reasoning in General Problem Solving
1994
Cited by 134 (17 self)
This paper describes the integration of analogical reasoning into general problem solving as a method of learning at the strategy level to solve problems more effectively. The method based on derivational analogy has been fully implemented in prodigy/analogy and proven empirically to be amenable to scaling up both in terms of domain and problem complexity. prodigy/analogy addresses a set of challenging problems, namely: how to accumulate episodic problem solving experience, cases, how to define and decide when two problem solving situations are similar, how to organize a large library of planning cases so that it may be efficiently retrieved, and finally how to successfully transfer chains of problem solving decisions from past experience to new problem solving situations when only a partial match exists among corresponding problems. The paper discusses the generation and replay of the problem solving cases and we illustrate the algorithms with examples. We present briefly the librar...
Derivational Analogy in prodigy: Automating Case Acquisition, Storage, and Utilization
Machine Learning, 1993
Cited by 99 (14 self)
Expertise consists of rapid selection and application of compiled experience. Robust reasoning, however, requires adaptation to new contingencies and intelligent modification of past experience. And novel or creative reasoning, by its real nature, necessitates general problem-solving abilities unconstrained by past behavior. This article presents a comprehensive computational model of analogical (case-based) reasoning that transitions smoothly between case replay, case adaptation, and general problem solving, exploiting and modifying past experience when available and resorting to general problem-solving methods when required. Learning occurs by accumulation of new cases, especially in situations that required extensive problem solving, and by tuning the indexing structure of the memory model to retrieve progressively more appropriate cases. The derivational replay mechanism is discussed in some detail, and extensive results of the first full implementation are presented. These results show up to a large performance improvement in a simple transportation domain for structurally similar problems, and smaller improvements when less strict similarity metrics are used for problems that share partial structure in a process-job planning domain and in an extended version of the STRIPS robot domain.
Acquiring Search-Control Knowledge via Static Analysis
Artificial Intelligence, 1993
Cited by 85 (2 self)
Explanation-Based Learning (EBL) is a widely used technique for acquiring search-control knowledge. Recently, Prieditis, van Harmelen, and Bundy pointed to the similarity between Partial Evaluation (PE) and EBL. However, EBL utilizes training examples whereas PE does not. It is natural to inquire, therefore, whether PE can be used to acquire search-control knowledge, and if so at what cost? This paper answers these questions by means of a case study comparing prodigy/ebl, a state-of-the-art EBL system, and static, a PE-based analyzer of problem-space definitions. When tested in prodigy/ebl's benchmark problem spaces, static generated search-control knowledge that was up to three times as effective as the knowledge learned by prodigy/ebl, and did so from twenty-six to seventy-seven times faster. The paper describes static's algorithms, compares its performance to prodigy/ebl's, noting when static's superior performance will scale up and when it will not. The paper concludes with several le...
Multi-time Models for Temporally Abstract Planning
In Advances in Neural Information Processing Systems 10, 1997
Cited by 72 (8 self)
Doina Precup, Richard S. Sutton, University of Massachusetts, Amherst, MA 01003, {dprecup|rich}@cs.umass.edu
Planning and learning at multiple levels of temporal abstraction is a key problem for artificial intelligence. In this paper we summarize an approach to this problem based on the mathematical framework of Markov decision processes and reinforcement learning. Current model-based reinforcement learning is based on one-step models that cannot represent common-sense higher-level actions, such as going to lunch, grasping an object, or flying to Denver. This paper generalizes prior work on temporally abstract models [Sutton, 1995] and extends it from the prediction setting to include actions, control, and planning. We introduce a more general form of temporally abstract model, the multi-time model, and establish its suitability for planning and learning by virtue of its relationship to the Bellman equations. This paper summarizes the theoretical framewo...
Rationality and intelligence
Artificial Intelligence, 1997
Cited by 69 (1 self)
The long-term goal of our field is the creation and understanding of intelligence. Productive research in AI, both practical and theoretical, benefits from a notion of intelligence that is precise enough to allow the cumulative development of robust systems and general results. This paper outlines a gradual evolution in our formal conception of intelligence that brings it closer to our informal conception and simultaneously reduces the gap between theory and practice.
1 Artificial Intelligence
AI is a field in which the ultimate goal has often been somewhat ill-defined and subject to dispute. Some researchers aim to emulate human cognition, others aim at the creation of
Lazy Incremental Learning of Control Knowledge for Efficiently Obtaining Quality Plans
AI Review Journal, Special Issue on Lazy Learning, 1996
Cited by 62 (27 self)
General-purpose generative planners use domain-independent search heuristics to generate solutions for problems in a variety of domains. However, in some situations these heuristics force the planner to perform inefficiently or obtain solutions of poor quality. Learning from experience can help to identify the particular situations for which the domain-independent heuristics need to be overridden. Most of the past learning approaches are fully deductive and eagerly acquire correct control knowledge from a necessarily complete domain theory and a few examples to focus their scope.

