Finding Structure in Reinforcement Learning
Advances in Neural Information Processing Systems 7, 1995
Cited by 98 (4 self)
Reinforcement learning addresses the problem of learning to select actions in order to maximize one's performance in unknown environments. To scale reinforcement learning to complex real-world tasks, such as typically studied in AI, one must ultimately be able to discover the structure in the world, in order to abstract away the myriad of details and to operate in more tractable problem spaces. This paper presents the SKILLS algorithm. SKILLS discovers skills, which are partially defined action policies that arise in the context of multiple, related tasks. Skills collapse whole action sequences into single operators. They are learned by minimizing the compactness of action policies, using a description length argument on their representation. Empirical results in simple grid navigation tasks illustrate the successful discovery of structure in reinforcement learning.
1 Introduction
Reinforcement learning comprises a family of incremental planning algorithms that construct reactive con...
The World-Wide-Mind: Draft Proposal
2001
Cited by 6 (6 self)
In the first part of this paper, a change in methodology for the future of AI and Adaptive Behavior research is proposed. It is proposed that researchers construct their agent minds and their agent worlds as servers on the Internet. 3rd parties will use these servers as components in larger systems. In this scheme, any user on the Internet will be able to (a) select multiple minds from different remote "mind servers", (b) select a remote "Action Selection server" to resolve the (inevitable) conflicts between these minds, and (c) run the resulting constructed "society of mind" in the world provided on another "world server". All this without necessarily having to consult with the server authors. This constructed society may now also be presented as just another primitive mind server, ready for reuse by others as a component in a larger system. From the current situation of isolated experiments we will move to a situation where not only can researchers use each other's agent worlds, but they can also use each other's agent minds as components in larger systems. Servers may call other servers, and it is expected that 3rd parties will continuously write wrappers and filters for existing mind servers, overriding and modifying their default behaviour (to produce new, co-existing mind servers). None of this necessarily means that the mind being used ever leaves its server (or that its insides are even made public). Hence the term, the "World-Wide-Mind" (WWM), referring to the fact that the mind may be physically distributed across the world, with parts of the mind at different remote servers. Part of the motivation for the WWM is that if the AI project is to be successful, it may be too big for any single laboratory to complete. So it will be necessary both to decentralise t...
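The composition described in this abstract (several minds, one action-selection step to resolve their conflicts, and the result usable as a mind again) can be illustrated with a minimal local sketch. Everything here is an assumption made for illustration: the abstract does not specify a protocol, so minds are modelled as plain Python functions rather than remote servers, and the names `Mind`, `make_society`, and `strongest_wins` are hypothetical.

```python
from typing import Callable, List, Tuple

# Assumption: a "mind" maps a world state to a (proposed action, strength) pair.
Proposal = Tuple[str, float]
Mind = Callable[[str], Proposal]


def strongest_wins(proposals: List[Proposal]) -> Proposal:
    """Hypothetical action-selection rule: the strongest proposal wins."""
    return max(proposals, key=lambda p: p[1])


def make_society(minds: List[Mind],
                 select: Callable[[List[Proposal]], Proposal]) -> Mind:
    """Combine several minds into one; the result is itself a Mind,
    so it can be reused as a component in a larger society."""
    def society(state: str) -> Proposal:
        proposals = [mind(state) for mind in minds]
        return select(proposals)
    return society
```

Because `make_society` returns something with the same shape as its inputs, societies can be nested, which mirrors the abstract's point that a constructed society may be presented as just another mind for reuse.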
Learning To Learn: Introduction
In Learning To Learn, 1996
Cited by 6 (0 self)
Over the past three decades or so, research on machine learning and data mining has led to a wide variety of algorithms that learn general functions from experience. As machine learning is maturing, it has begun to make the successful transition from academic research to various practical applications. Generic techniques such as decision trees and artificial neural networks, for example, are now being used in various commercial and industrial applications (see e.g., [31, 75]). "Learning to learn" is an exciting new research direction within machine learning [14]. Similar to traditional machine learning algorithms, the methods described in this book induce general functions from experience. However, the book investigates algorithms that can change the way they generalise, i.e., practise the task of learning itself, and improve on it.
Transfer of Learned Knowledge in Life-Long Learning Agents
1997
Cited by 5 (0 self)
Previous work has demonstrated that the performance of machine learning algorithms can be improved by exploiting various forms of knowledge, such as domain theories. More recently, it has been recognized that some forms of knowledge can in turn be learned -- in particular, action models and task-specific internal representations. Using learned knowledge as a source of learning improvement can be particularly appropriate for agents that face many tasks. Over a long lifetime, an agent can amortize effort expended in learning knowledge by reducing the number of examples required to learn further tasks. In developing such a "lifelong learning" agent, a number of research issues arise, including: will an agent benefit from learned knowledge, can an agent exploit multiple sources of learned knowledge, how should the agent adapt as a new task arrives, how might the order of task arrival impact learning, and how can such an agent be built? I propose that an agent can be constructed which learn...
Goal Directed Adaptive Behavior in Second-Order Neural Networks: The MAXSON family of architectures
2000
Feudal Q-Learning
1995
One popular way of exorcising the daemon of dimensionality in dynamic programming is to consider spatial and temporal hierarchies for representing the value functions and policies. This paper develops a hierarchical method for Q-learning which is based on the familiar notion of a recursive feudal serfdom, with managers setting tasks and giving rewards and punishments to their juniors and in their turn receiving tasks and rewards and punishments from their superiors. We show how one such system performs in a navigation task, based on a manual division of state-space at successively coarser resolutions. Links with other hierarchical systems are discussed.
1 Introduction
Many tasks for real and artificial systems can naturally be cast in hierarchical terms. Division for conquest is a common metaphor, and it is certainly a conventional way for human designers to cope with task complexity in everything from the organisation of large corporations to chip design. Biological control...
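The two ingredients named in the abstract, a division of state-space at successively coarser resolutions and managers rewarding their juniors, can be sketched roughly as follows. This is a sketch under stated assumptions, not the paper's method: the abstract gives no details, so the halving of resolution per level, the binary manager reward, and the names `coarsen`, `manager_reward`, and `q_update` are all hypothetical. Only the update rule itself is the standard tabular Q-learning step.

```python
from collections import defaultdict


def coarsen(state, level):
    """Map a fine grid cell (x, y) to the coarser cell containing it.
    Assumption: each level halves the resolution along both axes."""
    x, y = state
    return (x >> level, y >> level)


def manager_reward(next_state, commanded_cell, level):
    """Hypothetical reward a manager gives its junior: 1.0 if the junior
    ended up in the coarse cell the manager asked for, otherwise 0.0."""
    return 1.0 if coarsen(next_state, level) == commanded_cell else 0.0


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9):
    """One standard tabular Q-learning step; returns the updated value."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


# Usage: a junior at level 0 is told to reach coarse cell (1, 0) at level 1.
q = defaultdict(float)
actions = ["N", "E", "S", "W"]
r = manager_reward(next_state=(2, 0), commanded_cell=(1, 0), level=1)
q_update(q, (1, 0), "E", r, (2, 0), actions)
```

In this sketch the junior learns from the manager's reward rather than from the environment's, which is one simple way to read the "tasks and rewards" relationship between levels.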

