Cooperative Multi-Agent Learning: The State of the Art
Autonomous Agents and Multi-Agent Systems, 2005
Cited by 181 (8 self)
Cooperative multiagent systems are ones in which several agents attempt, through their interaction, to jointly solve tasks or to maximize utility. Due to the interactions among the agents, multiagent problem complexity can rise rapidly with the number of agents or their behavioral sophistication. The challenge this presents to the task of programming solutions to multiagent systems problems has spawned increasing interest in machine learning techniques to automate the search and optimization process. We provide a broad survey of the cooperative multiagent learning literature. Previous surveys of this area have largely focused on issues common to specific subareas (for example, reinforcement learning or robotics). In this survey we attempt to draw from multiagent learning work in a spectrum of areas, including reinforcement learning, evolutionary computation, game theory, complex systems, agent modeling, and robotics. We find that this broad view leads to a division of the work into two categories, each with its own special issues: applying a single learner to discover joint solutions to multiagent problems (team learning), or using multiple simultaneous learners, often one per agent (concurrent learning). Additionally, we discuss direct and indirect communication in connection with learning, plus open issues in task decomposition, scalability, and adaptive dynamics. We conclude with a presentation of multiagent learning problem domains, and a list of multiagent learning resources.
Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990-2010)
Cited by 73 (16 self)
The simple but general formal theory of fun & intrinsic motivation & creativity (1990) is based on the concept of maximizing intrinsic reward for the active creation or discovery of novel, surprising patterns allowing for improved prediction or data compression. It generalizes the traditional field of active learning, and is related to old but less formal ideas in aesthetics theory and developmental psychology. It has been argued that the theory explains many essential aspects of intelligence including autonomous development, science, art, music, humor. This overview first describes theoretically optimal (but not necessarily practical) ways of implementing the basic computational principles on exploratory, intrinsically motivated agents or robots, encouraging them to provoke event sequences exhibiting previously unknown but learnable algorithmic regularities. Emphasis is put on the importance of limited computational resources for online prediction and compression. Discrete and continuous time formulations are given. Previous practical but nonoptimal implementations (1991, 1995, 1997-2002) are reviewed, as well as several recent variants by others (2005). A simplified typology addresses current confusion concerning the precise nature of intrinsic motivation.
Optimal Ordered Problem Solver
2002
Cited by 70 (21 self)
We present a novel, general, optimally fast, incremental way of searching for a universal algorithm that solves each task in a sequence of tasks. The Optimal Ordered Problem Solver (OOPS) continually organizes and exploits previously found solutions to earlier tasks, efficiently searching not only the space of domain-specific algorithms, but also the space of search algorithms. Essentially we extend the principles of optimal nonincremental universal search to build an incremental universal learner that is able to improve itself through experience.
Exploring the Predictable
2002
Cited by 33 (13 self)
Details of complex event sequences are often not predictable, but their reduced abstract representations are. I study an embedded active learner that can limit its predictions to almost arbitrary computable aspects of spatiotemporal events. It constructs probabilistic algorithms that (1) control interaction with the world, (2) map event sequences to abstract internal representations (IRs), (3) predict IRs from IRs computed earlier. Its goal is to create novel algorithms generating IRs useful for correct IR predictions, without wasting time on those learned before. This requires an adaptive novelty measure which is implemented by a coevolutionary scheme involving two competing modules collectively designing (initially random) algorithms representing experiments. Using special instructions, the modules can bet on the outcome of IR predictions computed by algorithms they have agreed upon. If their opinions differ then the system checks who's right, punishes the loser (the surprised one), and rewards the winner. An evolutionary or reinforcement learning algorithm forces each module to maximize reward. This motivates both modules to lure each other into agreeing upon experiments involving predictions that surprise it. Since each module essentially can veto experiments it does not consider profitable, the system is motivated to focus on those computable aspects of the environment where both modules still have confident but different opinions. Once both share the same opinion on a particular issue (via the loser's learning process, e.g., the winner is simply copied onto the loser), the winner loses a source of reward: an incentive to shift the focus of interest onto novel experiments. My simulations include an example where surprise-generation of this kind helps to speed up ...
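The betting step described in this abstract can be illustrated with a small Python sketch. It is not the author's implementation: the names `settle_bet` and `run_experiment`, the dictionary representation of each module's opinions, the unit stake, and the "copy the winner's opinion" learning rule are all assumptions made for illustration. The sketch also omits the agreement and veto step and treats each experiment as having a single fixed outcome.

```python
def settle_bet(pred_a, pred_b, outcome, stake=1.0):
    """Return (reward_a, reward_b) for one experiment.

    Hypothetical zero-sum rule: rewards flow only when the two modules
    disagree and exactly one of them predicted the outcome correctly.
    """
    if pred_a == pred_b:
        return 0.0, 0.0          # same opinion: nothing to bet on
    if pred_a == outcome:
        return stake, -stake     # module A wins, module B is surprised
    if pred_b == outcome:
        return -stake, stake     # module B wins, module A is surprised
    return 0.0, 0.0              # both wrong: no winner in this sketch


def run_experiment(module_a, module_b, experiment, world):
    """Run one experiment, pay out the bet, and let the loser learn.

    module_a and module_b map experiment names to predicted outcomes;
    world maps experiment names to actual outcomes (all assumed here).
    """
    outcome = world[experiment]
    reward_a, reward_b = settle_bet(
        module_a[experiment], module_b[experiment], outcome
    )
    # Simplest learning rule mentioned in the abstract: the winner's
    # opinion is copied onto the loser.
    if reward_a > 0:
        module_b[experiment] = module_a[experiment]
    elif reward_b > 0:
        module_a[experiment] = module_b[experiment]
    return reward_a, reward_b


if __name__ == "__main__":
    world = {"light_turns_on": 1}
    module_a = {"light_turns_on": 1}
    module_b = {"light_turns_on": 0}
    print(run_experiment(module_a, module_b, "light_turns_on", world))
    print(run_experiment(module_a, module_b, "light_turns_on", world))
```

Repeating the same experiment yields a reward only the first time. Once the loser has adopted the winner's opinion the bet pays nothing, which is the incentive to move on to novel experiments.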
A Local Learning Algorithm for Dynamic Feedforward and Recurrent Networks
Connection Science, 1989
Cited by 32 (20 self)
Most known learning algorithms for dynamic neural networks in nonstationary environments need global computations to perform credit assignment. These algorithms either are not local in time or not local in space. Those algorithms which are local in both time and space usually cannot deal sensibly with 'hidden units'. In contrast, as far as we can judge by now, learning rules in biological systems with many 'hidden units' are local in both space and time. In this paper we propose a parallel online learning algorithm which performs local computations only, yet still is designed to deal with hidden units and with units whose past activations are 'hidden in time'. The approach is inspired by Holland's idea of the bucket brigade for classifier systems, which is transformed to run on a neural network with fixed topology. The result is a feedforward or recurrent 'neural' dissipative system which is consuming 'weight-substance' and permanently trying to distribute this substance onto its co...
Metagenetic programming: Coevolving the operators of variation
1998
Cited by 28 (3 self)
The standard Genetic Programming approach is augmented by coevolving the genetic operators. To do this the operators are coded as trees of indefinite length. In order for this technique to work, the language that the operators are defined in must be such that it preserves the variation in the base population. This technique can be varied by adding further populations of operators and changing which populations act as operators for others, including itself, thus providing a framework for a whole set of augmented GP techniques. The technique is tested on the parity problem. The pros and cons of the technique are discussed. Key Words: genetic programming, automatic programming, genetic operators, coevolution
CFSC: A Package of Domain Independent Subroutines for Implementing Classifier Systems in Arbitrary, User-Defined Environments
1988
Cited by 27 (0 self)
This document describes the CFSC system, a package of subroutines (and data structures) that can be used to implement learning classifier systems for arbitrary, user-defined task-domains/environments. The CFSC subroutines implement the core, domain-independent parts of a classifier system, including routines to implement the following steps of the "major-cycle" of a classifier system:
Gödel Machines: Self-Referential Universal Problem Solvers Making Provably Optimal Self-Improvements
2003
Cited by 20 (9 self)
An old dream of computer scientists is to build an optimally efficient universal problem solver. We show how to solve arbitrary computational problems in an optimal fashion inspired by Kurt Gödel's celebrated self-referential formulas (1931). Our Gödel machine's initial software includes an axiomatic description of: the Gödel machine's hardware, the problem-specific utility function (such as the expected future reward of a robot), known aspects of the environment, costs of actions and computations, and the initial software itself (this is possible without introducing circularity). It also includes a typically suboptimal initial problem-solving policy and an asymptotically optimal proof searcher searching the space of computable proof techniques, that is, programs whose outputs are proofs. Unlike previous approaches, the self-referential Gödel machine will rewrite any part of its software, including axioms and proof searcher, as soon as it has found a proof that this will improve its future performance, given its typically limited computational resources. We show that self-rewrites are globally optimal (no local minima!), since provably none of all the alternative rewrites and proofs (those that could be found by continuing the proof search) are worth waiting for.
Artificial Curiosity Based on Discovering Novel Algorithmic Predictability Through Coevolution
1999
Cited by 15 (5 self)
How to explore a spatiotemporal domain? By predicting and learning from success/failure what's predictable and what's not. I study a "curious" embedded agent that differs from previous explorers in the sense that it can limit its predictions to fairly arbitrary, computable aspects of event sequences and thus can explicitly ignore almost arbitrary unpredictable, random aspects. It constructs initially random algorithms mapping event sequences to abstract internal representations (IRs). It also constructs algorithms predicting IRs from IRs computed earlier. It wants to learn novel algorithms creating IRs useful for correct IR predictions, without wasting time on those learned before. This is achieved by a coevolutionary scheme involving two competing modules collectively designing single algorithms to be executed. The modules can bet on the outcome of IR predictions computed by the algorithms they have agreed upon. If their opinions differ then the system checks who's right, punish...