Results 1 - 10 of 113
Cooperative Multi-Agent Learning: The State of the Art
Autonomous Agents and Multi-Agent Systems, 2005
"... Cooperative multi-agent systems are ones in which several agents attempt, through their interaction, to jointly solve tasks or to maximize utility. Due to the interactions among the agents, multi-agent problem complexity can rise rapidly with the number of agents or their behavioral sophistication. ..."
Cited by 182 (8 self)
Cooperative multi-agent systems are ones in which several agents attempt, through their interaction, to jointly solve tasks or to maximize utility. Due to the interactions among the agents, multi-agent problem complexity can rise rapidly with the number of agents or their behavioral sophistication. The challenge this presents to the task of programming solutions to multi-agent systems problems has spawned increasing interest in machine learning techniques to automate the search and optimization process. We provide a broad survey of the cooperative multi-agent learning literature. Previous surveys of this area have largely focused on issues common to specific subareas (for example, reinforcement learning or robotics). In this survey we attempt to draw from multi-agent learning work in a spectrum of areas, including reinforcement learning, evolutionary computation, game theory, complex systems, agent modeling, and robotics. We find that this broad view leads to a division of the work into two categories, each with its own special issues: applying a single learner to discover joint solutions to multi-agent problems (team learning), or using multiple simultaneous learners, often one per agent (concurrent learning). Additionally, we discuss direct and indirect communication in connection with learning, plus open issues in task decomposition, scalability, and adaptive dynamics. We conclude with a presentation of multi-agent learning problem domains, and a list of multi-agent learning resources.
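The survey's central distinction between team learning and concurrent learning is easy to see in code. Below is a minimal sketch (our illustration, not from the survey) of the two settings on a toy two-agent cooperative matrix game; the payoff table and all names are invented.

```python
# Illustrative contrast between the survey's two categories on a toy
# cooperative matrix game. Names and payoffs are invented for this sketch.
import itertools
import random

ACTIONS = [0, 1]
PAYOFF = {(0, 0): 5, (0, 1): 0,   # both agents receive the same global reward
          (1, 0): 0, (1, 1): 10}

def team_learning(episodes=500, eps=0.1, alpha=0.5):
    """One learner searches the *joint* action space directly."""
    q = {a: 0.0 for a in itertools.product(ACTIONS, repeat=2)}
    for _ in range(episodes):
        a = random.choice(list(q)) if random.random() < eps else max(q, key=q.get)
        q[a] += alpha * (PAYOFF[a] - q[a])
    return max(q, key=q.get)

def concurrent_learning(episodes=500, eps=0.1, alpha=0.5):
    """One independent learner per agent; each adapts while the other learns too."""
    q = [{a: 0.0 for a in ACTIONS} for _ in range(2)]
    for _ in range(episodes):
        a = tuple(random.choice(ACTIONS) if random.random() < eps
                  else max(q[i], key=q[i].get) for i in range(2))
        for i in range(2):
            q[i][a[i]] += alpha * (PAYOFF[a] - q[i][a[i]])
    return tuple(max(q[i], key=q[i].get) for i in range(2))

print(team_learning(), concurrent_learning())
```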
Programmable reinforcement learning agents
2001
"... We present an expressive agent design language for reinforcement learning that allows the user to constrain the policies considered by the learning process.The language includes standard features such as parameterized subroutines, temporary interrupts, aborts, and memory variables, but also allows f ..."
Cited by 116 (1 self)
We present an expressive agent design language for reinforcement learning that allows the user to constrain the policies considered by the learning process. The language includes standard features such as parameterized subroutines, temporary interrupts, aborts, and memory variables, but also allows for unspecified choices in the agent program. For learning that which isn’t specified, we present provably convergent learning algorithms. We demonstrate by example that agent programs written in the language are concise as well as modular. This facilitates state abstraction and the transferability of learned skills.
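The key construct the abstract describes, an unspecified choice that learning fills in, can be sketched as follows. This is our toy Python approximation, not the paper's actual Lisp-based language; `choose`, `learn`, and the example program are invented for illustration.

```python
# A toy stand-in for the paper's partial-programming idea: the programmer
# writes the agent program, leaving named choose() points for learning to
# resolve. This is a bandit-style simplification (immediate reward only).
import random

q = {}  # Q-values keyed by (choice_point, option)

def choose(point, options, eps=0.1):
    """Epsilon-greedy selection at an unspecified choice point."""
    for o in options:
        q.setdefault((point, o), 0.0)
    if random.random() < eps:
        return random.choice(options)
    return max(options, key=lambda o: q[(point, o)])

def learn(point, option, reward, alpha=0.5):
    q[(point, option)] += alpha * (reward - q[(point, option)])

def agent_program():
    """Everything here is fixed code except the choose() call."""
    route = choose("pick_route", ["short_risky", "long_safe"])
    reward = {"short_risky": random.choice([10, -20]), "long_safe": 4}[route]
    learn("pick_route", route, reward)

for _ in range(1000):
    agent_program()
print(max(["short_risky", "long_safe"], key=lambda o: q[("pick_route", o)]))
```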
Collaborative Multiagent Reinforcement Learning by Payoff Propagation
Journal of Machine Learning Research, 2006
"... In this article we describe a set of scalable techniques for learning the behavior of a group of agents in a collaborative multiagent setting. As a basis we use the framework of coordination graphs of Guestrin, Koller, and Parr (2002a) which exploits the dependencies between agents to decompose t ..."
Cited by 65 (2 self)
In this article we describe a set of scalable techniques for learning the behavior of a group of agents in a collaborative multiagent setting. As a basis we use the framework of coordination graphs of Guestrin, Koller, and Parr (2002a), which exploits the dependencies between agents to decompose the global payoff function into a sum of local terms. First, we deal with the single-state case and describe a payoff propagation algorithm that computes the individual actions that approximately maximize the global payoff function. The method can be viewed as the decision-making analogue of belief propagation in Bayesian networks. Second, we focus on learning the behavior of the agents in sequential decision-making tasks. We introduce different model-free reinforcement-learning techniques, collectively called Sparse Cooperative Q-learning, which approximate the global action-value function based on the topology of a coordination graph, and perform updates using the contribution of the individual agents to the maximal global action value. The combined use of an edge-based decomposition of the action-value function and the payoff propagation algorithm for efficient action selection results in an approach that scales only linearly in the problem size. We provide experimental evidence that our method outperforms related multiagent reinforcement-learning methods based on temporal differences.
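The payoff propagation step the abstract refers to can be sketched as max-plus message passing on a coordination graph. The graph, payoff tables, and message schedule below are invented for illustration; this is not the authors' code.

```python
# A sketch of payoff propagation (max-plus) on a 3-agent chain. The global
# payoff is a sum of edge payoffs f[(i, j)][ai][aj]; agents pass messages
# until they agree on an (approximately) maximizing joint action.
A = [0, 1]                              # each agent's action set
edges = [(0, 1), (1, 2)]
f = {(0, 1): [[4, 0], [0, 3]],          # f[(i, j)][ai][aj]
     (1, 2): [[2, 0], [0, 5]]}

# mu[(i, j)][aj]: message from agent i to neighbour j about j's actions
mu = {(i, j): [0.0, 0.0] for e in edges for (i, j) in (e, e[::-1])}

def incoming(i, exclude, ai):
    """Sum of messages into agent i, excluding the one from `exclude`."""
    return sum(mu[(k, j)][ai] for (k, j) in mu if j == i and k != exclude)

for _ in range(10):                     # a chain this short converges quickly
    for (i, j) in list(mu):
        fij = f.get((i, j)) or [list(r) for r in zip(*f[(j, i)])]
        mu[(i, j)] = [max(fij[ai][aj] + incoming(i, j, ai) for ai in A)
                      for aj in A]

# Each agent picks the action with the highest total incoming message.
best = [max(A, key=lambda ai, i=i: incoming(i, None, ai)) for i in range(3)]
print(best)                             # [1, 1, 1]: total payoff 3 + 5 = 8
```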
An Analysis of Reinforcement Learning with Function Approximation
"... We address the problem of computing the optimal Q-function in Markov decision problems with infinite state-space. We analyze the convergence properties of several variations of Q-learning when combined with function approximation, extending the analysis of TD-learning in (Tsitsiklis & Van Roy, 1 ..."
Cited by 38 (5 self)
We address the problem of computing the optimal Q-function in Markov decision problems with infinite state-space. We analyze the convergence properties of several variations of Q-learning when combined with function approximation, extending the analysis of TD-learning in (Tsitsiklis & Van Roy, 1996a) to stochastic control settings. We identify conditions under which such approximate methods converge with probability 1. We conclude with a brief discussion on the general applicability of our results and compare them with several related works.
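For reference, the update being analyzed combines the tabular Q-learning rule with a parametric value estimate. The sketch below, with an invented feature map and constants, shows the linear form of that update; the paper's contribution is the convergence analysis, not this code.

```python
# Q-learning with linear function approximation: Q(s, a) ~ phi(s, a) . theta.
# The feature map is a fixed random embedding, purely for illustration; in
# general this update can diverge unless conditions like those the paper
# identifies hold.
import numpy as np

N_FEATURES, GAMMA, ALPHA = 8, 0.9, 0.05
theta = np.zeros(N_FEATURES)

def phi(s, a):
    rs = np.random.default_rng(abs(hash((s, a))) % (2**32))
    return rs.standard_normal(N_FEATURES)

def q(s, a):
    return phi(s, a) @ theta

def q_update(s, a, r, s_next, actions):
    """One approximate Q-learning step toward the bootstrapped target."""
    global theta
    target = r + GAMMA * max(q(s_next, b) for b in actions)
    theta += ALPHA * (target - q(s, a)) * phi(s, a)

q_update(s=0, a=1, r=1.0, s_next=2, actions=[0, 1])
print(q(0, 1))
```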
Concurrent hierarchical reinforcement learning
2005
"... We describe a language for partially specifying policies in domains consisting of multiple subagents working together to maximize a common reward function. The language extends ALisp with constructs for concurrency and dynamic assignment of subagents to tasks. During learning, the subagents learn a ..."
Cited by 34 (0 self)
We describe a language for partially specifying policies in domains consisting of multiple subagents working together to maximize a common reward function. The language extends ALisp with constructs for concurrency and dynamic assignment of subagents to tasks. During learning, the subagents learn a distributed representation of the Q-function for this partial policy. They then coordinate at runtime to find the best joint action at each step. We give examples showing that programs in this language are natural and concise. We also describe online and batch learning algorithms for learning a linear approximation to the Q-function, which make use of the coordination structure of the problem.
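The runtime coordination the abstract mentions is cheap when the learned Q-function decomposes additively across subagents. A minimal sketch with invented tables (this is not the paper's ALisp extension):

```python
# If Q(s, a) = sum_j Qj(s, aj) with no cross-subagent terms, the best joint
# action factorizes into per-subagent argmaxes; we check this against brute
# force over the joint action space. Tables are invented for illustration.
import itertools

Qj = [{"noop": 0.0, "mine": 3.0},      # subagent 0's Q-values in some state
      {"noop": 1.0, "build": 2.5}]     # subagent 1's

factored = tuple(max(t, key=t.get) for t in Qj)
brute = max(itertools.product(*(t.keys() for t in Qj)),
            key=lambda a: sum(t[ai] for t, ai in zip(Qj, a)))
assert factored == brute
print(factored)                        # ('mine', 'build')
```

When cross-subagent terms are present, as in the coordination structure the paper exploits, the joint maximization instead requires message passing of the kind sketched for payoff propagation above.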
Online resource allocation using decompositional reinforcement learning
In Proc. of AAAI-05, 2005
"... This paper considers a novel application domain for rein-forcement learning: that of “autonomic computing, ” i.e. self-managing computing systems. RL is applied to an online re-source allocation task in a distributed multi-application com-puting environment with independent time-varying load in each ..."
Cited by 33 (4 self)
This paper considers a novel application domain for reinforcement learning: that of "autonomic computing," i.e., self-managing computing systems. RL is applied to an online resource allocation task in a distributed multi-application computing environment with independent time-varying load in each application. The task is to allocate servers in real time so as to maximize the sum of performance-based expected utility in each application. This task may be treated as a composite MDP, and to exploit the problem structure, a simple localized RL approach is proposed, with better scalability than previous approaches. The RL approach is tested in a realistic prototype data center comprising real servers, real HTTP requests, and realistic time-varying demand. This domain poses a number of major challenges associated with live training in a real system, including: the need for rapid training, exploration that avoids excessive penalties, and handling complex, potentially non-Markovian system effects. The early results are encouraging: in overnight training, RL performs as well as or slightly better than heavily researched model-based approaches derived from queuing theory.
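Once each application has a learned value for holding n servers, the allocation step reduces to maximizing a sum of per-application utilities under a server budget. A minimal sketch with invented value tables (the paper's system learns these values online):

```python
# Pick the split of N servers across applications that maximizes total
# expected utility, given per-application value tables V[app][n]. Solved
# here by memoized recursion; the numbers are invented for illustration.
from functools import lru_cache

V = [[0, 5, 8, 10, 11],   # utility of giving app 0 exactly n servers
     [0, 7, 9, 10, 10]]   # same for app 1
N = 4

@lru_cache(maxsize=None)
def best(app, left):
    if app == len(V):
        return (0, ())
    choices = []
    for n in range(min(left, len(V[app]) - 1) + 1):
        val, rest = best(app + 1, left - n)
        choices.append((V[app][n] + val, (n,) + rest))
    return max(choices)

print(best(0, N))         # (17, (3, 1)): max utility (tied with the 2/2 split)
```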
Autonomic Multi-Agent Management of Power and Performance in Data Centers
"... The rapidly rising cost and environmental impact of energy consumption in data centers has become a multi-billion dollar concern globally. In response, the IT Industry is actively engaged in a first-to-market race to develop energyconserving hardware and software solutions that do not sacrifice perf ..."
Cited by 31 (0 self)
The rapidly rising cost and environmental impact of energy consumption in data centers has become a multi-billion dollar concern globally. In response, the IT industry is actively engaged in a first-to-market race to develop energy-conserving hardware and software solutions that do not sacrifice performance objectives. In this work we demonstrate a prototype of an integrated data center power management solution that employs server management tools, appropriate sensors and monitors, and an agent-based approach to achieve specified power and performance objectives. By intelligently turning off servers under low-load conditions, we can achieve over 25% power savings over the unmanaged case without incurring SLA penalties for typical daily and weekly periodic demands seen in web-server farms.
Categories and Subject Descriptors: D.4.8 [Software]: Performance (measurements, modeling and prediction, operational analysis)
Keywords: data center, power measurement, multicriteria utility functions, policy-based management
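The core mechanism, powering servers off under low load while preserving SLA headroom, can be sketched as a simple threshold policy. The capacities and headroom factor below are invented; the paper's agents use richer multicriteria utility functions.

```python
# Keep just enough servers on to serve current demand plus SLA headroom,
# and power the rest off. Capacity and headroom numbers are invented.
import math

def servers_needed(load_rps, per_server_rps=100.0, headroom=1.3):
    """Servers required for the load with a safety margin for the SLA."""
    return max(1, math.ceil(load_rps * headroom / per_server_rps))

FLEET = 10
for load in (850, 400, 120):          # a falling nightly demand curve
    on = min(FLEET, servers_needed(load))
    print(f"load={load} rps -> {on} on, {FLEET - on} powered off")
```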
Utile coordination: Learning interdependencies among cooperative agents
In Proceedings of the IEEE Symposium on Computational Intelligence and Games (CIG’05), 2005
"... a multiagent system to learn where and how to coordinate. The method starts with uncoordinated learners and maintains statistics on expected returns. Coordination dependencies are dynamically added if the statistics indicate a statistically significant benefit. This results in a compact state repres ..."
Cited by 29 (0 self)
We present Utile Coordination, a method that allows a multiagent system to learn where and how to coordinate. The method starts with uncoordinated learners and maintains statistics on expected returns. Coordination dependencies are dynamically added if the statistics indicate a statistically significant benefit. This results in a compact state representation because only necessary coordination is modeled. We apply our method within the framework of coordination graphs, in which value rules represent the coordination dependencies between the agents for a specific context. The algorithm is first applied to a small illustrative problem, and then to a large predator-prey problem in which two predators have to capture a single prey.
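The statistical trigger the abstract describes can be sketched as follows: keep return samples for a state, grouped by the other agent's action, and add a coordination dependency only when the groups differ significantly. The Welch-style t-statistic and threshold below are our simplification, not necessarily the paper's exact test.

```python
# Decide whether coordination is worth modeling in a state by comparing
# sampled returns conditioned on the other agent's action.
from statistics import mean, variance

def needs_coordination(returns_by_other_action, t_threshold=2.0, min_n=10):
    """Crude Welch t-statistic between two groups of sampled returns."""
    groups = list(returns_by_other_action.values())
    if len(groups) != 2 or any(len(g) < min_n for g in groups):
        return False                     # not enough evidence yet
    x, y = groups
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    t = abs(mean(x) - mean(y)) / (se if se > 0 else 1e-9)
    return t > t_threshold

stats = {"other_went_left": [1.0, 0.9, 1.1] * 5,
         "other_went_right": [0.1, 0.0, 0.2] * 5}
print(needs_coordination(stats))         # True: returns depend on the other agent
```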
Sparse Cooperative Q-learning
In Proceedings of the International Conference on Machine Learning, 2004
"... Learning in multiagent systems suffers from the fact that both the state and the action space scale exponentially with the number of agents. In this paper we are interested in using Q-learning to learn the coordinated actions of a group of cooperative agents, using a sparse representation of the joi ..."
Cited by 29 (6 self)
Learning in multiagent systems suffers from the fact that both the state and the action space scale exponentially with the number of agents. In this paper we are interested in using Q-learning to learn the coordinated actions of a group of cooperative agents, using a sparse representation of the joint state-action space of the agents. We first examine a compact representation in which the agents need to explicitly coordinate their actions only in a predefined set of states. Next, we use a coordination-graph approach in which we represent the Q-values by value rules that specify the coordination dependencies of the agents at particular states. We show how Q-learning can be efficiently applied to learn a coordinated policy for the agents in the above framework. We demonstrate the proposed method on the predator-prey domain, and we compare it with other related multiagent Q-learning methods.
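The first representation in the abstract, explicit coordination only in a predefined set of states, can be sketched as follows. This is an invented, bandit-style simplification (no bootstrapping from next states), not the authors' implementation.

```python
# Agents keep individual Q-tables everywhere except in a predefined set of
# coordinated states, where a single joint Q-table over joint actions is
# used. Updates here are toward immediate reward only, for brevity.
from collections import defaultdict

COORDINATED = {"doorway"}                 # states where coordination matters
q_ind = [defaultdict(float), defaultdict(float)]   # q_ind[i][(s, a_i)]
q_joint = defaultdict(float)                       # q_joint[(s, (a0, a1))]

def update(s, joint_a, r, alpha=0.2):
    if s in COORDINATED:                  # learn over joint actions
        key = (s, joint_a)
        q_joint[key] += alpha * (r - q_joint[key])
    else:                                 # learn independently per agent
        for i, a_i in enumerate(joint_a):
            q_ind[i][(s, a_i)] += alpha * (r - q_ind[i][(s, a_i)])

update("open_field", ("north", "south"), 1.0)
update("doorway", ("wait", "go"), 2.0)
print(q_ind[0][("open_field", "north")], q_joint[("doorway", ("wait", "go"))])
```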