Results 1-10 of 331
R-MAX: A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning
, 2001
Cited by 297 (10 self)
R-max is a very simple model-based reinforcement learning algorithm which can attain near-optimal average reward in polynomial time. In R-max, the agent always maintains a complete, but possibly inaccurate model of its environment and acts based on the optimal policy derived from this model. The model is initialized in an optimistic fashion: all actions in all states return the maximal possible reward (hence the name). During execution, it is updated based on the agent's observations. R-max improves upon several previous algorithms: (1) It is simpler and more general than Kearns and Singh's E^3 algorithm, covering zero-sum stochastic games. (2) It has a built-in mechanism for resolving the exploration vs...
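The optimistic-model idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the class name `RMaxSketch`, the visit threshold `m`, and the use of discounted value iteration with `gamma` are assumptions made for illustration (the abstract itself speaks of average reward). Unknown state-action pairs keep the maximal value, so a greedy agent is drawn toward them until they have been tried `m` times.

```python
class RMaxSketch:
    """Tabular sketch of optimism-driven model-based RL (hypothetical names)."""

    def __init__(self, n_states, n_actions, r_max, m=5, gamma=0.95):
        self.n_states, self.n_actions = n_states, n_actions
        self.r_max, self.m, self.gamma = r_max, m, gamma
        # Visit counts, next-state counts and summed rewards form the learned model.
        self.counts = [[0] * n_actions for _ in range(n_states)]
        self.next_counts = [[[0] * n_states for _ in range(n_actions)]
                            for _ in range(n_states)]
        self.reward_sum = [[0.0] * n_actions for _ in range(n_states)]
        # Optimistic initialization: every pair is worth the best possible return.
        self.v_opt = r_max / (1.0 - gamma)
        self.q = [[self.v_opt] * n_actions for _ in range(n_states)]

    def observe(self, s, a, r, s_next):
        """Record one transition; re-plan when a pair becomes 'known'."""
        if self.counts[s][a] >= self.m:
            return  # the model for known pairs is frozen in this sketch
        self.counts[s][a] += 1
        self.next_counts[s][a][s_next] += 1
        self.reward_sum[s][a] += r
        if self.counts[s][a] == self.m:
            self.plan()

    def plan(self, sweeps=200):
        """Value iteration on the current (partly optimistic) model."""
        for _ in range(sweeps):
            v = [max(row) for row in self.q]
            for s in range(self.n_states):
                for a in range(self.n_actions):
                    n = self.counts[s][a]
                    if n < self.m:
                        self.q[s][a] = self.v_opt  # still unknown: stay optimistic
                        continue
                    mean_r = self.reward_sum[s][a] / n
                    exp_v = sum(c * v[s2] for s2, c
                                in enumerate(self.next_counts[s][a])) / n
                    self.q[s][a] = mean_r + self.gamma * exp_v

    def act(self, s):
        """Greedy action with respect to the optimistic model."""
        return max(range(self.n_actions), key=lambda a: self.q[s][a])
```

In this sketch exploration needs no separate mechanism: once an action is known to pay less than the optimistic value, the greedy choice moves to an untried action.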
Multiagent Learning Using a Variable Learning Rate
 Artificial Intelligence
, 2002
Cited by 225 (8 self)
Learning to act in a multiagent environment is a difficult problem since the normal definition of an optimal policy no longer applies. The optimal policy at any moment depends on the policies of the other agents and so creates a situation of learning a moving target. Previous learning algorithms have one of two shortcomings depending on their approach. They either converge to a policy that may not be optimal against the specific opponents' policies, or they may not converge at all. In this article we examine this learning problem in the framework of stochastic games. We look at a number of previous learning algorithms showing how they fail at one of the above criteria. We then contribute a new reinforcement learning technique using a variable learning rate to overcome these shortcomings. Specifically, we introduce the WoLF principle, "Win or Learn Fast", for varying the learning rate. We examine this technique theoretically, proving convergence in self-play on a restricted class of iterated matrix games. We also present empirical results on a variety of more general stochastic games, in situations of self-play and otherwise, demonstrating the wide applicability of this method.
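The "Win or Learn Fast" principle can be sketched as a single policy-update step for one state of a matrix game. The abstract states only the principle, so everything concrete here is an assumption: the function name `wolf_step`, the hill-climbing form of the update, the two step sizes, and the test for "winning" (the current policy earns more under the current value estimates than the agent's long-run average policy would).

```python
def wolf_step(policy, avg_policy, q, delta_win=0.01, delta_lose=0.04):
    """One variable-rate policy update for a single state (illustrative sketch).

    policy, avg_policy: action probabilities (current and long-run average).
    q: current value estimate for each action.
    Returns the updated policy and the step size that was used.
    """
    expected_now = sum(p * v for p, v in zip(policy, q))
    expected_avg = sum(p * v for p, v in zip(avg_policy, q))
    # Winning: adapt cautiously. Losing: learn fast.
    delta = delta_win if expected_now > expected_avg else delta_lose

    best = max(range(len(q)), key=lambda a: q[a])
    new_policy = list(policy)
    for a in range(len(q)):
        if a == best:
            continue
        # Shift probability mass toward the greedy action without going negative.
        step = min(new_policy[a], delta / (len(q) - 1))
        new_policy[a] -= step
        new_policy[best] += step
    return new_policy, delta
```

The only difference between the two branches is the step size, which is the point of the principle: the direction of the update is the same greedy one in both cases.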
Cooperative Multi-Agent Learning: The State of the Art
 Autonomous Agents and Multi-Agent Systems
, 2005
Cited by 182 (8 self)
Cooperative multiagent systems are ones in which several agents attempt, through their interaction, to jointly solve tasks or to maximize utility. Due to the interactions among the agents, multiagent problem complexity can rise rapidly with the number of agents or their behavioral sophistication. The challenge this presents to the task of programming solutions to multiagent systems problems has spawned increasing interest in machine learning techniques to automate the search and optimization process. We provide a broad survey of the cooperative multiagent learning literature. Previous surveys of this area have largely focused on issues common to specific subareas (for example, reinforcement learning or robotics). In this survey we attempt to draw from multiagent learning work in a spectrum of areas, including reinforcement learning, evolutionary computation, game theory, complex systems, agent modeling, and robotics. We find that this broad view leads to a division of the work into two categories, each with its own special issues: applying a single learner to discover joint solutions to multiagent problems (team learning), or using multiple simultaneous learners, often one per agent (concurrent learning). Additionally, we discuss direct and indirect communication in connection with learning, plus open issues in task decomposition, scalability, and adaptive dynamics. We conclude with a presentation of multiagent learning problem domains, and a list of multiagent learning resources.
Learning to Cooperate via Policy Search
, 2000
Cited by 141 (4 self)
Cooperative games are those in which both agents share the same payoff structure. Value-based reinforcement-learning algorithms, such as variants of Q-learning, have been applied to learning cooperative games, but they only apply when the game state is completely observable to both agents. Policy search methods are a reasonable alternative to value-based methods for partially observable environments. In this paper, we provide a gradient-based distributed policy-search method for cooperative games and compare the notion of local optimum to that of Nash equilibrium. We demonstrate the effectiveness of this method experimentally in a small, partially observable simulated soccer domain.
1 INTRODUCTION
The interaction of decision makers who share an environment is traditionally studied in game theory and economics. The game theoretic formalism is very general, and analyzes the problem in terms of solution concepts such as Nash equilibrium [12], but usually works under the assu...
An introduction to collective intelligence
 Handbook of Agent technology. AAAI
, 1999
Nash Q-Learning for General-Sum Stochastic Games
 Journal of Machine Learning Research
, 2003
Cited by 138 (0 self)
We extend Q-learning to a noncooperative multiagent context, using the framework of general-sum stochastic games. A learning agent maintains Q-functions over joint actions, and performs updates based on assuming Nash equilibrium behavior over the current Q-values. This learning protocol provably converges given certain restrictions on the stage games (defined by Q-values) that arise during learning. Experiments with a pair of two-player grid games suggest that such restrictions on the game structure are not necessarily required. Stage games encountered during learning in both grid environments violate the conditions. However, learning consistently converges in the first grid game, which has a unique equilibrium Q-function, but sometimes fails to converge in the second, which has three different equilibrium Q-functions. In a comparison of offline learning performance in both games, we find agents are more likely to reach a joint optimal path with Nash Q-learning than with a single-agent Q-learning method. When at least one agent adopts Nash Q-learning, the performance of both agents is better than using single-agent Q-learning. We have also implemented an online version of Nash Q-learning that balances exploration with exploitation, yielding improved performance.
Friend-or-Foe Q-Learning in General-Sum Games
 In Proceedings of the 18th Int. Conf. on Machine Learning
, 2001
Cited by 137 (6 self)
This paper describes an approach to reinforcement learning in multiagent general-sum games in which a learner is told to treat each other agent as either a "friend" or "foe". This Q-learning-style algorithm provides strong convergence guarantees compared to an existing Nash-equilibrium-based learning rule.
Optimizing Information Exchange in Cooperative Multiagent Systems
, 2003
Cited by 107 (18 self)
Decentralized control of a cooperative multiagent system is the problem faced by multiple decision-makers that share a common set of objectives. The decision-makers may be robots placed at separate geographical locations or computational processes distributed in an information space. It may be impossible or undesirable for these decision-makers to share all their knowledge all the time. Furthermore, exchanging information may incur a cost associated with the required bandwidth or with the risk of revealing it to competing agents. Assuming that communication may not be reliable adds another dimension of complexity to the problem. This paper develops a decision-theoretic solution to this problem, treating both standard actions and communication as explicit choices that the decision maker must consider. The goal is to derive both action policies and communication policies that together optimize a global value function. We present an analytical model to evaluate the tradeoff between the cost of communication and the value of the information received. Finally, to address the complexity of this hard optimization problem, we develop a practical approximation technique based on myopic meta-level control of communication.
An Algorithm for Distributed Reinforcement Learning in Cooperative Multi-Agent Systems
 In Proceedings of the Seventeenth International Conference on Machine Learning
, 2000
Cited by 99 (11 self)
The article focuses on distributed reinforcement learning in cooperative multi-agent decision processes, where an ensemble of simultaneously and independently acting agents tries to maximize a discounted sum of rewards. We assume that each agent has no information about its teammates' behaviour. Thus, in contrast to single-agent reinforcement learning each agent has to consider its teammates' behaviour and to find a cooperative policy. We propose a model-free distributed Q-learning algorithm for cooperative multi-agent decision processes. It can be proved to find optimal policies in deterministic environments. No additional expense is needed in comparison to the non-distributed case. Further there is no need for additional communication between the agents.
1. Introduction
Reinforcement learning has originally been discussed for Markov Decision Processes (MDPs): a single agent has to learn a policy that maximizes the discounted sum of rewards in a stochastic environment...
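One way an agent can learn in a deterministic cooperative setting without observing its teammates is an optimistic, max-based update, sketched below. The abstract does not state the update rule, so this is an assumption for illustration: the class name `OptimisticAgent`, the zero initialization (which presumes non-negative rewards), and the rule of never lowering an estimate are all hypothetical choices. Each agent indexes its table by its own action only.

```python
from collections import defaultdict


class OptimisticAgent:
    """Independent learner with a table over (state, own action) only."""

    def __init__(self, n_actions, gamma=0.9):
        self.n_actions = n_actions
        self.gamma = gamma
        # Zero-initialized table; assumes rewards are non-negative.
        self.q = defaultdict(float)

    def update(self, s, a, r, s_next):
        """Keep the best return ever seen for (s, a).

        A low reward caused by a teammate's poor choice never lowers the
        estimate; this is only sensible when the environment is deterministic.
        """
        best_next = max(self.q[(s_next, b)] for b in range(self.n_actions))
        target = r + self.gamma * best_next
        self.q[(s, a)] = max(self.q[(s, a)], target)

    def greedy(self, s):
        """Own action with the highest optimistic estimate in state s."""
        return max(range(self.n_actions), key=lambda b: self.q[(s, b)])
```

Because the table holds only the agent's own actions, its size matches the single-agent case and no messages are exchanged, in line with the abstract's claims about expense and communication.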