Results 1 - 10 of 331
R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001
"... R-max is a very simple model-based reinforcement learning algorithm which can attain near-optimal average reward in polynomial time. In R-max, the agent always maintains a complete, but possibly inaccurate model of its environment and acts based on the optimal policy derived from this model. The mod ..."
Cited by 297 (10 self)
R-max is a very simple model-based reinforcement learning algorithm which can attain near-optimal average reward in polynomial time. In R-max, the agent always maintains a complete, but possibly inaccurate, model of its environment and acts based on the optimal policy derived from this model. The model is initialized in an optimistic fashion: all actions in all states return the maximal possible reward (hence the name). During execution, it is updated based on the agent's observations. R-max improves upon several previous algorithms: (1) It is simpler and more general than Kearns and Singh's E3 algorithm, covering zero-sum stochastic games. (2) It has a built-in mechanism for resolving the exploration vs...
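The optimistic initialization is simple enough to sketch. Below is a minimal Python illustration, assuming a small finite MDP with tabular visit counts; the function name, the sample threshold m, and the use of plain value iteration for the planning step are illustrative choices, not the paper's exact construction.

```python
import numpy as np

def rmax_policy(counts, trans, rew, r_max, m=10, gamma=0.95, iters=200):
    """Plan greedily in the optimistic model: (s, a) pairs with fewer
    than m samples are assumed to return r_max forever; the rest use
    the empirical transition and reward estimates.

    counts: (S, A) visit counts    trans: (S, A, S) transition counts
    rew:    (S, A) summed rewards  r_max: maximal one-step reward
    """
    S, A = counts.shape
    V = np.zeros(S)
    Q = np.zeros((S, A))
    for _ in range(iters):                        # value iteration
        for s in range(S):
            for a in range(A):
                if counts[s, a] < m:              # "unknown": optimism
                    Q[s, a] = r_max / (1.0 - gamma)
                else:                             # "known": empirical model
                    p = trans[s, a] / counts[s, a]
                    r = rew[s, a] / counts[s, a]
                    Q[s, a] = r + gamma * p.dot(V)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)                       # greedy policy in the model
```

In use, the agent would act greedily under this plan, record each observed transition in counts/trans/rew, and re-plan as state-action pairs cross the threshold.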
Multiagent Learning Using a Variable Learning Rate - Artificial Intelligence, 2002
"... Learning to act in a multiagent environment is a difficult problem since the normal definition of an optimal policy no longer applies. The optimal policy at any moment depends on the policies of the other agents and so creates a situation of learning a moving target. Previous learning algorithms hav ..."
Cited by 225 (8 self)
Learning to act in a multiagent environment is a difficult problem since the normal definition of an optimal policy no longer applies. The optimal policy at any moment depends on the policies of the other agents and so creates a situation of learning a moving target. Previous learning algorithms have one of two shortcomings depending on their approach. They either converge to a policy that may not be optimal against the specific opponents' policies, or they may not converge at all. In this article we examine this learning problem in the framework of stochastic games. We look at a number of previous learning algorithms showing how they fail at one of the above criteria. We then contribute a new reinforcement learning technique using a variable learning rate to overcome these shortcomings. Specifically, we introduce the WoLF principle, "Win or Learn Fast", for varying the learning rate. We examine this technique theoretically, proving convergence in self-play on a restricted class of iterated matrix games. We also present empirical results on a variety of more general stochastic games, in situations of self-play and otherwise, demonstrating the wide applicability of this method.
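The core of the WoLF principle fits in a few lines. Below is a rough sketch of one policy-hill-climbing step for a single agent in a matrix game, assuming WoLF-PHC-style bookkeeping (q holds the agent's action values; pi_avg is a running average of past policies, maintained elsewhere); names and step sizes are illustrative.

```python
import numpy as np

def wolf_phc_step(pi, pi_avg, q, delta_win=0.01, delta_lose=0.04):
    """One policy-hill-climbing step with a WoLF learning rate: compare
    the current policy's expected value against the average policy's,
    and step slowly when winning, quickly when losing."""
    delta = delta_win if pi @ q >= pi_avg @ q else delta_lose
    pi = (1.0 - delta) * pi          # shift probability mass...
    pi[np.argmax(q)] += delta        # ...toward the greedy action
    return pi / pi.sum()             # guard against rounding drift
```

The asymmetry in the two step sizes is the whole trick: a losing agent adapts before its opponent can exploit it, while a winning agent changes slowly enough for the others to converge.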
Cooperative Multi-Agent Learning: The State of the Art - Autonomous Agents and Multi-Agent Systems, 2005
"... Cooperative multi-agent systems are ones in which several agents attempt, through their interaction, to jointly solve tasks or to maximize utility. Due to the interactions among the agents, multi-agent problem complexity can rise rapidly with the number of agents or their behavioral sophistication. ..."
Cited by 182 (8 self)
Cooperative multi-agent systems are ones in which several agents attempt, through their interaction, to jointly solve tasks or to maximize utility. Due to the interactions among the agents, multi-agent problem complexity can rise rapidly with the number of agents or their behavioral sophistication. The challenge this presents to the task of programming solutions to multi-agent systems problems has spawned increasing interest in machine learning techniques to automate the search and optimization process. We provide a broad survey of the cooperative multi-agent learning literature. Previous surveys of this area have largely focused on issues common to specific subareas (for example, reinforcement learning or robotics). In this survey we attempt to draw from multi-agent learning work in a spectrum of areas, including reinforcement learning, evolutionary computation, game theory, complex systems, agent modeling, and robotics. We find that this broad view leads to a division of the work into two categories, each with its own special issues: applying a single learner to discover joint solutions to multi-agent problems (team learning), or using multiple simultaneous learners, often one per agent (concurrent learning). Additionally, we discuss direct and indirect communication in connection with learning, plus open issues in task decomposition, scalability, and adaptive dynamics. We conclude with a presentation of multi-agent learning problem domains, and a list of multi-agent learning resources.
Learning to Cooperate via Policy Search, 2000
"... Cooperative games are those in which both agents share the same payoff structure. Valuebased reinforcement-learning algorithms, such as variants of Q-learning, have been applied to learning cooperative games, but they only apply when the game state is completely observable to both agents. Poli ..."
Cited by 141 (4 self)
Cooperative games are those in which both agents share the same payoff structure. Value-based reinforcement-learning algorithms, such as variants of Q-learning, have been applied to learning cooperative games, but they only apply when the game state is completely observable to both agents. Policy search methods are a reasonable alternative to value-based methods for partially observable environments. In this paper, we provide a gradient-based distributed policy-search method for cooperative games and compare the notion of local optimum to that of Nash equilibrium. We demonstrate the effectiveness of this method experimentally in a small, partially observable simulated soccer domain.
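A stateless sketch of the distributed policy search idea: each agent runs its own REINFORCE-style gradient ascent on the shared payoff, touching only its own parameters. This is a generic simplification under softmax policies, not the paper's exact estimator, which handles partial observability in a stochastic game; all names below are illustrative.

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def reinforce_step(theta, episodes, lr=0.05):
    """Per-agent REINFORCE on softmax logits over the agent's own
    actions. Every agent receives the same shared return, so each one
    climbs the common payoff using only local gradient information."""
    grad = np.zeros_like(theta)
    for ep in episodes:                       # ep: list of (action, reward)
        G = sum(r for _, r in ep)             # shared episode return
        probs = softmax(theta)
        for a, _ in ep:
            onehot = np.zeros_like(theta)
            onehot[a] = 1.0
            grad += G * (onehot - probs)      # gradient of log pi(a)
    return theta + lr * grad / len(episodes)
```

Because every agent ascends the same objective, the joint update is a gradient step on the common payoff, which is what makes local optima (rather than Nash equilibria) the natural solution concept to compare against.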
An introduction to collective intelligence - Handbook of Agent Technology, AAAI, 1999
"... ..."
(Show Context)
Nash Q-Learning for General-Sum Stochastic Games - Journal of Machine Learning Research, 2003
"... We extend Q-learning to a noncooperative multiagent context, using the framework of generalsum stochastic games. A learning agent maintains Q-functions over joint actions, and performs updates based on assuming Nash equilibrium behavior over the current Q-values. This learning protocol provably conv ..."
Cited by 138 (0 self)
We extend Q-learning to a noncooperative multiagent context, using the framework of general-sum stochastic games. A learning agent maintains Q-functions over joint actions, and performs updates based on assuming Nash equilibrium behavior over the current Q-values. This learning protocol provably converges given certain restrictions on the stage games (defined by Q-values) that arise during learning. Experiments with a pair of two-player grid games suggest that such restrictions on the game structure are not necessarily required. Stage games encountered during learning in both grid environments violate the conditions. However, learning consistently converges in the first grid game, which has a unique equilibrium Q-function, but sometimes fails to converge in the second, which has three different equilibrium Q-functions. In a comparison of offline learning performance in both games, we find agents are more likely to reach a joint optimal path with Nash Q-learning than with a single-agent Q-learning method. When at least one agent adopts Nash Q-learning, the performance of both agents is better than using single-agent Q-learning. We have also implemented an online version of Nash Q-learning that balances exploration with exploitation, yielding improved performance.
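A sketch of the Nash-Q backup for two players, with one deliberate simplification: the stage-game equilibrium is found by enumerating pure strategies (falling back to independent maxima when only mixed equilibria exist), whereas the full method solves for mixed-strategy equilibria. All names are illustrative.

```python
import itertools
import numpy as np

def pure_nash_value(Q1, Q2):
    """Find a pure-strategy Nash equilibrium of the stage game given by
    the two agents' joint-action value tables (n1 x n2 arrays); return
    the equilibrium payoffs, or None if only mixed equilibria exist."""
    n1, n2 = Q1.shape
    for a1, a2 in itertools.product(range(n1), range(n2)):
        if Q1[a1, a2] >= Q1[:, a2].max() and Q2[a1, a2] >= Q2[a1, :].max():
            return Q1[a1, a2], Q2[a1, a2]
    return None

def nash_q_update(Q1, Q2, s, a1, a2, r1, r2, s_next, alpha=0.1, gamma=0.9):
    """One Nash-Q step: back each agent's Q up toward its reward plus
    the discounted Nash value of the next stage game. Q1[s], Q2[s] are
    (n1, n2) joint-action tables."""
    nash = pure_nash_value(Q1[s_next], Q2[s_next])
    v1, v2 = nash if nash else (Q1[s_next].max(), Q2[s_next].max())
    Q1[s][a1, a2] += alpha * (r1 + gamma * v1 - Q1[s][a1, a2])
    Q2[s][a1, a2] += alpha * (r2 + gamma * v2 - Q2[s][a1, a2])
```

The key departure from single-agent Q-learning is the backup target: the value of the next state is an equilibrium value of the joint-action stage game rather than a unilateral max.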
Friend-or-Foe Q-learning in General-Sum Games - In Proceedings of the 18th Int. Conf. on Machine Learning, 2001
"... This paper describes an approach to rein-forcement learning in multiagent general-sum games in which a learner is told to treat each other agent as either a \friend " or \foe". This Q-learning-style algorithm provides strong convergence guarantees compared to an ex-isting Nash-equilibrium- ..."
Cited by 137 (6 self)
This paper describes an approach to reinforcement learning in multiagent general-sum games in which a learner is told to treat each other agent as either a "friend" or "foe". This Q-learning-style algorithm provides strong convergence guarantees compared to an existing Nash-equilibrium-based learning rule.
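The friend and foe value operators are easy to write down. Here is a minimal sketch on a joint-action table Q[s][a, o] (my action a, the other agent's action o); the foe case uses the pure-strategy maximin, while a full treatment would allow mixed strategies via a linear program. Names are illustrative.

```python
import numpy as np

def friend_value(Q):
    """Friend: assume the other agent helps maximize my payoff,
    so the state value is the best joint-action entry."""
    return Q.max()

def foe_value(Q):
    """Foe: assume the other agent minimizes my payoff; here the
    pure-strategy maximin over my actions (rows)."""
    return Q.min(axis=1).max()

def ffq_update(Q, s, a, o, r, s_next, friend, alpha=0.1, gamma=0.9):
    """Friend-or-Foe backup: plug the appropriate value operator into
    an otherwise standard Q-learning update on the joint-action table."""
    v = friend_value(Q[s_next]) if friend else foe_value(Q[s_next])
    Q[s][a, o] += alpha * (r + gamma * v - Q[s][a, o])
```

Labeling the opponent up front sidesteps equilibrium selection: the backup target is always a single, well-defined max or maximin rather than one of possibly many Nash values.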
Optimizing Information Exchange in Cooperative Multi-agent Systems, 2003
"... Decentralized control of a cooperative multi-agent system is the problem faced by multiple decision-makers that share a common set of objectives. The decision-makers may be robots placed at separate geographical locations or computational processes distributed in an information space. It may be impo ..."
Cited by 107 (18 self)
Decentralized control of a cooperative multi-agent system is the problem faced by multiple decision-makers that share a common set of objectives. The decision-makers may be robots placed at separate geographical locations or computational processes distributed in an information space. It may be impossible or undesirable for these decision-makers to share all their knowledge all the time. Furthermore, exchanging information may incur a cost associated with the required bandwidth or with the risk of revealing it to competing agents. Assuming that communication may not be reliable adds another dimension of complexity to the problem. This paper develops a decision-theoretic solution to this problem, treating both standard actions and communication as explicit choices that the decision maker must consider. The goal is to derive both action policies and communication policies that together optimize a global value function. We present an analytical model to evaluate the trade-off between the cost of communication and the value of the information received. Finally, to address the complexity of this hard optimization problem, we develop a practical approximation technique based on myopic meta-level control of communication.
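The myopic meta-level control mentioned at the end can be pictured as a one-step value-of-information test. The sketch below is a loose illustration under strong assumptions (a known observation distribution and precomputed plan values); all names are hypothetical stand-ins for problem-specific quantities, not the paper's model.

```python
def myopic_comm_value(p_obs, value_if_shared, value_local, cost):
    """Myopic value-of-communication test: weigh the expected value of
    the joint plan after sharing an observation against the best plan
    achievable on local information alone, net of the message cost.

    p_obs:           dict observation -> probability
    value_if_shared: dict observation -> plan value given that sharing
    value_local:     value of the best plan without communicating
    cost:            bandwidth/exposure cost of sending the message
    """
    expected_shared = sum(p * value_if_shared[o] for o, p in p_obs.items())
    return expected_shared - cost > value_local
```

The "myopic" qualifier is doing real work: the test only looks one communication decision ahead, trading optimality for tractability in what is otherwise a hard optimization problem.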
An Algorithm for Distributed Reinforcement Learning in Cooperative Multi-Agent Systems - In Proceedings of the Seventeenth International Conference on Machine Learning, 2000
"... The article focuses on distributed reinforcement learning in cooperative multiagent -decision-processes, where an ensemble of simultaneously and independently acting agents tries to maximize a discounted sum of rewards. We assume that each agent has no information about its teammates' beh ..."
Cited by 99 (11 self)
The article focuses on distributed reinforcement learning in cooperative multi-agent decision processes, where an ensemble of simultaneously and independently acting agents tries to maximize a discounted sum of rewards. We assume that each agent has no information about its teammates' behaviour. Thus, in contrast to single-agent reinforcement learning, each agent has to consider its teammates' behaviour and to find a cooperative policy. We propose a model-free distributed Q-learning algorithm for cooperative multi-agent decision processes. It can be proved to find optimal policies in deterministic environments. No additional expense is needed in comparison to the non-distributed case. Further, there is no need for additional communication between the agents.
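The optimistic update that makes this work in deterministic environments is compact. Below is a minimal sketch of the distributed Q-learning rule the abstract describes, with illustrative data structures: the local value for (s, a) only ever increases toward the best outcome observed, so teammates' exploratory actions cannot drag it down.

```python
from collections import defaultdict

def make_q(n_actions):
    """Tabular Q-values for one agent over its own actions only."""
    return defaultdict(lambda: [0.0] * n_actions)

def distributed_q_update(q, s, a, r, s_next, gamma=0.9):
    """Optimistic independent-learner update: keep the maximum of the
    old value and the new backup. In deterministic environments this
    filters out the apparent noise caused by teammates exploring
    simultaneously, since a low outcome can only reflect their choices."""
    q[s][a] = max(q[s][a], r + gamma * max(q[s_next]))
```

Note how this matches the abstract's claims: each agent stores values over its own actions only (no extra expense versus the single-agent case) and never needs to observe or communicate with its teammates.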