Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm
, 1998
Cited by 331 (4 self)
Abstract
In this paper, we adopt general-sum stochastic games as a framework for multiagent reinforcement learning. Our work extends previous work by Littman on zero-sum stochastic games to a broader framework. We design a multiagent Q-learning method under this framework, and prove that it converges to a Nash equilibrium under specified conditions. This algorithm is useful for finding the optimal strategy when there exists a unique Nash equilibrium in the game. When there exist multiple Nash equilibria in the game, this algorithm should be combined with other learning techniques to find optimal strategies.
Constrained Markov Decision Processes
, 1995
Cited by 165 (11 self)
Abstract
This report presents a unified approach for the study of constrained Markov decision processes with a countable state space and unbounded costs. We consider a single controller having several objectives; it is desirable to design a controller that minimizes one cost objective, subject to inequality constraints on the other cost objectives. The objectives that we study are both the expected average cost and the expected total cost (of which the discounted cost is a special case). We provide two frameworks: the case where costs are bounded below, and the contracting framework. We characterize the set of achievable expected occupation measures as well as performance vectors. This allows us to reduce the original dynamic control problem to an infinite linear program. We present a Lagrangian approach that enables us to obtain sensitivity analysis. In particular, we obtain asymptotic results for the constrained control problem: convergence of both the value and the pol...
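To make the Lagrangian idea concrete, here is a minimal sketch for a finite, discounted toy CMDP (a much smaller setting than the report's countable-state, unbounded-cost framework). For a fixed multiplier `lam`, the constraint cost d is folded into the objective cost c, and the resulting unconstrained MDP is solved by value iteration. The data format and the names `lagrangian_value_iteration` and `discounted_cost` are hypothetical, chosen for illustration only.

```python
from typing import List, Tuple

# transitions[s][a] is a list of (next_state, probability) pairs (hypothetical toy format).
Transitions = List[List[List[Tuple[int, float]]]]
Costs = List[List[float]]  # costs[s][a]


def lagrangian_value_iteration(transitions: Transitions, cost: Costs, constraint: Costs,
                               lam: float, gamma: float = 0.9, iters: int = 500):
    """Minimize the discounted Lagrangian cost c + lam * d; return a greedy deterministic policy."""
    n = len(transitions)
    v = [0.0] * n

    def q(s: int, a: int, v: List[float]) -> float:
        one_step = cost[s][a] + lam * constraint[s][a]
        return one_step + gamma * sum(p * v[t] for t, p in transitions[s][a])

    for _ in range(iters):
        v = [min(q(s, a, v) for a in range(len(transitions[s]))) for s in range(n)]
    # Greedy policy with respect to the (approximately) converged Lagrangian values.
    policy = [min(range(len(transitions[s])), key=lambda a: q(s, a, v)) for s in range(n)]
    return policy, v


def discounted_cost(transitions: Transitions, per_step: Costs, policy: List[int],
                    start: int, gamma: float = 0.9, iters: int = 500) -> float:
    """Evaluate the expected discounted total of per_step[s][a] under a deterministic policy."""
    n = len(transitions)
    v = [0.0] * n
    for _ in range(iters):
        v = [per_step[s][policy[s]]
             + gamma * sum(p * v[t] for t, p in transitions[s][policy[s]])
             for s in range(n)]
    return v[start]
```

In use, one would raise `lam` until `discounted_cost(..., constraint, policy, start)` falls below the constraint bound. Optimal constrained policies may need randomization, so a deterministic greedy policy for a single multiplier is only an approximation of the constrained optimum.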
Dynamic Programming for Partially Observable Stochastic Games
 In Proceedings of the Nineteenth National Conference on Artificial Intelligence
, 2004
Cited by 159 (25 self)
Abstract
We develop an exact dynamic programming algorithm for partially observable stochastic games (POSGs). The algorithm is a synthesis of dynamic programming for partially observable Markov decision processes (POMDPs) and iterated elimination of dominated strategies in normal form games.
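The abstract combines POMDP dynamic programming with iterated elimination of dominated strategies. The paper prunes policy trees. The sketch below shows only the normal-form ingredient on a two-player bimatrix game, and it checks strict dominance by pure strategies only; dominance by mixed strategies, which requires a linear program, is omitted. The function name `iterated_elimination` is hypothetical.

```python
from typing import List, Tuple


def iterated_elimination(u1: List[List[float]], u2: List[List[float]]) -> Tuple[List[int], List[int]]:
    """Repeatedly remove pure strategies strictly dominated by another pure strategy.

    u1[i][j] and u2[i][j] are the row and column player's payoffs when the row
    player plays i and the column player plays j. Returns surviving indices.
    """
    rows = list(range(len(u1)))
    cols = list(range(len(u1[0])))
    changed = True
    while changed:
        changed = False
        # Row player: drop row i if some other surviving row beats it against every surviving column.
        for i in rows:
            if any(all(u1[k][j] > u1[i][j] for j in cols) for k in rows if k != i):
                rows.remove(i)
                changed = True
                break
        # Column player: drop column j if some other surviving column beats it against every surviving row.
        for j in cols:
            if any(all(u2[i][k] > u2[i][j] for i in rows) for k in cols if k != j):
                cols.remove(j)
                changed = True
                break
    return rows, cols
```

For example, in the Prisoner's Dilemma with strategies (cooperate, defect), only (defect, defect) survives.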
Nash Q-Learning for General-Sum Stochastic Games
 Journal of Machine Learning Research
, 2003
Cited by 138 (0 self)
Abstract
We extend Q-learning to a noncooperative multiagent context, using the framework of general-sum stochastic games. A learning agent maintains Q-functions over joint actions, and performs updates based on assuming Nash equilibrium behavior over the current Q-values. This learning protocol provably converges given certain restrictions on the stage games (defined by Q-values) that arise during learning. Experiments with a pair of two-player grid games suggest that such restrictions on the game structure are not necessarily required. Stage games encountered during learning in both grid environments violate the conditions. However, learning consistently converges in the first grid game, which has a unique equilibrium Q-function, but sometimes fails to converge in the second, which has three different equilibrium Q-functions. In a comparison of offline learning performance in both games, we find agents are more likely to reach a joint optimal path with Nash Q-learning than with a single-agent Q-learning method. When at least one agent adopts Nash Q-learning, the performance of both agents is better than using single-agent Q-learning. We have also implemented an online version of Nash Q-learning that balances exploration with exploitation, yielding improved performance.
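A minimal sketch of the update described above, for two agents. Each Q-table maps a state to a matrix over joint actions (a, b). The backup uses each agent's payoff at a Nash equilibrium of the next state's stage game (Q1[s'], Q2[s']). Simplifying assumptions: only pure-strategy equilibria are searched (the full method also handles mixed equilibria), and the "first equilibrium found" selection rule is hypothetical. Function names are illustrative.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]  # Q[a][b] over joint actions of agent 1 (a) and agent 2 (b)


def pure_nash_equilibria(q1: Matrix, q2: Matrix) -> List[Tuple[int, int]]:
    """All pure-strategy Nash equilibria of the bimatrix stage game (q1, q2)."""
    n, m = len(q1), len(q1[0])
    eqs = []
    for a in range(n):
        for b in range(m):
            best_for_1 = all(q1[a][b] >= q1[k][b] for k in range(n))
            best_for_2 = all(q2[a][b] >= q2[a][k] for k in range(m))
            if best_for_1 and best_for_2:
                eqs.append((a, b))
    return eqs


def nash_q_update(Q1: Dict[str, Matrix], Q2: Dict[str, Matrix], s: str, a: int, b: int,
                  r1: float, r2: float, s_next: str,
                  alpha: float = 0.1, gamma: float = 0.9) -> None:
    """One Nash-Q step on both agents' Q-tables at joint action (a, b) in state s."""
    eqs = pure_nash_equilibria(Q1[s_next], Q2[s_next])
    if not eqs:
        raise ValueError("no pure equilibrium; the full method uses mixed equilibria")
    ea, eb = eqs[0]  # hypothetical selection rule: first equilibrium found
    nash1, nash2 = Q1[s_next][ea][eb], Q2[s_next][ea][eb]
    Q1[s][a][b] += alpha * (r1 + gamma * nash1 - Q1[s][a][b])
    Q2[s][a][b] += alpha * (r2 + gamma * nash2 - Q2[s][a][b])
```

Equilibrium selection is the delicate part. When a stage game has several equilibria, different choices can change what the Q-values converge to.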
Friend-or-Foe Q-Learning in General-Sum Games
 In Proceedings of the 18th Int. Conf. on Machine Learning
, 2001
Cited by 137 (6 self)
Abstract
This paper describes an approach to reinforcement learning in multiagent general-sum games in which a learner is told to treat each other agent as either a “friend” or a “foe”. This Q-learning-style algorithm provides strong convergence guarantees compared to an existing Nash-equilibrium-based learning rule.
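A minimal sketch of the two state values a friend-or-foe learner can back up, for a single other agent. `q[a][o]` holds the learner's Q-values for its action `a` and the other agent's action `o`. A friend is assumed to help, so the best joint action counts. A foe is assumed to minimize, so the learner takes the minimax value over its mixed strategies. The general foe case is a linear program. To stay exact with only the standard library, this sketch restricts the learner to two actions. Function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # q[a][o]: learner action a, other agent's action o


def friend_value(q: Matrix) -> float:
    """Friend backup: assume cooperation and take the best joint action."""
    return max(max(row) for row in q)


def foe_value_two_actions(q: Matrix) -> float:
    """Foe backup for a learner with exactly two actions: max over p of min over o.

    Mixing p on action 0 gives, against each o, the line q[1][o] + p * (q[0][o] - q[1][o]).
    The maximum of their lower envelope lies at p = 0, p = 1, or where two lines cross.
    """
    assert len(q) == 2, "this sketch only handles two learner actions"
    m = len(q[0])

    def worst_case(p: float) -> float:
        return min(p * q[0][o] + (1 - p) * q[1][o] for o in range(m))

    candidates = [0.0, 1.0]
    for o1 in range(m):
        for o2 in range(o1 + 1, m):
            slope_diff = (q[0][o1] - q[1][o1]) - (q[0][o2] - q[1][o2])
            if slope_diff != 0:
                p = (q[1][o2] - q[1][o1]) / slope_diff
                if 0.0 <= p <= 1.0:
                    candidates.append(p)
    return max(worst_case(p) for p in candidates)
```

On matching pennies, `[[1, -1], [-1, 1]]`, the friend value is 1, while the foe value is 0, reached by mixing 50/50.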
Action understanding as inverse planning
 Cognition
, 2009
Cited by 107 (11 self)
Abstract
Humans are adept at inferring the mental states underlying other agents’ actions, such as goals, beliefs, desires, emotions and other thoughts. We propose a computational framework based on Bayesian inverse planning for modeling human action understanding. The framework represents an intuitive theory of intentional agents’ behavior based on the principle of rationality: the expectation that agents will plan approximately rationally to achieve their goals, given their beliefs about the world. The mental states that caused an agent’s behavior are inferred by inverting this model of rational planning using Bayesian inference, integrating the likelihood of the observed actions with the prior over mental states. This approach formalizes in precise probabilistic terms the essence of previous qualitative approaches to action understanding based on an “intentional stance” (Dennett, 1987) or a “teleological stance” (Gergely et al., 1995). In three psychophysical experiments using animated stimuli of agents moving in simple mazes, we assess how well different inverse planning models based on different goal priors can predict human goal inferences. The results provide quantitative evidence for an approximately rational inference mechanism in human goal inference within our simplified stimulus paradigm, and for the flexible nature of goal representations that human observers can adopt. We discuss the implications of our experimental results for human action understanding in real-world contexts, and suggest how our framework might be extended to capture other kinds of mental state inferences, such as inferences about beliefs, or inferring whether an entity is an intentional agent.
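A minimal sketch of Bayesian goal inference by inverse planning, under simplifying assumptions that are not taken from the paper. The world is an open grid without walls, so Manhattan distance stands in for the planner's cost-to-go. The agent is modeled as approximately rational, choosing moves with probability proportional to `exp(-beta * distance after the move)`. The posterior over goals is the prior times the product of per-step action likelihoods, normalized. Names and the `beta` value are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Pos = Tuple[int, int]
MOVES: Dict[str, Pos] = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


def step(pos: Pos, action: str) -> Pos:
    dx, dy = MOVES[action]
    return (pos[0] + dx, pos[1] + dy)


def action_likelihood(pos: Pos, action: str, goal: Pos, beta: float) -> float:
    """Approximately rational policy: P(a | s, g) is proportional to exp(-beta * distance after a)."""
    def score(a: str) -> float:
        nx, ny = step(pos, a)
        return -beta * (abs(nx - goal[0]) + abs(ny - goal[1]))

    z = sum(math.exp(score(a)) for a in MOVES)
    return math.exp(score(action)) / z


def goal_posterior(start: Pos, actions: List[str], goals: Dict[str, Pos],
                   prior: Dict[str, float], beta: float = 2.0) -> Dict[str, float]:
    """P(g | actions) is proportional to P(g) times the product over t of P(a_t | s_t, g)."""
    weights = {}
    for name, goal in goals.items():
        w, pos = prior[name], start
        for a in actions:
            w *= action_likelihood(pos, a, goal, beta)
            pos = step(pos, a)
        weights[name] = w
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}
```

For example, two steps to the right from `(0, 0)`, with candidate goals at `(5, 0)` and `(0, 5)` and a uniform prior, put almost all posterior mass on the goal to the right.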
Learning About Other Agents in a Dynamic Multiagent System
, 2001
Cited by 81 (7 self)
Abstract
We analyze the problem of learning about other agents in a class of dynamic multiagent systems, where the performance of the primary agent depends on the behavior of the others. We consider an online version of the problem, where agents must learn models of the others in the course of continual interactions. Various levels of recursive models are implemented in a simulated double auction market. Our experiments show that learning agents on average outperform nonlearning agents that do not use information about others. Among learning agents, those with minimal recursion assumptions generally perform better than agents with more complicated, though often wrong, assumptions. © 2001 Published by Elsevier Science B.V.
Keywords: Multiagent learning; Multiagent systems; Computational market
Resource Interfaces
, 2003
Cited by 79 (16 self)
Abstract
We present a formalism for specifying component interfaces that expose component requirements on limited resources. The formalism permits an algorithmic check of whether two or more components, when put together, exceed the available resources. Moreover, the formalism can be used to compute the quantity of resources necessary for satisfying the requirements of a collection of components. The formalism can be instantiated in several ways. For example, several components may draw power from the same source. Then, the formalism supports compatibility checks such as: can two components, when put together, achieve their tasks without ever exceeding the available amount of peak power? Or can they achieve their tasks by using no more than the initially available amount of energy (i.e., power accumulated over time)? The corresponding quantitative questions that our algorithms answer are the following: what is the amount of peak power needed for two components to be put together, and what is the corresponding amount of initial energy? To answer these questions, we model interfaces with resource requirements as games with quantitative objectives. The games are played on state spaces where each state is labeled by a number (representing, e.g., power consumption), and a play produces an infinite path of labels. The objective may be, for example, to minimize the largest label that occurs during a play. We ...
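For the peak objective mentioned in the abstract (minimize the largest label that occurs during a play), one simple way to compute the needed peak is sketched below. This is our own construction, not the paper's algorithm. It assumes a two-player turn-based game in which a hypothetical `"sys"` player chooses successors at its states and an adversarial `"env"` player chooses at the others, and every state has at least one successor. For each candidate threshold t, in increasing order, a greatest-fixed-point safety computation checks whether `"sys"` can keep the play forever inside the states labeled at most t.

```python
from typing import Dict, List, Set


def safe_region(succ: Dict[str, List[str]], owner: Dict[str, str], allowed: Set[str]) -> Set[str]:
    """Largest set inside `allowed` from which "sys" can keep the play inside it forever."""
    win = set(allowed)
    changed = True
    while changed:
        changed = False
        for s in list(win):
            if owner[s] == "sys":
                ok = any(t in win for t in succ[s])  # sys needs one safe successor
            else:
                ok = all(t in win for t in succ[s])  # env may pick any successor
            if not ok:
                win.remove(s)
                changed = True
    return win


def min_peak(succ: Dict[str, List[str]], owner: Dict[str, str],
             label: Dict[str, int], start: str) -> int:
    """Smallest threshold t such that "sys" can keep every visited label <= t."""
    for t in sorted(set(label.values())):
        allowed = {s for s in label if label[s] <= t}
        if start in safe_region(succ, owner, allowed):
            return t
    raise ValueError("no threshold works; check that every state has a successor")
```

Trying thresholds in increasing order is enough here because only label values that actually occur can be the optimal peak.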
Quantitative Stochastic Parity Games
Cited by 69 (32 self)
Abstract
We study perfect-information stochastic parity games. These are two-player nonterminating games which are played on a graph with turn-based probabilistic transitions. A play results in an infinite path and the conflicting goals of the two players are ω-regular path properties, formalized as parity winning conditions. The qualitative solution of such a game amounts to computing the set of vertices from which a player has a strategy to win with probability 1 (or with positive probability). The quantitative solution amounts to computing the value of the game in every vertex, i.e., the highest probability with which a player can guarantee satisfaction of his own objective in a play that starts from the vertex. For the important special case of one-player stochastic parity games (parity Markov decision processes) we give polynomial-time algorithms both for the qualitative and the quantitative solution. The running time of the qualitative solution is $O(d \cdot m^{3/2})$ for graphs with $m$ edges and $d$ priorities. The quantitative solution is based on a linear-programming formulation.
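A common route for quantitative questions on MDPs, assumed here rather than taken from the paper, reduces them to maximizing the probability of eventually reaching a target set of states. For parity MDPs, that target would be states already known to be winning. The maximal reachability probabilities are the solution of a linear program, and value iteration approximates the same quantities. The sketch below takes the target set as given and does not compute it. The MDP format and function name are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# mdp[s] lists the actions available in s; each action is a list of (next_state, probability).
MDP = Dict[str, List[List[Tuple[str, float]]]]


def max_reach_probability(mdp: MDP, target: Set[str], iters: int = 1000) -> Dict[str, float]:
    """Value iteration for the maximal probability of eventually reaching `target`."""
    v = {s: 1.0 if s in target else 0.0 for s in mdp}
    for _ in range(iters):
        v = {
            s: 1.0 if s in target
            else max((sum(p * v[t] for t, p in action) for action in mdp[s]), default=0.0)
            for s in mdp
        }
    return v
```

Value iteration converges to the optimal values from below but may need many iterations. The linear-programming formulation yields them exactly.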