Wang, X., & Sandholm, T. (2002). Reinforcement learning to play an optimal Nash equilibrium in team Markov games. Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS). Vancouver, Canada.
....[4] For obvious reasons, one would like RL methods that converge to desirable (e.g. optimal) equilibria. A number of heuristic exploration strategies have been proposed that in fact increase the probability (or even guarantee) that optimal equilibria are reached in identical interest games [4, 13, 12, 19]. Unfortunately, methods that encourage or force convergence to optimal equilibria often do so at a great cost. Coordination on a good strategy profile often requires exploration in parts of policy space that are very unrewarding. In such a case, the benefits of eventual coordination to an ....
....estimates. Kapetanakis and Kudenko [12] propose a method called FMQ for repeated games that uses the optimistic assumption to bias exploration, much like [4] but in the context of individual learners (that do not have explicit access to the actions performed by other agents). Wang and Sandholm [19] similarly use the optimistic assumption in repeated games to guarantee convergence to an optimal equilibrium. We critique these methods below.
3. A BAYESIAN VIEW OF MARL
The spate of activity described above on MARL in identical interest games has focused exclusively on devising methods that ....
X. Wang and T. Sandholm. Reinforcement learning to play an optimal Nash equilibrium in team Markov games. In Advances in Neural Information Processing Systems 15 (NIPS-2002).
Wang and Sandholm. Reinforcement learning to play an optimal Nash equilibrium in team Markov games. In NIPS, 02.
Xiaofeng Wang and Tuomas Sandholm. Reinforcement learning to play an optimal Nash equilibrium in team Markov games. In Advances in Neural Information Processing Systems 15 (NIPS'02), 2002.
X. Wang and T. Sandholm. Reinforcement learning to play an optimal Nash equilibrium in team Markov games. In Proceedings of the 16th Neural Information Processing Systems: Natural and Synthetic (NIPS) conference, Vancouver, 2002.
X. Wang and T. Sandholm. Reinforcement learning to play an optimal Nash equilibrium in team Markov games. In Proceedings of the 16th Neural Information Processing Systems: Natural and Synthetic conference, Vancouver, Canada, 2002.
X. Wang and T. Sandholm. Reinforcement Learning to Play An Optimal Nash Equilibrium in Team Markov Games. In Advances in Neural Information Processing Systems 15 (NIPS 2002).
Xiaofeng Wang and Tuomas Sandholm. Reinforcement learning to play an optimal Nash equilibrium in team Markov games. In Advances in Neural Information Processing Systems 15, NIPS, 2002.