Collaborative Multiagent Reinforcement Learning by Payoff Propagation
Jelle R. Kok, Nikos Vlassis...
Journal of Machine Learning Research 7 (2006) 1789-1828. Submitted 12/05; Published 9/06.



View or download:
jmlr.org/papers/volume7/ko...kok06a.pdf

From: mit.edu/papers/v7/

Abstract: In this article we describe a set of scalable techniques for learning the behavior of a group of agents in a collaborative multiagent setting. As a basis we use the framework of coordination graphs of Guestrin, Koller, and Parr (2002a) which exploits the dependencies between agents to decompose the global payoff function into a sum of local terms. First, we deal with the single-state case and describe a payoff propagation algorithm that computes the individual actions that approximately...

Active bibliography (related documents):
0.6:   Conflicts in teamwork: Hybrids to the rescue - Tambe, Bowring, Jung, Kaminka, .. (2005)
0.6:   Anytime Algorithms for Multiagent Decision Making Using.. - Vlassis, Elhorst, Kok (2004)
0.6:   Solution Sets in DCOPs and Graphical Games: Metrics and.. - Pearce, Maheswaran, Tambe

BibTeX entry:

@article{ jelle-journal,
  author = "Jelle R. Kok and Nikos Vlassis",
  title = "Collaborative Multiagent Reinforcement Learning by Payoff Propagation",
  journal = "Journal of Machine Learning Research",
  volume = "7",
  pages = "1789--1828",
  year = "2006",
  url = "citeseer.ist.psu.edu/758246.html" }

Citations (may not include all citations):
760   Probabilistic reasoning in intelligent systems - Pearl - 1988
614   Reinforcement learning: An introduction - Sutton, Barto - 1998
413   Neuro-dynamic programming - Bertsekas, Tsitsiklis - 1996
291   Markov decision processes: Discrete stochastic dynamic progr.. - Puterman - 1994
153   RoboCup: The robot world cup initiative - Kitano, Asada et al. - 1995
149   Technical note: Q-learning - Watkins, Dayan - 1992
142   Factor graphs and the sum-product algorithm - Kschischang, Frey et al. - 2001
135   Multi-agent reinforcement learning: Independent vs - Tan - 1993
124   Improving elevator performance using reinforcement learning - Crites, Barto - 1996
110   Temporal difference learning and TD-Gammon - Tesauro - 1995
104   Multiagent systems: A modern approach to distributed artific.. - Weiss - 1999
95   Collaborative multi-robot exploration - Burgard, Moors et al. - 2000
83   Learning to coordinate without sharing information - Sen, Sekaran et al. - 1994
79   Stochastic games - Shapley - 1953
77   Nonserial dynamic programming - Bertel, Brioschi - 1972
74   Loopy belief propagation for approximate inference: An empir.. - Murphy, Weiss et al. - 1999
56   Exploiting causal independence in Bayesian network inference - Zhang, Poole - 1996
52   Packet routing in dynamically changing networks: A reinforce.. - Boyan, Littman - 1994
48   Learning to cooperate via policy search - Peshkin, Kim et al. - 2000
47   The communicative multiagent team decision problem: Analyzin.. - Pynadath, Tambe - 2002
46   Understanding belief propagation and its generalizations - Yedidia, Freeman et al. - 2003
45   A scheme for approximating probabilistic inference - Dechter, Rish - 1997
45   The complexity of decentralized control of Markov decision p.. - Bernstein, Zilberstein et al. - 2000
44   Multiagent systems - Sycara - 1998
33   learning and coordination in multiagent decision processes - Boutilier - 1996
28   Constraint Processing - Dechter - 2003
18   Distributed value functions - Schneider, Wong et al. - 1999
15   Transition-independent decentralized Markov decision process.. - Becker, Zilberstein et al. - 2003
14   Scaling up agent coordination strategies - Durfee - 2001
13   Distributed algorithms for multi-robot observation of multip.. - Parker - 2002
11   Loopy belief propagation as a basis for communication in sen.. - Crick, Pfeffer - 2003
8   ADOPT: Asynchronous distributed constraint optimization with.. - Modi, Shen et al. - 2005
7   A concise introduction to multiagent systems and distributed.. - Vlassis - 2003
6   Tree consistency and bounds on the performance of the max-pr.. - Wainwright, Jaakkola et al. - 2004
5   Distributed sensor nets: A multiagent perspective - Lesser, Ortiz et al. - 2003
4   Autonomous helicopter flight via reinforcement learning - Ng, Kim et al. - 2004
3   Sparse cooperative Q-learning - Kok, Vlassis - 2005
3   Preprocessing techniques for accelerating the DCOP algorithm.. - Ali, Koenig et al. - 2005
2   IEEE Signal Processing Magazine - Loeliger, to - 2004
2   Anytime algorithms for multiagent decision making using coor.. - Vlassis, Elhorst et al. - 2004
2   Distributed constraint optimization as a formal model of par.. - Yokoo, Durfee - 1991
2   Decentralized control of cooperative systems: Categorization.. - Goldman, Zilberstein - 2004
1   Distributed optimization in adaptive networks - Moallemi, Van Roy - 2004
1   Non-communicative multi-robot coordination in dynamic enviro.. - Kok, Spaan et al. - 2005
1   Editorial: Advances in multi-robot systems - Arai, Pagello et al. - 2002
1   Coordination and Learning in Cooperative Multiagent Systems - Kok - 2006
1   Reinforcement learning for RoboCup-soccer keepaway - Stone, Sutton et al. - 2005
1   Cooperative information sharing to improve distributed learn.. - Dutta, Jennings et al. - 2005

Documents on the same site (http://jmlr.csail.mit.edu/papers/v7/):
Estimating the "Wrong" Graphical Model: Benefits in the.. - Wainwright (2006)
A Hierarchy of Support Vector Machines for Pattern Detection - Sahbi, Geman (2006)
Efficient Learning of Label Ranking by Soft Projections.. - Shalev-Shwartz, Singer (2006)
