### Probabilistic Inference Techniques for Scalable Multiagent Decision Making

, 2015

Decentralized POMDPs provide an expressive framework for multiagent sequential decision making. However, the complexity of these models (NEXP-Complete even for two agents) has limited their scalability. We present a promising new class of approximation algorithms by developing novel connections between multiagent planning and machine learning. We show how the multiagent planning problem can be reformulated as inference in a mixture of dynamic Bayesian networks (DBNs). This planning-as-inference approach paves the way for the application of efficient inference techniques in DBNs to multiagent decision making. To further improve scalability, we identify certain conditions that are sufficient to extend the approach to multiagent systems with dozens of agents. Specifically, we show that the necessary inference within the expectation-maximization framework can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We further show that a number of existing multiagent planning models satisfy these conditions. Experiments on large planning benchmarks confirm the benefits of our approach in terms of runtime and scalability with respect to existing techniques.

### Quadratic Programming for Multi-Target Tracking

- Author manuscript, published in "AAMAS Workshop: Multi-agent Sequential Decision-Making in Uncertain Domains (2009) 4-10"

, 2010

We consider the problem of tracking multiple, partially observed targets using multiple sensors arranged in a given configuration. We model the problem as a special case of a (finite horizon) DEC-POMDP. We present a quadratic program (QP) whose globally optimal solution yields an optimal tracking joint policy, one that maximizes the expected number of targets detected over the given horizon. However, a globally optimal solution to the QP cannot always be found since the QP is nonconvex. To remedy this, we present two linearizations of the QP to equivalent 0-1 mixed integer linear programs (MIPs) whose optimal solutions, which can always be found (for example, by the branch and bound method), yield optimal joint policies. Computational experience on different sensor configurations shows that finding an optimal joint policy by solving the proposed MIPs is much faster than using existing algorithms for the problem.
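
The abstract does not say which two linearizations the authors use. As an illustration only, assume the nonconvexity comes from products of two 0-1 variables. Here the hypothetical symbols $x$ and $y$ stand for the policy choices of two sensors. One standard textbook linearization replaces each product $xy$ with an auxiliary variable $z$:

```latex
\begin{align*}
z &\le x, \\
z &\le y, \\
z &\ge x + y - 1, \\
z &\ge 0, \qquad x, y \in \{0, 1\}.
\end{align*}
```

When $x = y = 1$, the third constraint forces $z = 1$. Otherwise one of the first two constraints, together with $z \ge 0$, forces $z = 0$. So $z = xy$ at every feasible point, and the quadratic term can be swapped for a linear one without changing the optimum.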

### Introducing Communication in Dis-POMDPs with Locality of Interaction

- IOS Press

Networked Distributed POMDPs (ND-POMDPs) can model multiagent systems in uncertain domains and have begun to scale up the number of agents. However, prior work on ND-POMDPs has failed to address communication. Without communication, the size of a local policy at each agent within an ND-POMDP grows exponentially in the time horizon. To overcome this problem, we extend existing algorithms so that agents periodically communicate their observation and action histories with each other. After communication, agents can start from a new synchronized belief state. Thus, we can avoid the exponential growth in the size of local policies at agents. Furthermore, we introduce an idea that is similar to the Point-based Value Iteration algorithm to approximate the value function with a fixed number of representative points. Our experimental results show that we can obtain much longer policies than existing algorithms as long as the interval between communications is small.
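
The growth argument in this abstract can be made concrete with a small counting sketch. It is not the authors' algorithm. It assumes a deterministic local policy is a tree with one decision node per observation history, and that after each communication the agent restarts with a fresh sub-tree (one per communication phase, ignoring the number of representative belief points). The function names and parameters are hypothetical.

```python
def policy_tree_nodes(num_observations: int, horizon: int) -> int:
    """Count decision nodes in a policy tree of depth `horizon`.

    Level t of the tree has num_observations ** t nodes, one per
    observation history of length t, so the total is a geometric sum.
    """
    return sum(num_observations ** t for t in range(horizon))


def nodes_with_sync(num_observations: int, horizon: int, interval: int) -> int:
    """Count nodes when histories are reset every `interval` steps.

    Assumption (not from the source): each communication phase needs
    one sub-tree of depth `interval`, plus a shorter one for any
    leftover steps at the end of the horizon.
    """
    full_phases, leftover = divmod(horizon, interval)
    return (full_phases * policy_tree_nodes(num_observations, interval)
            + policy_tree_nodes(num_observations, leftover))


if __name__ == "__main__":
    for horizon in (4, 8, 12):
        print(horizon,
              policy_tree_nodes(2, horizon),
              nodes_with_sync(2, horizon, interval=2))
```

Under these assumptions the first count grows exponentially with the horizon, while the second grows only linearly for a fixed communication interval. This matches the abstract's remark that long policies are attainable as long as the interval stays small.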

### Multi-agent role allocation: issues, approaches, and multiple perspectives

- AUTON AGENT MULTI-AGENT SYST (2011) 22:317-355
, 2011

In cooperative multi-agent systems, roles are used as a design concept when creating large systems; they are known to facilitate specialization of agents and can help to reduce interference in multi-robot domains. The types of tasks that the agents are asked to solve and the communicative capabilities of the agents significantly affect the way roles are used in cooperative multi-agent systems. Along with a discussion of these issues, this article compares computational models of the role allocation problem, presents the notion of explicitly versus implicitly defined roles, surveys the methods used to approach role allocation problems, and concludes with a list of open research questions related to roles in multi-agent systems.

### maastrichtuniversity.nl

Dec-POMDPs are a powerful framework for planning in multiagent systems, but are provably intractable to solve. Despite recent work on scaling to more agents by exploiting weak couplings in factored models, scalability for unrestricted subclasses remains limited. This paper proposes a factored forward-sweep policy computation method that tackles the stages of the problem one by one, exploiting weakly coupled structure at each of these stages. To enable the method to scale to many agents, we propose a set of approximations: approximation of stages using a sparse interaction structure, bootstrapping off smaller tasks to compute heuristic payoff functions, and employing approximate inference to estimate required probabilities at each stage and to compute the best decision rules. An empirical evaluation shows that the loss in solution quality due to these approximations is small and that the proposed method achieves unprecedented scalability, solving Dec-POMDPs with hundreds of agents.

### Increasing Scalability in Algorithms for Centralized and Decentralized Partially Observable Markov Decision Processes (Extended Abstract)

, 2009

Real-world problems contain many forms of uncertainty, but current algorithms for solving sequential decision making problems under uncertainty are limited to small problems due to large resource usage. In my thesis, I study methods to increase the scalability of these approaches such as using memory bounded solutions, sampling or taking advantage of domain structure. I also plan to explore other methods to improve scalability and generate more practical real-world domains on which to test these algorithms.

### Yuki Iwanari

Networked Distributed POMDPs (ND-POMDPs) can model multiagent systems in uncertain domains and have begun to scale up the number of agents. However, prior work on ND-POMDPs has failed to address communication. Without communication, the size of a local policy at each agent within an ND-POMDP grows exponentially in the time horizon. To

### UNIVERSITY OF SOUTHAMPTON

"... Bayesian learning for multi-agent coordination by ..."

### Iterative Online Planning in Multiagent Settings with Limited Model Spaces and PAC Guarantees

Methods for planning in multiagent settings often model other agents' possible behaviors. However, the space of these models (whether these are policy trees, finite-state controllers or intentional models) is very large and thus arbitrarily bounded. This may exclude the true model or the optimal model. In this paper, we present a novel iterative algorithm for online planning that considers a limited model space, updates it dynamically using data from interactions, and provides a provable and probabilistic bound on the approximation error. We ground this approach in the context of graphical models for planning in partially observable multiagent settings: interactive dynamic influence diagrams. We empirically demonstrate that the limited model space facilitates fast solutions and that the true model often enters the limited model space.