Incremental Clustering and Expansion for Faster Optimal Planning in Decentralized POMDPs, 2013
Cited by 19 (12 self)
This article presents the state of the art in optimal solution methods for decentralized partially observable Markov decision processes (Dec-POMDPs), which are general models for collaborative multiagent planning under uncertainty. Building off the generalized multiagent A* (GMAA*) algorithm, which reduces the problem to a tree of one-shot collaborative Bayesian games (CBGs), we describe several advances that greatly expand the range of Dec-POMDPs that can be solved optimally. First, we introduce lossless incremental clustering of the CBGs solved by GMAA*, which achieves exponential speedups without sacrificing optimality. Second, we introduce incremental expansion of nodes in the GMAA* search tree, which avoids the need to expand all children, the number of which is in the worst case doubly exponential in the node’s depth. This is particularly beneficial when little clustering is possible. In addition, we introduce new hybrid heuristic representations that are more compact and thereby enable the solution of larger Dec-POMDPs. We provide theoretical guarantees that, when a suitable heuristic is used, both incremental clustering and incremental expansion yield algorithms that are both complete and search equivalent. Finally, we present extensive empirical results demonstrating that GMAA*-ICE, an algorithm that synthesizes these advances, can optimally solve Dec-POMDPs of unprecedented size.
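The doubly exponential growth mentioned in this abstract can be illustrated with a small counting sketch. It is not taken from the article: the function name and the flat lists of action and observation counts are hypothetical, and it assumes no histories are clustered, so that a deterministic decision rule for one agent at stage t assigns one action to each of its observation histories of length t.

```python
from math import prod


def num_joint_decision_rules(num_actions, num_observations, stage):
    """Count deterministic joint decision rules at a given stage (0-indexed).

    Assumption (illustrative only): agent i has num_actions[i] actions and
    num_observations[i] observations, and no histories are clustered.
    """
    count_per_agent = []
    for actions, observations in zip(num_actions, num_observations):
        # Number of observation histories of length `stage` for this agent.
        histories = observations ** stage
        # A decision rule picks one action for every such history.
        count_per_agent.append(actions ** histories)
    # A joint decision rule combines one individual rule per agent.
    return prod(count_per_agent)
```

For two agents with two actions and two observations each, the count is 4, 16, 256 and 65536 for stages 0 to 3, which is why expanding every child of a deep node quickly becomes infeasible.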
Decentralized Control of Partially Observable Markov Decision Processes
Cited by 13 (8 self)
Markov decision processes (MDPs) are often used to model sequential decision problems involving uncertainty under the assumption of centralized control. However, many large, distributed systems do not permit centralized control due to communication limitations (such as cost, latency or corruption). This paper surveys recent work on decentralized control of MDPs in which control of each agent depends on a partial view of the world. We focus on a general framework where there may be uncertainty about the state of the environment, represented as a decentralized partially observable MDP (Dec-POMDP), but consider a number of subclasses with different assumptions about uncertainty and agent independence. In these models, a shared objective function is used, but plans of action must be based on a partial view of the environment. We describe the frameworks, along with the complexity of optimal control and important properties. We also provide an overview of exact and approximate solution methods as well as relevant applications. This survey provides an introduction to what has become an active area of research on these models and their solutions.
Influence-based abstraction for multiagent systems. In AAAI, 2012
Cited by 8 (4 self)
This paper presents a theoretical advance by which factored POSGs can be decomposed into local models. We formalize the interface between such local models as the influence agents can exert on one another; and we prove that this interface is sufficient for decoupling them. The resulting influence-based abstraction substantially generalizes previous work on exploiting weakly-coupled agent interaction structures. Therein lie several important contributions. First, our general formulation sheds new light on the theoretical relationships among previous approaches, and promotes future empirical comparisons that could come by extending them beyond the more specific problem contexts for which they were developed. More importantly, the influence-based approaches that we generalize have shown promising improvements in the scalability of planning for more restrictive models. Thus, our theoretical result here serves as the foundation for practical algorithms that we anticipate will bring similar improvements to more general planning contexts, and also into other domains such as approximate planning, decision-making in adversarial domains, and online learning.
Large-scale multi-robot task allocation via dynamic partitioning and distribution. Autonomous Robots 33(3):291–307, 2012
Cited by 7 (2 self)
This paper introduces an approach that scales assignment algorithms to large numbers of robots and tasks. It is especially suitable for dynamic task allocations since both task locality and sparsity can be effectively exploited. We observe that an assignment can be computed through coarsening and partitioning operations on the standard utility matrix via a set of mature partitioning techniques and programs. The algorithm mixes centralized and decentralized approaches dynamically at different scales to produce a fast, robust method that is accurate and scalable, and reduces both the global communication and unnecessary repeated computation. An allocation results by operating on each partition: either the steps are repeated recursively to refine the generalized assignment, or each subproblem may be solved by an existing algorithm. The results suggest that only a minor sacrifice in solution quality is needed for significant gains in efficiency. The algorithm is validated using extensive simulation experiments and the results show advantages over the traditional optimal assignment algorithms.
Planning with macro-actions in decentralized POMDPs. In Proceedings of the Thirteenth International Conference on Autonomous Agents and Multiagent Systems, 2014
Cited by 5 (3 self)
Decentralized partially observable Markov decision processes (Dec-POMDPs) are general models for decentralized decision making under uncertainty. However, they typically model a problem at a low level of granularity, where each agent’s actions are primitive operations lasting exactly one time step. We address the case where each agent has macro-actions: temporally extended actions which may require different amounts of time to execute. We model macro-actions as options in a factored Dec-POMDP model, focusing on options which depend only on information available to an individual agent while executing. This enables us to model systems where coordination decisions only occur at the level of deciding which macro-actions to execute, and the macro-actions themselves can then be executed to completion. The core technical difficulty when using options in a Dec-POMDP is that the options chosen by the agents no longer terminate at the same time. We present extensions of two leading Dec-POMDP algorithms for generating a policy with options and discuss the resulting form of optimality. Our results show that these algorithms retain agent coordination while allowing near-optimal solutions to be generated for significantly longer horizons and larger state spaces than previous Dec-POMDP methods.
Unleashing Dec-MDPs in Security Games: Enabling Effective Defender Teamwork
Cited by 3 (2 self)
Multiagent teamwork and defender-attacker security games are two areas that are currently receiving significant attention within multiagent systems research. Unfortunately, despite the need for effective teamwork among multiple defenders, little has been done to harness the teamwork research in security games. This paper is the first to remedy this situation by integrating the powerful teamwork mechanisms offered by Dec-MDPs into security games. We offer the following novel contributions in this paper: (i) New models of security games where a defender team’s pure strategy is defined as a Dec-MDP policy for addressing coordination under uncertainty; (ii) New algorithms based on column generation that enable efficient generation of mixed strategies given this new model; (iii) Handling global events during defender execution for effective teamwork; (iv) Exploration of the robustness of randomized pure strategies. The paper opens the door to a potentially new area combining computational game theory and multiagent teamwork.
Planning for Decentralized Control of Multiple Robots Under Uncertainty
Cited by 2 (2 self)
We describe a probabilistic framework for synthesizing control policies for general multi-robot systems, given environment and sensor models and a cost function. Decentralized, partially observable Markov decision processes (Dec-POMDPs) are a general model of decision processes where a team of agents must cooperate to optimize some objective (specified by a shared reward or cost function) in the presence of uncertainty, but where communication limitations mean that the agents cannot share their state, so execution must proceed in a decentralized fashion. While Dec-POMDPs are typically intractable to solve for real-world problems, recent research on the use of macro-actions in Dec-POMDPs has significantly increased the size of problem that can be practically solved as a Dec-POMDP. We describe this general model, and show how, in contrast to most existing methods that are specialized to a particular problem class, it can synthesize control policies that use whatever opportunities for coordination are present in the problem, while balancing off uncertainty in outcomes, sensor information, and information about other agents. We use three variations on a warehouse task to show that a single planner of this type can generate cooperative behavior using task allocation, direct communication, and signaling, as appropriate.
Decentralized stochastic planning with anonymity in interactions. In Proc. of the AAAI Conference on Artificial Intelligence, 2014
Cited by 2 (0 self)
In this paper, we solve cooperative decentralized stochastic planning problems, where the interactions between agents (specified using transition and reward functions) are dependent on the number of agents (and not on the identity of the individual agents) involved in the interaction. Examples of such anonymous interactions include a collision of robots in a narrow corridor and defender teams coordinating patrol activities to secure a target. Formally, we consider problems that are a subset of the well-known Decentralized MDP (DEC-MDP) model, where the anonymity in interactions is specified within the joint reward and transition functions. In this paper, not only do we introduce a general model called D-SPAIT to capture anonymity in interactions, but we also provide optimization-based optimal and local-optimal solutions for generalizable subcategories of D-SPAIT.
QD-learning: A collaborative distributed strategy for multi-agent reinforcement learning through consensus + innovations. IEEE Transactions on Signal Processing, 2013
Cited by 2 (1 self)
The paper considers a class of multi-agent Markov decision processes (MDPs), in which the network agents respond differently (as manifested by the instantaneous one-stage random costs) to a global controlled state and the control actions of a remote controller. The paper investigates a distributed reinforcement learning setup with no prior information on the global state transition and local agent cost statistics. Specifically, with the agents’ objective consisting of minimizing a network-averaged infinite horizon discounted cost, the paper proposes a distributed version of Q-learning, QD-learning, in which the network agents collaborate by means of local processing and mutual information exchange over a sparse (possibly stochastic) communication network to achieve the network goal. Under the assumption that each agent is only aware of its local online cost data and the inter-agent communication network is weakly connected, the proposed distributed scheme is shown to yield, almost surely (a.s.) and asymptotically, the desired value function and the optimal stationary control policy at each network agent. The analytical techniques developed in the paper to address the mixed time-scale stochastic dynamics of the consensus + innovations form, which arise as a result of the proposed interactive distributed scheme, are of independent interest.
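As a rough illustration of what a consensus + innovations form of Q-learning can look like, the sketch below performs one synchronous update of the visited state-action entry for every agent. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name, the nested-list layout `Q[i][state][action]`, the fixed neighbor lists, and the constant step sizes `alpha` and `beta` are all hypothetical choices made for this example.

```python
def qd_learning_step(Q, neighbors, state, action, next_state, costs,
                     alpha, beta, gamma):
    """One synchronous consensus + innovations update (illustrative sketch).

    Q[i][s][a]   : agent i's current estimate for state s and action a
    neighbors[i] : indices of the agents that agent i exchanges estimates with
    costs[i]     : one-stage cost observed locally by agent i
    alpha, beta  : innovation and consensus step sizes (assumed constants here)
    gamma        : discount factor
    """
    # Copy the tables so every agent updates from the same old estimates.
    new_Q = [[row[:] for row in agent_table] for agent_table in Q]
    for i, agent_table in enumerate(Q):
        current = agent_table[state][action]
        # Consensus term: disagreement with the neighbors' estimates.
        consensus = sum(current - Q[j][state][action] for j in neighbors[i])
        # Innovation term: local temporal-difference error (costs are minimized).
        innovation = costs[i] + gamma * min(agent_table[next_state]) - current
        new_Q[i][state][action] = current - beta * consensus + alpha * innovation
    return new_Q
```

With `alpha = 0` the update reduces to pure averaging between neighbors, and with `beta = 0` each agent runs ordinary Q-learning on its own local cost; the interplay of the two step sizes is what the abstract refers to as mixed time-scale dynamics.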
Modeling Information Exchange Opportunities For Effective Human-Computer Teamwork
Cited by 2 (1 self)
This paper studies information exchange in collaborative group activities involving mixed networks of people and computer agents. It introduces the concept of “nearly decomposable” decision-making problems to address the complexity of information exchange decisions in such multiagent settings. This class of decision-making problems arises in settings which have an action structure that requires agents to reason about only a subset of their partners’ actions but otherwise allows them to act independently. The paper presents a formal model of nearly decomposable decision-making problems, NED-MDPs, and defines an approximation algorithm, NED-DECOP, that computes efficient information exchange strategies. The paper shows that NED-DECOP is more efficient than prior collaborative planning algorithms for this class of problem. It presents an empirical study of the information exchange decisions made by the algorithm that investigates the extent to which people accept interruption requests from a computer agent. The context for the study is a game in which the agent can ask people for information that may benefit its individual performance and thus the group’s collaboration. This study revealed the key factors affecting people’s perception of the benefit of interruptions in this setting. The paper also describes the use of machine learning to predict the situations in which people deviate from the strategies generated by the algorithm, using a combination of domain features and features informed by the algorithm. The methodology followed in this work could form the basis for designing agents that effectively exchange information in collaborations with people.