Heuristic Selection of Actions in Multiagent Reinforcement Learning

by Reinaldo A. C. Bianchi, Carlos H. C. Ribeiro
Citing documents (results 1 - 10 of 13):

Integrating Organizational Control into Multi-Agent Learning

by Chongjie Zhang, Sherief Abdallah, Victor Lesser
"... Multi-Agent Reinforcement Learning (MARL) algorithms suffer from slow convergence and even divergence, especially in largescale systems. In this work, we develop an organization-based control framework to speed up the convergence of MARL algorithms in a network of agents. Our framework defines a mul ..."
Abstract - Cited by 11 (3 self) - Add to MetaCart
Multi-Agent Reinforcement Learning (MARL) algorithms suffer from slow convergence and even divergence, especially in large-scale systems. In this work, we develop an organization-based control framework to speed up the convergence of MARL algorithms in a network of agents. Our framework defines a multi-level organizational structure for automated supervision and a communication protocol for exchanging information between lower-level agents and higher-level supervising agents. The abstracted states of lower-level agents travel upwards so that higher-level supervising agents generate a broader view of the state of the network. This broader view is used in creating supervisory information which is passed down the hierarchy. The supervisory policy adaptation then integrates supervisory information into existing MARL algorithms, guiding agents' exploration of their state-action space. The generality of our framework is verified by its application to different domains (distributed task allocation and network routing) with different MARL algorithms. Experimental results show that our framework improves both the speed and likelihood of MARL convergence.
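As an illustration of the supervisory policy adaptation sketched in this abstract, the following is a minimal Python reading in which a supervisor's suggested actions bias an agent's epsilon-greedy exploration; the names (suggested_actions, bias) and the tabular Q are assumptions, not the paper's actual interface.

```python
import random

def guided_epsilon_greedy(Q, state, actions, suggested_actions, epsilon=0.1, bias=0.7):
    """Epsilon-greedy selection where exploratory draws are biased towards
    actions suggested by a higher-level supervisor (a minimal sketch).

    Q                 -- dict mapping (state, action) -> estimated value
    suggested_actions -- actions recommended by the supervising agent
    bias              -- probability of exploring inside the suggested set
    """
    if random.random() < epsilon:
        # Exploration: prefer the supervisor's suggestions most of the time.
        if suggested_actions and random.random() < bias:
            return random.choice(list(suggested_actions))
        return random.choice(list(actions))
    # Exploitation: ordinary greedy choice over the agent's own estimates.
    return max(actions, key=lambda a: Q.get((state, a), 0.0))
```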

Citation Context

...t action-values by communicating with others only the state of high-level subtasks. The second paradigm is to employ heuristics to guide the policy search. Heuristically Accelerated Minimax-Q (HAMMQ) [3] incorporated heuristics into the Minimax-Q algorithm to speed up its convergence rate. HAMMQ shared the convergence property with Minimax-Q. However, HAMMQ was intended for use only in a two-agent co...
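The context above captures the core mechanism of the indexed paper: actions are chosen by maximizing the learned value plus a weighted heuristic. A minimal sketch of that selection rule, ignoring the minimax treatment of the opponent in HAMMQ proper; the weight xi and the dict-based Q and H are assumptions.

```python
def heuristic_greedy_action(Q, H, state, actions, xi=1.0):
    """Pick the action maximizing Q(s, a) + xi * H(s, a): the learned value
    plus a weighted heuristic bonus.  Q and H are dicts keyed by
    (state, action); missing entries default to zero."""
    return max(actions,
               key=lambda a: Q.get((state, a), 0.0) + xi * H.get((state, a), 0.0))
```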

A general framework for interacting Bayes-optimally with self-interested agents using arbitrary parametric model and model prior. arXiv:1304.2024

by Trong Nghia Hoang, Kian Hsiang Low, 2013
"... Recent advances in Bayesian reinforcement learn-ing (BRL) have shown that Bayes-optimality is theoretically achievable by modeling the envi-ronment’s latent dynamics using Flat-Dirichlet-Multinomial (FDM) prior. In self-interested multi-agent environments, the transition dynamics are mainly controll ..."
Abstract - Cited by 5 (4 self) - Add to MetaCart
Recent advances in Bayesian reinforcement learning (BRL) have shown that Bayes-optimality is theoretically achievable by modeling the environment's latent dynamics using a Flat-Dirichlet-Multinomial (FDM) prior. In self-interested multi-agent environments, the transition dynamics are mainly controlled by the other agent's stochastic behavior, for which FDM's independence and modeling assumptions do not hold. As a result, FDM does not allow the other agent's behavior to be generalized across different states nor specified using prior domain knowledge. To overcome these practical limitations of FDM, we propose a generalization of BRL to integrate the general class of parametric models and model priors, thus allowing practitioners' domain knowledge to be exploited to produce a fine-grained and compact representation of the other agent's behavior. Empirical evaluation shows that our approach outperforms existing multi-agent reinforcement learning algorithms.
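The Flat-Dirichlet-Multinomial baseline referred to above keeps an independent Dirichlet posterior over next states for every state-action pair. A minimal sketch of that baseline for a tabular finite MDP follows; this illustrates only the FDM prior the paper generalizes, not the proposed parametric approach, and the class and variable names are assumptions.

```python
import numpy as np

class FDMPrior:
    """Independent Dirichlet-Multinomial model of P(s' | s, a) in a finite MDP."""

    def __init__(self, n_states, n_actions, alpha=1.0):
        # One Dirichlet count vector per (state, action) pair.
        self.counts = np.full((n_states, n_actions, n_states), alpha)

    def update(self, s, a, s_next):
        """Bayesian update after observing a transition (s, a) -> s_next."""
        self.counts[s, a, s_next] += 1.0

    def predictive(self, s, a):
        """Posterior predictive distribution over the next state."""
        c = self.counts[s, a]
        return c / c.sum()
```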

Improving Reinforcement Learning by using Case Based Heuristics

by Reinaldo A. C. Bianchi, Raquel Ros, Ramón López De Mántaras
"... Abstract. This work presents a new approach that allows the use of cases in a case base as heuristics to speed up Reinforcement Learning algorithms, combining Case Based Reasoning (CBR) and Reinforcement Learning (RL) techniques. This approach, called Case Based Heuristically Accelerated Reinforceme ..."
Abstract - Cited by 2 (1 self) - Add to MetaCart
Abstract. This work presents a new approach that allows the use of cases in a case base as heuristics to speed up Reinforcement Learning algorithms, combining Case Based Reasoning (CBR) and Reinforcement Learning (RL) techniques. This approach, called Case Based Heuristically Accelerated Reinforcement Learning (CB-HARL), builds upon an emerging technique, Heuristically Accelerated Reinforcement Learning (HARL), in which RL methods are accelerated by making use of heuristic information. CB-HARL is a subset of RL that makes use of a heuristic function derived from a case base, in a Case Based Reasoning manner. An algorithm that incorporates CBR techniques into Heuristically Accelerated Q-Learning is also proposed. Empirical evaluations were conducted in a simulator for the RoboCup Four-Legged Soccer Competition, and the results obtained show that with CB-HARL the agents learn faster than with either RL or HARL methods.
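A minimal sketch of the case-based heuristic construction described above: the most similar stored case is retrieved and its action receives a heuristic bonus, which could then drive heuristically accelerated action selection such as the rule sketched earlier. The case representation, similarity function, threshold and bonus value are assumptions.

```python
def case_based_heuristic(case_base, state, actions, similarity, threshold=0.8, bonus=1.0):
    """Build H(state, .) from a case base: if a sufficiently similar case is
    found, its recorded action gets a positive heuristic value, others zero.

    case_base  -- list of (case_state, case_action) pairs
    similarity -- function (state, case_state) -> value in [0, 1]
    """
    H = {a: 0.0 for a in actions}
    if not case_base:
        return H
    best_state, best_action = max(case_base, key=lambda c: similarity(state, c[0]))
    if similarity(state, best_state) >= threshold and best_action in H:
        H[best_action] = bonus
    return H
```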

Transparent Modelling of Finite Stochastic Processes for Multiple Agents

by Luke Dickens, Krysia Broda, Alessandra Russo, 2008
"... Abstract. Stochastic Processes are ubiquitous, from automated engineering, through financial markets, to space exploration. These systems are typically highly dynamic, unpredictable and resistant to analytic methods; coupled with a need to orchestrate long control sequences which are both highly com ..."
Abstract - Cited by 2 (2 self) - Add to MetaCart
Abstract. Stochastic Processes are ubiquitous, from automated engineering, through financial markets, to space exploration. These systems are typically highly dynamic, unpredictable and resistant to analytic methods, and they require the orchestration of long control sequences which are both highly complex and uncertain. This report examines some existing single- and multi-agent modelling frameworks, details their strengths and weaknesses, and uses the experience to identify some fundamental tenets of good practice in modelling stochastic processes. It goes on to develop a new family of frameworks based on these tenets, which can model single- and multi-agent domains with equal clarity and flexibility, while remaining close enough to the existing frameworks that existing analytic and learning tools can be applied with little or no adaptation. Some simple and larger examples illustrate the similarities and differences of this approach, and a discussion of the challenges inherent in developing more flexible tools to exploit these new frameworks concludes matters.

Citation Context

...ent NoSDE example in Zinkevich et al.'s 2003 paper on cyclic equilibrium [50]. 6.2 Littman's adversarial MDP Soccer Next, we introduce the soccer example originally proposed in [23], and revisited in [4, 5, 7, 33, 47]. We illustrate both the transparency of the modelling mechanisms and ultimately the descriptive power this gives us in the context of problems which attempt to recreate some properties of a real syst...

Coordination guided reinforcement learning.

by Qiangfeng Peter Lau, Mong Li Lee, Wynne Hsu - In AAMAS, 2012
"... ABSTRACT In this paper, we propose to guide reinforcement learning (RL) with expert coordination knowledge for multi-agent problems managed by a central controller. The aim is to learn to use expert coordination knowledge to restrict the joint action space and to direct exploration towards more pro ..."
Abstract - Cited by 1 (1 self) - Add to MetaCart
In this paper, we propose to guide reinforcement learning (RL) with expert coordination knowledge for multi-agent problems managed by a central controller. The aim is to learn to use expert coordination knowledge to restrict the joint action space and to direct exploration towards more promising states, thereby improving the overall learning rate. We model such coordination knowledge as constraints and propose a two-level RL system that utilizes these constraints for online applications. Our declarative approach towards specifying coordination in multi-agent learning allows knowledge sharing between constraints and features (basis functions) for function approximation. Results on a soccer game and a tactical real-time strategy game show that coordination constraints improve the learning rate compared to using only unary constraints. The two-level RL system also outperforms the existing single-level approach that utilizes joint action selection via coordination graphs.
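A minimal sketch of the central idea above, restricting the joint action space with coordination constraints before greedy selection; the constraint predicates and tabular joint-action values are assumptions, and the paper's two-level structure and function approximation are omitted.

```python
from itertools import product

def constrained_joint_action(Q_joint, state, agent_actions, constraints):
    """Enumerate joint actions, keep only those satisfying every coordination
    constraint, then pick the joint action with the highest estimated value.

    Q_joint       -- dict mapping (state, joint_action_tuple) -> value
    agent_actions -- list of per-agent action sets
    constraints   -- list of predicates (state, joint_action_tuple) -> bool
    """
    feasible = [ja for ja in product(*agent_actions)
                if all(c(state, ja) for c in constraints)]
    if not feasible:  # fall back to the unrestricted joint action space
        feasible = list(product(*agent_actions))
    return max(feasible, key=lambda ja: Q_joint.get((state, ja), 0.0))
```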

Citation Context

... zero communication multi-agent problems known as Markov games where the focus is on handling the non-stationary environment due to independent learning and the setting is mostly adversarial [22]. In [3], heuristics can be provided to influence learning when the policy selects a maximal action. Their heuristics do not affect exploratory actions and are used in an adversarial setting with a much small...

Efficient Multi-Agent Reinforcement Learning through Automated Supervision (Short Paper)

by Chongjie Zhang, Sherief Abdallah, Victor Lesser
"... Multi-Agent Reinforcement Learning (MARL) algorithms suffer from slow convergence and even divergence, especially in large-scale systems. In this work, we develop a supervision framework to speed up the convergence of MARL algorithms in a network of agents. The framework defines an organizational st ..."
Abstract - Cited by 1 (0 self) - Add to MetaCart
Multi-Agent Reinforcement Learning (MARL) algorithms suffer from slow convergence and even divergence, especially in large-scale systems. In this work, we develop a supervision framework to speed up the convergence of MARL algorithms in a network of agents. The framework defines an organizational structure for automated supervision and a communication protocol for exchanging information between lower-level agents and higher-level supervising agents. The abstracted states of lower-level agents travel upwards so that higher-level supervising agents generate a broader view of the state of the network. This broader view is used in creating supervisory information which is passed down the hierarchy. We present a generic extension to MARL algorithms that integrates supervisory information into the learning process, guiding agents' exploration of their state-action space.

Citation Context

...uristic used only the local information and the global heuristic used the information that was shared and required to be exactly the same among robots. The Heuristically Accelerated Minimax-Q (HAMMQ) [4] incorporated heuristics into the Minimax-Q algorithm to speed up its convergence rate, which shared the convergence property with Minimax-Q. HAMMQ was intended for a two-agent configuration and furth...

Learning to Act Stochastically

by Luke Dickens
"... This thesis examines reinforcement learning for stochastic control processes with single and multiple agents, where either the learning outcomes are stochastic policies or learning is perpetual and within the domain of stochastic policies. In this context, a policy is a strategy for processing envir ..."
Abstract - Cited by 1 (1 self) - Add to MetaCart
This thesis examines reinforcement learning for stochastic control processes with single and multiple agents, where either the learning outcomes are stochastic policies or learning is perpetual and within the domain of stochastic policies. In this context, a policy is a strategy for processing environmental outputs (called observations) and subsequently generating a response or input-signal to the environment (called actions). A stochastic policy gives a probability distribution over actions for each observed situation, and the thesis concentrates on finite sets of observations and actions. There is an exclusive focus on stochastic policies for two principal reasons: such policies have been relatively neglected in the existing literature, and they have been recognised to be especially important in the field of multi-agent reinforcement learning. For the latter reason, the thesis concerns itself primarily with solutions best suited to multi-agent domains. This restriction proves essential, since the topic is otherwise too broad to be covered in depth without losing some clarity and focus. The thesis is partitioned into 3 parts, with a chapter of contextual information preceding the first part. Part 1 focuses on analytic and formal mathematical approaches
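A stochastic policy in the sense defined above, a probability distribution over a finite action set for each observation, can be sketched as follows; the softmax parameterization and temperature are illustrative assumptions, not something the thesis prescribes.

```python
import math
import random

def softmax_policy(preferences, temperature=1.0):
    """Turn per-action preference scores for one observation into a stochastic
    policy: a probability distribution over the finite action set."""
    scaled = {a: p / temperature for a, p in preferences.items()}
    m = max(scaled.values())
    exps = {a: math.exp(v - m) for a, v in scaled.items()}  # numerically stable
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}

def sample_action(policy):
    """Draw an action according to the stochastic policy."""
    actions, probs = zip(*policy.items())
    return random.choices(actions, weights=probs, k=1)[0]
```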

Scaling Multi-Agent Learning in Complex Environments

by Chongjie Zhang , 2011
"... ..."
Abstract - Add to MetaCart
Abstract not found

Interactive POMDP Lite: Towards Practical Planning to Predict and Exploit Intentions for Interacting with Self-Interested Agents (Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence)

by Trong Nghia Hoang, Kian Hsiang Low
"... A key challenge in non-cooperative multi-agent systems is that of developing efficient planning algorithms for intelligent agents to interact and perform effectively among boundedly rational, selfinterested agents (e.g., humans). The practicality of existing works addressing this challenge is being ..."
Abstract - Add to MetaCart
A key challenge in non-cooperative multi-agent systems is that of developing efficient planning algorithms for intelligent agents to interact and perform effectively among boundedly rational, self-interested agents (e.g., humans). The practicality of existing works addressing this challenge is undermined by either the restrictive assumptions about the other agents' behavior, the failure to account for their rationality, or the prohibitively expensive cost of modeling and predicting their intentions. To boost the practicality of research in this field, we investigate how intention prediction can be efficiently exploited and made practical in planning, thereby leading to efficient intention-aware planning frameworks capable of predicting the intentions of other agents and acting optimally with respect to their predicted intentions. We show that the performance losses incurred by the resulting planning policies are linearly bounded by the error of intention prediction. Empirical evaluations through a series of stochastic games demonstrate that our policies can achieve better and more robust performance than the state-of-the-art algorithms.

The Use of Cases as Heuristics to speed up Multiagent Reinforcement Learning

by Reinaldo A. C. Bianchi, Ramón López De Mántaras
"... This work presents a new approach that allows the use of cases in a case base as heuristics to speed up Multiagent Reinforcement Learning algorithms, combining Case Based Reasoning (CBR) and Multiagent Reinforcement Learning (MRL) techniques. This approach, called Case Based Heuristically Accelera ..."
Abstract - Add to MetaCart
This work presents a new approach that allows the use of cases in a case base as heuristics to speed up Multiagent Reinforcement Learning algorithms, combining Case Based Reasoning (CBR) and Multiagent Reinforcement Learning (MRL) techniques. This approach, called Case Based Heuristically Accelerated Multiagent Reinforcement Learning (CB-HAMRL), builds upon an emerging technique, Heuristically Accelerated Reinforcement Learning (HARL), in which RL methods are accelerated by making use of heuristic information. CB-HAMRL is a subset of MRL that makes use of a heuristic function H derived from a case base, in a Case Based Reasoning manner. An algorithm that incorporates CBR techniques into Heuristically Accelerated Minimax-Q is also proposed, and a set of empirical evaluations was conducted in a simulator for the robot soccer domain, comparing the three solutions for this problem: MRL, HAMRL and CB-HAMRL. Experimental results show that with CB-HAMRL the agents learn faster than with RL or HAMRL methods.
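A minimal sketch of how a case-derived heuristic H might bias action choice in an adversarial Minimax-Q-style setting; for brevity the opponent is handled with a pure-strategy worst case rather than the mixed-strategy linear program the full algorithm uses, and all names are assumptions.

```python
def heuristic_minimax_action(Q, H, state, my_actions, opp_actions, xi=1.0):
    """Choose the action maximizing the worst-case joint value plus a weighted
    heuristic bonus.  Q maps (state, my_action, opp_action) -> value and
    H maps (state, my_action) -> heuristic value (e.g. built from a case base)."""
    def worst_case(a):
        return min(Q.get((state, a, o), 0.0) for o in opp_actions)
    return max(my_actions, key=lambda a: worst_case(a) + xi * H.get((state, a), 0.0))
```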