Learning to communicate and act using hierarchical reinforcement learning
In AAMAS-2004: Proceedings of the Third International Joint Conference on Autonomous Agents and Multi Agent Systems, 2004
Cited by 8 (1 self)
In this paper, we address the issue of rational communication behavior among autonomous agents. The goal is for agents to learn a policy that optimizes the communication needed for proper coordination, given the communication cost. We extend our previously reported cooperative hierarchical reinforcement learning (HRL) algorithm to include communication decisions and propose a new multiagent HRL algorithm, called COM-Cooperative HRL. In this algorithm, we define cooperative subtasks to be those subtasks in which coordination among agents significantly improves the performance of the overall task. Those levels of the hierarchy which include cooperative subtasks are called cooperation levels. Coordination skills among agents are learned faster by sharing information at the cooperation levels, rather than at the level of primitive actions. We add a communication level to the hierarchical decomposition of the problem below each cooperation level. Before making a decision at a cooperative subtask, agents decide if it is worthwhile to perform a communication action. A communication action has a certain cost and provides each agent at a certain cooperation level with the actions selected by the other agents at the same level. We demonstrate the efficacy of the COM-Cooperative HRL algorithm, as well as the relation between the communication cost and the learned communication policy, using a multiagent taxi domain.
Learning to Communicate and Act in Cooperative Multiagent Systems Using Hierarchical Reinforcement Learning
Cited by 6 (0 self)
In this paper, we address the issue of rational communication behavior among autonomous agents. We extend our previously reported cooperative hierarchical reinforcement learning (HRL) algorithm to include communication decisions and propose a new multiagent HRL algorithm, called COM-Cooperative HRL. In this algorithm, at specific levels of the hierarchy, called cooperation levels, a group of subtasks in which coordination among agents has a significant effect on the performance of the overall task are defined as cooperative subtasks. Coordination skills among agents are learned faster by sharing information at cooperation levels, rather than at the level of primitive actions. We add a communication level to the hierarchical decomposition of the problem, below each cooperation level. A communication action has a certain cost and is used by each agent to obtain the actions selected by the cooperative subtasks of the other agents. Before making a decision at a cooperative subtask, agents decide if it is worthwhile to perform a communication action in order to acquire the actions chosen by the cooperative subtasks of the other agents. Using this algorithm, agents learn a policy that balances the amount of communication needed for proper coordination against the communication cost. We demonstrate the efficacy of the COM-Cooperative HRL algorithm, as well as the relation between communication cost and the learned communication policy, using a multiagent taxi domain.
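The trade-off described in this abstract, paying a communication cost to learn what the other agents chose, can be illustrated with a small value-of-information calculation. The sketch below is not the COM-Cooperative HRL algorithm, which learns the communication decision through reinforcement learning; it only shows the kind of comparison such a learned policy has to reflect. The table `q_joint`, the distribution `other_action_probs`, and all function names are hypothetical.

```python
def value_with_communication(q_joint, other_action_probs):
    """Expected value when the other agent's action is known before acting.

    q_joint[other_action][own_action] is an assumed table of joint-action values.
    Knowing the other agent's action lets this agent best-respond to it.
    """
    return sum(p * max(q_joint[other].values())
               for other, p in other_action_probs.items())


def value_without_communication(q_joint, other_action_probs):
    """Expected value when one action must be chosen without that knowledge."""
    own_actions = list(next(iter(q_joint.values())))
    return max(
        sum(p * q_joint[other][own] for other, p in other_action_probs.items())
        for own in own_actions
    )


def should_communicate(q_joint, other_action_probs, comm_cost):
    """Communicate only if the expected gain exceeds the communication cost."""
    gain = (value_with_communication(q_joint, other_action_probs)
            - value_without_communication(q_joint, other_action_probs))
    return gain > comm_cost
```

For example, with two taxis that are rewarded only for serving different stations, knowing the other taxi's choice is worth paying for at a low cost but not at a high one, which mirrors the dependence of the communication policy on the communication cost mentioned in the abstract.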
Autonomic Multi-Agent Management of Power and Performance in Data Centers
Cited by 6 (0 self)
The rapidly rising cost and environmental impact of energy consumption in data centers have become a multi-billion dollar concern globally. In response, the IT industry is actively engaged in a first-to-market race to develop energy-conserving hardware and software solutions that do not sacrifice performance objectives. In this work we demonstrate a prototype of an integrated data center power management solution that employs server management tools, appropriate sensors and monitors, and an agent-based approach to achieve specified power and performance objectives. By intelligently turning off servers under low-load conditions, we can achieve over 25% power savings over the unmanaged case without incurring SLA penalties for typical daily and weekly periodic demands seen in webserver farms.
Categories and Subject Descriptors: D.4.8 [Software]: Performance - measurements, modeling and prediction, operational analysis
General Terms: Data center, power measurement, multicriteria utility functions, policy-based management
Scaling Model-Based Average-Reward Reinforcement Learning for Product Delivery
Cited by 6 (3 self)
Reinforcement learning in real-world domains suffers from three curses of dimensionality: explosions in state and action spaces, and high stochasticity. We present approaches that mitigate each of these curses. To handle the state-space explosion, we introduce “tabular linear functions” that generalize tile-coding and linear value functions. Action space complexity is reduced by replacing complete joint action space search with a form of hill climbing. To deal with high stochasticity, we introduce a new algorithm called ASH-learning, which is an afterstate version of H-Learning. Our extensions make it practical to apply reinforcement learning to a domain of product delivery, an optimization problem that combines inventory control and vehicle routing.
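The idea of replacing an exhaustive search over joint actions with hill climbing can be sketched as follows. This is a generic coordinate-wise hill climber, offered as one plausible reading of "a form of hill climbing"; the abstract does not specify the variant the paper uses. The function name, the `value` callback, and the starting point are assumptions made for illustration.

```python
def hill_climb_joint_action(action_sets, value, start=None):
    """Greedy coordinate-wise search over a joint action.

    action_sets: one list of candidate actions per agent (or per vehicle).
    value: hypothetical function scoring a joint action given as a tuple.
    Changes one component at a time and keeps any strict improvement, so it
    evaluates far fewer joint actions than the full Cartesian product, at the
    price of possibly stopping at a local optimum.
    """
    joint = list(start) if start is not None else [acts[0] for acts in action_sets]
    best = value(tuple(joint))
    improved = True
    while improved:
        improved = False
        for i, acts in enumerate(action_sets):
            for a in acts:
                if a == joint[i]:
                    continue
                candidate = joint[:i] + [a] + joint[i + 1:]
                v = value(tuple(candidate))
                if v > best:  # strict improvement guarantees termination
                    joint, best, improved = candidate, v, True
    return tuple(joint), best
```

With n agents and k actions each, one sweep evaluates about n·k candidates instead of the k^n joint actions an exhaustive search would need.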
QUICR-learning for multi-agent coordination
In Proceedings of the 21st National Conference on Artificial Intelligence, 2006
Cited by 5 (0 self)
Coordinating multiple agents that need to perform a sequence of actions to maximize a system level reward requires solving two distinct credit assignment problems. First, credit must be assigned for an action taken at time step t that results in a reward at time step t′ > t. Second, credit must be assigned for the contribution of agent i to the overall system performance. The first credit assignment problem is typically addressed with temporal difference methods such as Q-learning. The second credit assignment problem is typically addressed by creating custom reward functions. To address both credit assignment problems simultaneously, we propose the “Q Updates with Immediate Counterfactual Rewards-learning” (QUICR-learning) designed to improve both the convergence properties and performance of Q-learning in large multi-agent problems. QUICR-learning is based on previous work on single-time-step counterfactual rewards described by the collectives framework. Results on a traffic congestion problem show that QUICR-learning is significantly better than a Q-learner using collectives-based (single-time-step counterfactual) rewards. In addition, QUICR-learning provides significant gains over conventional and local Q-learning. Additional results on a multi-agent grid-world problem show that the improvements due to QUICR-learning are not domain specific and can provide up to a tenfold increase in performance over existing methods.
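The two credit assignment problems named in this abstract can be made concrete with a small sketch: a counterfactual (difference) reward isolates agent i's contribution, and a standard Q-learning update handles credit over time. This is not the QUICR-learning update itself, whose exact counterfactual is defined in the paper; here the counterfactual simply removes the agent via an assumed "null action", and `system_reward` is a hypothetical congestion model.

```python
from collections import Counter, defaultdict


def system_reward(joint_action, capacity=2):
    """Hypothetical congestion reward: a lane's throughput rises with its
    user count up to `capacity`, then drops by one for each extra user."""
    counts = Counter(a for a in joint_action if a is not None)
    return sum(n if n <= capacity else 2 * capacity - n for n in counts.values())


def counterfactual_reward(joint_action, agent, capacity=2):
    """System reward minus the reward the system would get without `agent`."""
    without_agent = list(joint_action)
    without_agent[agent] = None  # assumed null action: the agent is removed
    return system_reward(joint_action, capacity) - system_reward(without_agent, capacity)


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Standard one-step Q-learning update, fed with the agent's own reward."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]
```

With three agents on lane "a" and one on lane "b" at capacity 2, an agent on the overloaded lane receives a negative counterfactual reward while the agent on lane "b" receives a positive one, even though all four share the same system reward.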
Reinforcing Reachable Routes
2003
Cited by 4 (3 self)
This paper studies the evaluation of routing algorithms from the perspective of reachability routing, where the goal is to determine all paths between a sender and a receiver. Reachability routing is becoming relevant with the changing dynamics of the Internet and the emergence of low-bandwidth wireless/ad-hoc networks. We make the case for reinforcement learning as the framework of choice to realize reachability routing, within the confines of the current Internet infrastructure. The setting of the reinforcement learning problem offers several advantages, including loop resolution, multi-path forwarding capability, cost-sensitive routing, and minimizing state overhead, while maintaining the incremental spirit of current backbone routing algorithms. We identify research issues in reinforcement learning applied to the reachability routing problem to achieve a fluid and robust backbone routing framework. This paper also presents the design, implementation and evaluation of a new reachability routing algorithm that uses a model-based approach to achieve cost-sensitive multi-path forwarding; performance assessment of the algorithm in various troublesome topologies shows consistently superior performance over classical reinforcement learning algorithms. The paper is targeted toward practitioners seeking to implement a reachability routing algorithm.
Coarticulation: An approach for generating concurrent plans in Markov decision processes
In Proceedings of the 22nd International Conference on Machine Learning (ICML-2005), 2005
Cited by 4 (1 self)
We study an approach for performing concurrent activities in Markov decision processes (MDPs) based on the coarticulation framework. We assume that the agent has multiple degrees of freedom (DOF) in the action space, which enables it to perform activities simultaneously. We demonstrate that one natural way of generating concurrency in the system is by coarticulating among the set of learned activities available to the agent. In general, due to the multiple DOF in the system, there often exists a redundant set of admissible sub-optimal policies associated with each learned activity. Such flexibility enables the agent to concurrently commit to several subgoals according to their priority levels, given a new task defined in terms of a set of prioritized subgoals. We present efficient approximate algorithms for computing such policies and for generating concurrent plans. We also evaluate our approach in a simulated domain.
Conditional Random Fields for Multi-agent Reinforcement Learning
Cited by 3 (0 self)
Conditional random fields (CRFs) are graphical models for modeling the probability of labels given the observations. They have traditionally been trained using a set of observation and label pairs. Underlying all CRFs is the assumption that, conditioned on the training data, the labels are independent and identically distributed (iid). In this paper we explore the use of CRFs in a class of temporal learning algorithms, namely policy-gradient reinforcement learning (RL). Now the labels are no longer iid. They are actions that update the environment and affect the next observation. From an RL point of view, CRFs provide a natural way to model joint actions in a decentralized Markov decision process. They define how agents can communicate with each other to choose the optimal joint action. Our experiments include a synthetic network alignment problem, a distributed sensor network, and road traffic control, clearly outperforming RL methods which do not model the proper joint policy.
Learning Complementary Multiagent Behaviors: A Case
Cited by 3 (2 self)
As machine learning is applied to increasingly complex tasks, it is likely that the diverse challenges encountered can only be addressed by combining the strengths of different learning algorithms. We examine this aspect of learning through a case study grounded in the robot soccer context. The task we consider is Keepaway, a popular benchmark for multiagent reinforcement learning from the simulation soccer domain. Whereas previous successful results in Keepaway have limited learning to an isolated, infrequent decision that amounts to a turn-taking behavior (passing), we expand the agents' learning capability to include a much more ubiquitous action (moving without the ball, or getting open), such that at any given time, multiple agents are executing learned behaviors simultaneously. We introduce a policy search method for learning “GetOpen” to complement the temporal difference learning approach employed for learning “Pass”. Empirical results indicate that the learned GetOpen policy matches the best hand-coded policy for this task, and outperforms the best policy found when Pass is learned. We demonstrate that Pass and GetOpen can be learned simultaneously to realize tightly-coupled soccer team behavior.

