Results 1 - 10
of
47
Programmable reinforcement learning agents
, 2001
"... We present an expressive agent design language for reinforcement learning that allows the user to constrain the policies considered by the learning process.The language includes standard features such as parameterized subroutines, temporary interrupts, aborts, and memory variables, but also allows f ..."
Abstract
-
Cited by 87 (1 self)
- Add to MetaCart
We present an expressive agent design language for reinforcement learning that allows the user to constrain the policies considered by the learning process.The language includes standard features such as parameterized subroutines, temporary interrupts, aborts, and memory variables, but also allows for unspecified choices in the agent program. For learning that which isn’t specified, we present provably convergent learning algorithms. We demonstrate by example that agent programs written in the language are concise as well as modular. This facilitates state abstraction and the transferability of learned skills. 1
Cooperative Multi-Agent Learning: The State of the Art
- Autonomous Agents and Multi-Agent Systems
, 2005
"... Cooperative multi-agent systems are ones in which several agents attempt, through their interaction, to jointly solve tasks or to maximize utility. Due to the interactions among the agents, multi-agent problem complexity can rise rapidly with the number of agents or their behavioral sophistication. ..."
Abstract
-
Cited by 59 (5 self)
- Add to MetaCart
Cooperative multi-agent systems are ones in which several agents attempt, through their interaction, to jointly solve tasks or to maximize utility. Due to the interactions among the agents, multi-agent problem complexity can rise rapidly with the number of agents or their behavioral sophistication. The challenge this presents to the task of programming solutions to multi-agent systems problems has spawned increasing interest in machine learning techniques to automate the search and optimization process. We provide a broad survey of the cooperative multi-agent learning literature. Previous surveys of this area have largely focused on issues common to specific subareas (for example, reinforcement learning or robotics). In this survey we attempt to draw from multi-agent learning work in a spectrum of areas, including reinforcement learning, evolutionary computation, game theory, complex systems, agent modeling, and robotics. We find that this broad view leads to a division of the work into two categories, each with its own special issues: applying a single learner to discover joint solutions to multi-agent problems (team learning), or using multiple simultaneous learners, often one per agent (concurrent learning). Additionally, we discuss direct and indirect communication in connection with learning, plus open issues in task decomposition, scalability, and adaptive dynamics. We conclude with a presentation of multi-agent learning problem domains, and a list of multi-agent learning resources. 1
Collaborative Multiagent Reinforcement Learning by Payoff Propagation
- JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
"... In this article we describe a set of scalable techniques for learning the behavior of a group of agents in a collaborative multiagent setting. As a basis we use the framework of coordination graphs of Guestrin, Koller, and Parr (2002a) which exploits the dependencies between agents to decompose t ..."
Abstract
-
Cited by 21 (2 self)
- Add to MetaCart
In this article we describe a set of scalable techniques for learning the behavior of a group of agents in a collaborative multiagent setting. As a basis we use the framework of coordination graphs of Guestrin, Koller, and Parr (2002a) which exploits the dependencies between agents to decompose the global payoff function into a sum of local terms. First, we deal with the single-state case and describe a payoff propagation algorithm that computes the individual actions that approximately maximize the global payoff function. The method can be viewed as the decision-making analogue of belief propagation in Bayesian networks. Second, we focus on learning the behavior of the agents in sequential decision-making tasks. We introduce different model-free reinforcementlearning techniques, unitedly called Sparse Cooperative Q-learning, which approximate the global action-value function based on the topology of a coordination graph, and perform updates using the contribution of the individual agents to the maximal global action value. The combined use of an edge-based decomposition of the action-value function and the payoff propagation algorithm for efficient action selection, result in an approach that scales only linearly in the problem size. We provide experimental evidence that our method outperforms related multiagent reinforcement-learning methods based on temporal differences.
Hierarchical Multiagent Reinforcement Learning
, 2004
"... In this paper, we investigate the use of hierarchical reinforcement learning (HRL) to speed up the acquisition of cooperative multiagent tasks. We introduce a hierarchical multiagent reinforcement learning (RL) framework and propose a hierarchical multiagent RL algorithm called Cooperative HRL. In o ..."
Abstract
-
Cited by 16 (5 self)
- Add to MetaCart
In this paper, we investigate the use of hierarchical reinforcement learning (HRL) to speed up the acquisition of cooperative multiagent tasks. We introduce a hierarchical multiagent reinforcement learning (RL) framework and propose a hierarchical multiagent RL algorithm called Cooperative HRL. In our approach, agents are cooperative and homogeneous (use the same task decomposition). Learning is decentralized, with each agent learning three interrelated skills: how to perform subtasks, which order to do them in, and how to coordinate with other agents. We define cooperative subtasks to be those subtasks in which coordination among agents significantly improves the performance of the overall task. Those levels of the hierarchy which include cooperative subtasks are called cooperation levels. Since coordination at high levels allows for increased cooperation skills as agents do not get confused by low-level details, we usually define cooperative subtasks at the high levels of the hierarchy. This hierarchical approach allows agents to learn coordination faster by sharing information at the level of cooperative subtasks, rather than attempting to learn
Concurrent Hierarchical Reinforcement Learning
- IN PROCEEDINGS IJCAI-2005
, 2005
"... We consider applying hierarchical reinforcement learning techniques to problems in which an agent has several effectors to control simultaneously. We argue that the kind of prior knowledge one typically has about such problems is best expressed using a multithreaded partial program, and present conc ..."
Abstract
-
Cited by 16 (0 self)
- Add to MetaCart
We consider applying hierarchical reinforcement learning techniques to problems in which an agent has several effectors to control simultaneously. We argue that the kind of prior knowledge one typically has about such problems is best expressed using a multithreaded partial program, and present concurrent ALisp, a language for specifying such partial programs. We describe algorithms for learning and acting with concurrent ALisp that can be efficient even when there are exponentially many joint choices at each decision point. Finally, we show results of applying these methods to a complex computer game domain.
Sparse Cooperative Q-learning
- Proceedings of the International Conference on Machine Learning
, 2004
"... Learning in multiagent systems suffers from the fact that both the state and the action space scale exponentially with the number of agents. In this paper we are interested in using Q-learning to learn the coordinated actions of a group of cooperative agents, using a sparse representation of the joi ..."
Abstract
-
Cited by 15 (6 self)
- Add to MetaCart
Learning in multiagent systems suffers from the fact that both the state and the action space scale exponentially with the number of agents. In this paper we are interested in using Q-learning to learn the coordinated actions of a group of cooperative agents, using a sparse representation of the joint stateaction space of the agents. We first examine a compact representation in which the agents need to explicitly coordinate their actions only in a predefined set of states. Next, we use a coordination-graph approach in which we represent the Q-values by value rules that specify the coordination dependencies of the agents at particular states. We show how Q-learning can be efficiently applied to learn a coordinated policy for the agents in the above framework. We demonstrate the proposed method on the predator-prey domain, and we compare it with other related multiagent Q-learning methods.
Least-squares methods in reinforcement learning for control
- In SETN ’02: Proceedings of the Second Hellenic Conference on AI
, 2002
"... Abstract. Least-squares methods have been successfully used for prediction problems in the context of reinforcement learning, but little has been done in extending these methods to control problems. This paper presents an overview of our research efforts in using least-squares techniques for control ..."
Abstract
-
Cited by 15 (0 self)
- Add to MetaCart
Abstract. Least-squares methods have been successfully used for prediction problems in the context of reinforcement learning, but little has been done in extending these methods to control problems. This paper presents an overview of our research efforts in using least-squares techniques for control. In our early attempts, we considered a direct extension of the Least-Squares Temporal Difference (LSTD) algorithm in the spirit of Q-learning. Later, an effort to remedy some limitations of this algorithm (approximation bias, poor sample utilization) led to the Least-Squares Policy Iteration (LSPI) algorithm, which is a form of model-free approximate policy iteration and makes efficient use of training samples collected in any arbitrary manner. The algorithms are demonstrated on a variety of learning domains, including algorithm selection, inverted pendulum balancing, bicycle balancing and riding, multiagent learning in factored domains, and, recently, on two-player zero-sum Markov games and the game of Tetris. 1
Utile coordination: Learning interdependencies among cooperative agents
- In Proceedings of the IEEE Symposium on Computational Intelligence and Games (CIG’05
, 2005
"... a multiagent system to learn where and how to coordinate. The method starts with uncoordinated learners and maintains statistics on expected returns. Coordination dependencies are dynamically added if the statistics indicate a statistically significant benefit. This results in a compact state repres ..."
Abstract
-
Cited by 11 (0 self)
- Add to MetaCart
a multiagent system to learn where and how to coordinate. The method starts with uncoordinated learners and maintains statistics on expected returns. Coordination dependencies are dynamically added if the statistics indicate a statistically significant benefit. This results in a compact state representation because only necessary coordination is modeled. We apply our method within the framework of coordination graphs in which value rules represent the coordination dependencies between the agents for a specific context. The algorithm is first applied on a small illustrative problem, and next on a large predator-prey problem in which two predators have to capture a single prey. 1
Non-Parametric Policy Gradients: A Unified Treatment of Propositional and Relational Domains
"... Policy gradient approaches are a powerful instrument for learning how to interact with the environment. Existing approaches have focused on propositional and continuous domains only. Without extensive feature engineering, it is difficult – if not impossible – to apply them within structured domains, ..."
Abstract
-
Cited by 9 (5 self)
- Add to MetaCart
Policy gradient approaches are a powerful instrument for learning how to interact with the environment. Existing approaches have focused on propositional and continuous domains only. Without extensive feature engineering, it is difficult – if not impossible – to apply them within structured domains, in which e.g. there is a varying number of objects and relations among them. In this paper, we describe a non-parametric policy gradient approach – called NPPG – that overcomes this limitation. The key idea is to apply Friedmann’s gradient boosting: policies are represented as a weighted sum of regression models grown in an stage-wise optimization. Employing off-the-shelf regression learners, NPPG can deal with propositional, continuous, and relational domains in a unified way. Our experimental results show that it can even improve on established results. 1.
Online Resource Allocation Using Decompositional Reinforcement Learning
- Proc. AAAI-05
, 2005
"... been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be ..."
Abstract
-
Cited by 9 (2 self)
- Add to MetaCart
been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). Copies may be requested from IBM T. J. Watson Research Center, P.

