Recent advances in hierarchical reinforcement learning
2003
Cited by 119 (18 self)
A preliminary unedited version of this paper was incorrectly published as part of Volume
Cooperative Multi-Agent Learning: The State of the Art
Autonomous Agents and Multi-Agent Systems, 2005
Cited by 59 (5 self)
Cooperative multi-agent systems are ones in which several agents attempt, through their interaction, to jointly solve tasks or to maximize utility. Due to the interactions among the agents, multi-agent problem complexity can rise rapidly with the number of agents or their behavioral sophistication. The challenge this presents to the task of programming solutions to multi-agent systems problems has spawned increasing interest in machine learning techniques to automate the search and optimization process. We provide a broad survey of the cooperative multi-agent learning literature. Previous surveys of this area have largely focused on issues common to specific subareas (for example, reinforcement learning or robotics). In this survey we attempt to draw from multi-agent learning work in a spectrum of areas, including reinforcement learning, evolutionary computation, game theory, complex systems, agent modeling, and robotics. We find that this broad view leads to a division of the work into two categories, each with its own special issues: applying a single learner to discover joint solutions to multi-agent problems (team learning), or using multiple simultaneous learners, often one per agent (concurrent learning). Additionally, we discuss direct and indirect communication in connection with learning, plus open issues in task decomposition, scalability, and adaptive dynamics. We conclude with a presentation of multi-agent learning problem domains, and a list of multi-agent learning resources.
Learning and Value Function Approximation in Complex Decision Processes
1998
Cited by 34 (4 self)
In principle, a wide variety of sequential decision problems -- ranging from dynamic resource allocation in telecommunication networks to financial risk management -- can be formulated in terms of stochastic control and solved by the algorithms of dynamic programming. Such algorithms compute and store a value function, which evaluates expected future reward as a function of current state. Unfortunately, exact computation of the value function typically requires time and storage that grow proportionately with the number of states, and consequently, the enormous state spaces that arise in practical applications render the algorithms intractable. In this thesis, we study tractable methods that approximate the value function. Our work builds on research in an area of artificial intelligence known as reinforcement learning. A point of focus of this thesis is temporal-difference learning -- a stochastic algorithm inspired to some extent by phenomena observed in animal behavior. Given a selection of...
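As a rough illustration of the two ideas named in this abstract (a value function over states, and temporal-difference learning), the following is a minimal sketch of tabular TD(0). It is not taken from the thesis: the five-state random-walk task, the function name `td0_random_walk`, and the step size are all assumptions chosen for illustration, and the tabular table stands in for the approximate representations the thesis actually studies.

```python
import random


def td0_random_walk(num_states=5, episodes=2000, alpha=0.05, gamma=1.0, seed=0):
    """Estimate state values of a symmetric random walk with tabular TD(0).

    Hypothetical task: states 1..num_states lie on a line, with terminal
    states at both ends. Only stepping off the right end pays reward 1.
    """
    rng = random.Random(seed)
    # Indices 0 and num_states + 1 are terminal; their value stays 0.
    values = [0.0] * (num_states + 2)
    for _ in range(episodes):
        state = (num_states + 1) // 2  # every episode starts in the middle
        while 0 < state < num_states + 1:
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == num_states + 1 else 0.0
            # TD(0): move the estimate toward the one-step bootstrapped target.
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += alpha * td_error
            state = next_state
    return values[1:-1]


if __name__ == "__main__":
    for index, value in enumerate(td0_random_walk(), start=1):
        print(f"state {index}: estimated value {value:.3f}")
```

For this toy task the exact value of state i is i / 6, so the estimates can be checked against a known answer. The table stores one number per state, which is exactly the storage cost that becomes intractable for the large state spaces the abstract describes.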
A Survey of POMDP Applications
Cited by 21 (0 self)
An increasing number of researchers in many areas are becoming interested in the application of the partially observable Markov decision process (POMDP) model to problems with hidden state. This model can account for both state transition and observation uncertainty. The majority of recent research interest in the POMDP model has been in the artificial intelligence community and, as such, the model has been applied in a limited range of domains. The main purpose of this paper is to show the wider applicability of the model by surveying the potential application areas for POMDPs.
Rule Extraction: Where Do We Go from Here?
1999
Cited by 12 (0 self)
We argue that despite being an actively researched area for nearly a decade, rule-extraction technology has not made as significant an impact as it should have. A confluence of trends, however, has made the ability to extract comprehensible descriptions from complex learned models more important now than ever. We argue that rule-extraction methods can have a significant impact in the overlapping data-mining, machine-learning and neural-network communities if research is focused on several commonly overlooked issues. We then briefly describe how we have tried to address these issues in our own work.
Introduction
For nearly a decade, researchers have been investigating the task of converting learned neural-network models into more easily understood representations (Andrews, Diederich, & Tickle 1995). This type of work is commonly referred to as rule extraction since the representation language used to describe learned neural-net models by these methods is typically some form of proposi...
Decision Boundary Partitioning: Variable Resolution Model-Free Reinforcement Learning
1999
Cited by 9 (6 self)
Reinforcement learning agents attempt to learn and construct a decision policy which maximises some reward signal. In turn, this policy is directly derived from long-term value estimates of state-action pairs. In environments with real-valued state-spaces, however, it is impossible to enumerate the value of every state-action pair, necessitating the use of a function approximator to infer state-action values from similar states. Typically, function approximators require many parameters for which suitable values may be difficult to determine a priori. Traditional systems of this kind are then also bound to the fixed limits imposed by the initial parameters, beyond which no further improvements are possible. This paper introduces a new method to adaptively increase the resolution of a discretised action-value function based upon which regions of the state-space are most important for the purposes of choosing an action. The method is motivated by similar work by Moor...
The Stability of General Discounted Reinforcement Learning with Linear Function Approximation
In Proceedings of the UK Workshop on Computational Intelligence (UKCI-02), 2002
Cited by 3 (0 self)
This paper shows that general discounted return estimating reinforcement learning algorithms cannot diverge to infinity when a form of linear function approximator is used for approximating the value-function or Q-function. The results are significant insofar as examples of divergence of the value-function exist where similar linear function approximators are trained using a similar incremental gradient descent rule. A different gradient descent error criterion is used to produce a training rule which has a non-expansion property and therefore cannot possibly diverge.
A Reinforcement Learning Scheme for a Partially-Observable Multi-Agent Game
2005
Cited by 3 (0 self)
We formulate an automatic strategy acquisition problem for the multi-agent card game “Hearts” as a reinforcement learning problem. The problem can approximately be dealt with in the framework of a partially observable Markov decision process (POMDP) for a single-agent system. Hearts is an example of an imperfect-information game, a class of games that is more difficult to deal with than perfect-information games. A POMDP is a decision problem that includes a process for estimating unobservable state variables. By regarding missing information as unobservable state variables, an imperfect-information game can be formulated as a POMDP. However, the game of Hearts is a realistic problem that has a huge number of possible states, even when it is approximated as a single-agent system. Therefore, further approximation is necessary to make the strategy acquisition problem tractable. This article presents an approximation method based on estimating unobservable state variables and predicting the actions of the other agents. Simulation results show that our reinforcement learning method is applicable to such a difficult multi-agent problem.
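The "process for estimating unobservable state variables" mentioned in this abstract is commonly realised as a Bayesian belief update over hidden states. The sketch below shows that standard exact update for a small discrete POMDP. It is not the approximation method of the article, which is not described here; the function name `belief_update` and the two-state example are assumptions made for illustration.

```python
def belief_update(belief, transition, observation_likelihood):
    """One exact Bayesian belief update for a discrete POMDP.

    belief[s]                  : current probability of hidden state s
    transition[s][s2]          : P(s2 | s, a) for the action just taken
    observation_likelihood[s2] : P(o | s2) for the observation just received
    """
    n = len(belief)
    # Prediction step: push the belief through the transition model.
    predicted = [
        sum(belief[s] * transition[s][s2] for s in range(n)) for s2 in range(n)
    ]
    # Correction step: weight each state by how well it explains the observation.
    unnormalized = [observation_likelihood[s2] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalized]


if __name__ == "__main__":
    # Hypothetical two-state example: the hidden state does not change,
    # and the observation is nine times more likely in state 0.
    prior = [0.5, 0.5]
    stay_put = [[1.0, 0.0], [0.0, 1.0]]
    likelihood = [0.9, 0.1]
    print(belief_update(prior, stay_put, likelihood))
```

The cost of this exact update grows with the square of the number of hidden states, which is one reason a game with as many states as Hearts calls for further approximation.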
Learning Situation-Specific Control In Multi-Agent Systems
1997
Cited by 2 (0 self)
The work presented in this thesis deals with techniques to improve problem solving control skills of cooperative agents through machine learning. In a multi-agent system, the local problem solving control of an agent can interact in complex and intricate ways with the problem solving control of other agents. In such systems, an agent cannot make effective control decisions based purely on its local problem solving state. Effective cooperation requires that the global problem-solving state influence the local control decisions made by an agent. We call such an influence cooperative control. An agent with a purely local view of the problem solving situation cannot learn ...
Title of the Book!
Cited by 2 (0 self)
Neuro-dynamic programming comprises algorithms for solving large-scale stochastic control problems. Many ideas underlying these algorithms originated in the field of artificial intelligence and were motivated to some extent by descriptive models of animal behavior. This chapter provides an overview of the history and state of the art in neuro-dynamic programming, as well as a review of recent results involving two classes of algorithms that have been the subject of much recent research activity: temporal-difference learning and actor-critic methods.

