The Communicative Multiagent Team Decision Problem: Analyzing Teamwork Theories and Models
Journal of Artificial Intelligence Research, 2002
Cited by 147 (18 self)
Despite the significant progress in multiagent teamwork, existing research does not address the optimality of its prescriptions nor the complexity of the teamwork problem. Without a characterization of the optimality-complexity tradeoffs, it is impossible to determine whether the assumptions and approximations made by a particular theory gain enough efficiency to justify the losses in overall performance. To provide a tool for use by multiagent researchers in evaluating this tradeoff, we present a unified framework, the COMmunicative Multiagent Team Decision Problem (COM-MTDP). The COM-MTDP model combines and extends existing multiagent theories, such as decentralized partially observable Markov decision processes and economic team theory. In addition to their generality of representation, COM-MTDPs also support the analysis of both the optimality of team performance and the computational complexity of the agents' decision problem. In analyzing complexity, we present a breakdown of the computational complexity of constructing optimal teams under various classes of problem domains, along the dimensions of observability and communication cost. In analyzing optimality, we exploit the COM-MTDP's ability to encode existing teamwork theories and models to encode two instantiations of joint intentions theory taken from the literature. Furthermore, the COM-MTDP model provides a basis for the development of novel team coordination algorithms. We derive a domain-independent criterion for optimal communication and provide a comparative analysis of the two joint intentions instantiations with respect to this optimal policy. We have implemented a reusable, domain-independent software package based on COM-MTDPs to analyze teamwork coordination strategies, and we demons...
A domain-independent framework for modeling emotion
Journal of Cognitive Systems Research, 2004
Cited by 124 (15 self)
The question is not whether intelligent machines can have any emotions, but whether machines can be intelligent without any emotions. – Marvin Minsky, (Minsky, 1986) p. 163 In every art form it is the emotional content that makes the difference between mere technical skill and true art.
Value-function approximations for partially observable Markov decision processes
Journal of Artificial Intelligence Research, 2000
Cited by 105 (0 self)
Partially observable Markov decision processes (POMDPs) provide an elegant mathematical framework for modeling complex decision and planning problems in stochastic domains in which states of the system are observable only indirectly, via a set of imperfect or noisy observations. The modeling advantage of POMDPs, however, comes at a price: exact methods for solving them are computationally very expensive and thus applicable in practice only to very simple problems. We focus on efficient approximation (heuristic) methods that attempt to alleviate the computational problem and trade off accuracy for speed. We have two objectives here. First, we survey various approximation methods, analyze their properties and relations and provide some new insights into their differences. Second, we present a number of new approximation methods and novel refinements of existing techniques. The theoretical results are supported by experiments on a problem from the agent navigation domain.
Scaling Reinforcement Learning toward RoboCup Soccer
2001
Cited by 89 (17 self)
RoboCup simulated soccer presents many challenges to reinforcement learning methods, including a large state space, hidden and uncertain state, multiple agents, and long and variable delays in the effects of actions. We describe our application of episodic SMDP Sarsa(λ) with linear tile-coding function approximation and variable λ to learning higher-level decisions in a keepaway subtask of RoboCup soccer. In keepaway, one team, "the keepers," tries to keep control of the ball for as long as possible despite the efforts of "the takers." The keepers learn individually when to hold the ball and when to pass to a teammate, while the takers learn when to charge the ball-holder and when to cover possible passing lanes. Our agents learned policies that significantly outperformed a range of benchmark policies. We demonstrate the generality of our approach by applying it to a number of task variations including different field sizes and different numbers of players on each team.
Reinforcement learning for RoboCup-soccer keepaway
Adaptive Behavior, 2005
Cited by 85 (31 self)
RoboCup simulated soccer presents many challenges to reinforcement learning methods, including a large state space, hidden and uncertain state, multiple independent agents learning simultaneously, and long and variable delays in the effects of actions. We describe our application of episodic SMDP Sarsa(λ) with linear tile-coding function approximation and variable λ to learning higher-level decisions in a keepaway subtask of RoboCup soccer. In keepaway, one team, “the keepers,” tries to keep control of the ball for as long as possible despite the efforts of “the takers.” The keepers learn individually when to hold the ball and when to pass to a teammate. Our agents learned policies that significantly outperform a range of benchmark policies. We demonstrate the generality of our approach by applying it to a number of task variations including different field sizes and different numbers of players on each team.
Discovering Hierarchy in Reinforcement Learning with HEXQ
In Machine Learning: Proceedings of the Nineteenth International Conference on Machine Learning, 2002
Cited by 65 (4 self)
An open problem in reinforcement learning is discovering hierarchical structure. HEXQ, an algorithm which automatically attempts to decompose and solve a model-free factored MDP hierarchically, is described. By searching for aliased Markov sub-space regions based on the state variables, the algorithm uses temporal and state abstraction to construct a hierarchy of interlinked smaller MDPs.
A Survey of Research in Distributed, Continual Planning
2000
Cited by 61 (0 self)
Complex, real-world domains require a rethinking of traditional approaches to AI planning. Planning and executing the resulting plans in a dynamic environment requires a continual approach in which planning and execution are interleaved, there may be uncertainty in the current and projected world state, and replanning may be required when the situation changes or planned actions fail. Furthermore, complex planning and execution problems may require multiple computational agents and human planners to collaborate on a solution. In this article, we describe a new paradigm for planning in complex, dynamic environments, which we term distributed, continual planning (DCP). We argue that developing DCP systems will be necessary in order for planning applications to be successful in these environments. We give a historical overview of research leading up to the current state of the art in DCP, and describe research in distributed and continual planning. The increasing emphasis on r...
Temporal Abstraction in Reinforcement Learning
2000
Cited by 55 (2 self)
Decision making usually involves choosing among different courses of action over a broad range of time scales. For instance, a person planning a trip to a distant location makes high-level decisions regarding what means of transportation to use, but also chooses low-level actions, such as the movements for getting into a car. The problem of picking an appropriate time scale for reasoning and learning has been explored in artificial intelligence, control theory and robotics. In this dissertation we develop a framework that allows novel solutions to this problem, in the context of Markov Decision Processes (MDPs) and reinforcement learning. In this dissertation, we present a general framework for prediction, control and learning at multipl...
Contingent Planning Under Uncertainty via Stochastic Satisfiability
Artificial Intelligence, 1999
Cited by 49 (5 self)
We describe two new probabilistic planning techniques, c-maxplan and zander, that generate contingent plans in probabilistic propositional domains. Both operate by transforming the planning problem into a stochastic satisfiability problem and solving that problem instead. c-maxplan encodes the problem as an E-Majsat instance, while zander encodes the problem as an S-Sat instance. Although S-Sat problems are in a higher complexity class than E-Majsat problems, the problem encodings produced by zander are substantially more compact and appear to be easier to solve than the corresponding E-Majsat encodings. Preliminary results for zander indicate that it is competitive with existing planners on a variety of problems. Introduction: When planning under uncertainty, any information about the state of the world is precious. A contingent plan is one that can make action choices contingent on such information. In this paper, we present an implemented framework for contingent pl...
Dynamic Programming for POMDPs using a Factored State Representation
In Proceedings of the Fifth International Conference on AI Planning Systems, 2000
Cited by 46 (3 self)
Contingent planning -- constructing a plan in which action selection is contingent on imperfect information received during plan execution -- can be formalized as the problem of solving a partially observable Markov decision process (POMDP). Traditional dynamic programming algorithms for POMDPs use a flat state representation that enumerates all possible states and state transitions. By contrast, AI planning algorithms use a factored state representation that supports state abstraction and allows problems with large state spaces to be represented and solved more efficiently. Boutilier and Poole (1996) have recently described how a factored state representation can be exploited by a dynamic programming algorithm for POMDPs. We extend their framework, describe an implementation and test its performance, and assess how much this approach improves the computational efficiency of dynamic programming for POMDPs. Introduction: Many AI planning researchers have adopted Markov...

