Results 1 - 10 of 514
A domain-independent framework for modeling emotion - Journal of Cognitive Systems Research, 2004
"... The question is not whether intelligent machines can have any emotions, but whether machines can be intelligent without any emotions. – Marvin Minsky, (Minsky, 1986) p. 163 In every art form it is the emotional content that makes the difference between mere technical skill and true art. ..."
Abstract - Cited by 259 (31 self)
The question is not whether intelligent machines can have any emotions, but whether machines can be intelligent without any emotions. – Marvin Minsky (Minsky, 1986, p. 163). In every art form it is the emotional content that makes the difference between mere technical skill and true art.
The Communicative Multiagent Team Decision Problem: Analyzing Teamwork Theories and Models - Journal of Artificial Intelligence Research, 2002
"... Despite the significant progress in multiagent teamwork, existing research does not address the optimality of its prescriptions nor the complexity of the teamwork problem. Without a characterization of the optimality-complexity tradeoffs, it is impossible to determine whether the assumptions and app ..."
Abstract - Cited by 233 (21 self)
Despite the significant progress in multiagent teamwork, existing research does not address the optimality of its prescriptions nor the complexity of the teamwork problem. Without a characterization of the optimality-complexity tradeoffs, it is impossible to determine whether the assumptions and approximations made by a particular theory gain enough efficiency to justify the losses in overall performance. To provide a tool for use by multiagent researchers in evaluating this tradeoff, we present a unified framework, the COMmunicative Multiagent Team Decision Problem (COM-MTDP). The COM-MTDP model combines and extends existing multiagent theories, such as decentralized partially observable Markov decision processes and economic team theory. In addition to their generality of representation, COM-MTDPs also support the analysis of both the optimality of team performance and the computational complexity of the agents' decision problem. In analyzing complexity, we present a breakdown of the computational complexity of constructing optimal teams under various classes of problem domains, along the dimensions of observability and communication cost. In analyzing optimality, we exploit the COM-MTDP's ability to encode existing teamwork theories and models to encode two instantiations of joint intentions theory taken from the literature. Furthermore, the COM-MTDP model provides a basis for the development of novel team coordination algorithms. We derive a domain-independent criterion for optimal communication and provide a comparative analysis of the two joint intentions instantiations with respect to this optimal policy. We have implemented a reusable, domain-independent software package based on COM-MTDPs to analyze teamwork coordination strategies, and we demons...
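As a rough illustration of the kind of model the abstract describes, the sketch below collects the usual COM-MTDP ingredients (agents, world states, domain-level and communication actions, per-agent observations, joint transition, observation and reward functions) into a single container. The field names and signatures are chosen here for readability; they are not the paper's notation or its software package's interface.

    from dataclasses import dataclass
    from typing import Callable, Dict, List, Tuple

    @dataclass
    class ComMTDP:
        # Hypothetical container for a COM-MTDP-style team decision problem;
        # all names are illustrative, not taken from the paper.
        agents: List[str]
        states: List[str]
        domain_actions: Dict[str, List[str]]   # per-agent domain-level actions
        messages: Dict[str, List[str]]         # per-agent communication actions
        observations: Dict[str, List[str]]     # per-agent observation sets
        transition: Callable[[str, Tuple[str, ...]], Dict[str, float]]    # P(s' | s, joint action)
        observe: Callable[[str, str, Tuple[str, ...]], Dict[str, float]]  # P(o_i | agent i, s', joint action)
        reward: Callable[[str, Tuple[str, ...], Tuple[str, ...]], float]  # team reward, incl. communication cost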
Value-function approximations for partially observable Markov decision processes - Journal of Artificial Intelligence Research, 2000
"... Partially observable Markov decision processes (POMDPs) provide an elegant mathematical framework for modeling complex decision and planning problems in stochastic domains in which states of the system are observable only indirectly, via a set of imperfect or noisy observations. The modeling advanta ..."
Abstract - Cited by 167 (1 self)
Partially observable Markov decision processes (POMDPs) provide an elegant mathematical framework for modeling complex decision and planning problems in stochastic domains in which states of the system are observable only indirectly, via a set of imperfect or noisy observations. The modeling advantage of POMDPs, however, comes at a price: exact methods for solving them are computationally very expensive and thus applicable in practice only to very simple problems. We focus on efficient approximation (heuristic) methods that attempt to alleviate the computational problem and trade off accuracy for speed. We have two objectives here. First, we survey various approximation methods, analyze their properties and relations and provide some new insights into their differences. Second, we present a number of new approximation methods and novel refinements of existing techniques. The theoretical results are supported by experiments on a problem from the agent navigation domain.
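One of the simplest heuristics in the family this abstract surveys is the QMDP approximation: solve the underlying fully observable MDP, then score each action by the belief-weighted Q-values. A minimal NumPy sketch follows; the array layout and names are assumptions made here, not the paper's code.

    import numpy as np

    def qmdp_action(T, R, gamma, belief, iters=200):
        # QMDP heuristic (sketch): value iteration on the underlying MDP,
        # then pick the action maximizing the belief-weighted Q-values.
        # Assumed layout: T[a, s, s'] = P(s' | s, a), R[s, a], belief[s] = b(s).
        n_actions, n_states, _ = T.shape
        Q = np.zeros((n_states, n_actions))
        for _ in range(iters):
            V = Q.max(axis=1)
            Q = R + gamma * np.einsum('asn,n->sa', T, V)
        return int(np.argmax(belief @ Q))   # greedy action at belief b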
Reinforcement learning for RoboCup-soccer keepaway - Adaptive Behavior, 2005
"... 1 RoboCup simulated soccer presents many challenges to reinforcement learning methods, in-cluding a large state space, hidden and uncertain state, multiple independent agents learning simultaneously, and long and variable delays in the effects of actions. We describe our appli-cation of episodic SMD ..."
Abstract - Cited by 134 (36 self)
RoboCup simulated soccer presents many challenges to reinforcement learning methods, including a large state space, hidden and uncertain state, multiple independent agents learning simultaneously, and long and variable delays in the effects of actions. We describe our application of episodic SMDP Sarsa(λ) with linear tile-coding function approximation and variable λ to learning higher-level decisions in a keepaway subtask of RoboCup soccer. In keepaway, one team, “the keepers,” tries to keep control of the ball for as long as possible despite the efforts of “the takers.” The keepers learn individually when to hold the ball and when to pass to a teammate. Our agents learned policies that significantly outperform a range of benchmark policies. We demonstrate the generality of our approach by applying it to a number of task variations including different field sizes and different numbers of players on each team.
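The learning method named above is episodic SMDP Sarsa(λ) with linear tile-coding function approximation. The sketch below shows a flat (one-step) version of that update with replacing eligibility traces; the toy active_tiles hash coder, the constants, and the env/policy interfaces are placeholders invented for illustration, not the keepaway implementation.

    import numpy as np

    N_FEATURES, ALPHA, GAMMA, LAM = 4096, 0.1, 1.0, 0.9
    w = np.zeros(N_FEATURES)            # learned weight vector
    z = np.zeros(N_FEATURES)            # eligibility traces

    def active_tiles(state, action, n_tilings=8):
        # Toy stand-in for a real tile coder: hashes each offset tiling of the
        # (state, action) pair to a feature index.  Illustrative only.
        s = np.asarray(state, dtype=float)
        return [hash((t, tuple(np.floor(s * 2.0 + t / n_tilings)), action)) % N_FEATURES
                for t in range(n_tilings)]

    def q(state, action):
        return w[active_tiles(state, action)].sum()

    def sarsa_lambda_episode(env, policy):
        # One episode of linear Sarsa(lambda) with replacing traces.
        z[:] = 0.0
        state, done = env.reset(), False
        action = policy(state)
        while not done:
            next_state, reward, done = env.step(action)
            delta = reward - q(state, action)
            z[active_tiles(state, action)] = 1.0     # replacing traces
            if not done:
                next_action = policy(next_state)
                delta += GAMMA * q(next_state, next_action)
            w[:] = w + ALPHA * delta * z             # gradient step on the TD error
            z[:] = GAMMA * LAM * z                   # decay traces
            state = next_state
            if not done:
                action = next_action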
A framework for sequential planning in multi-agent settings - Journal of Artificial Intelligence Research, 2005
"... This paper extends the framework of partially observable Markov decision processes (POMDPs) to multi-agent settings by incorporating the notion of agent models into the state space. Agents maintain beliefs over physical states of the environment and over models of other agents, and they use Bayesian ..."
Abstract - Cited by 130 (33 self)
This paper extends the framework of partially observable Markov decision processes (POMDPs) to multi-agent settings by incorporating the notion of agent models into the state space. Agents maintain beliefs over physical states of the environment and over models of other agents, and they use Bayesian update to maintain their beliefs over time. The solutions map belief states to actions. Models of other agents may include their belief states and are related to agent types considered in games of incomplete information. We express the agents’ autonomy by postulating that their models are not directly manipulable or observable by other agents. We show that important properties of POMDPs, such as convergence of value iteration, the rate of convergence, and piece-wise linearity and convexity of the value functions, carry over to our framework. Our approach complements a more traditional approach to interactive settings which uses Nash equilibria as a solution paradigm. We seek to avoid some of the drawbacks of equilibria, which may be non-unique and are not able to capture off-equilibrium behaviors. We do so at the cost of having to represent, process and continually revise models of other agents. Since the agent’s beliefs may be arbitrarily nested, the optimal solutions to decision making problems are only asymptotically computable. However, approximate belief updates and approximately optimal plans are computable. We illustrate our framework using a simple application domain, and we show examples of belief updates and value functions.
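A heavily simplified picture of the belief update described above: the belief is a distribution over pairs of a physical state and a model of the other agent, and it is filtered through the transition and observation functions while weighting the other agent's actions by what its model predicts. The sketch below fixes the models (no nesting, no model revision) and assumes particular array shapes, all invented here for illustration.

    import numpy as np

    def interactive_belief_update(belief, own_action, obs, models, T, O):
        # Sketch of a single-level Bayesian belief update over (state, model)
        # pairs.  Assumed shapes (illustrative only):
        #   belief[s, m]        current belief over interactive states
        #   models[m][s, a_j]   P(other agent's action a_j | model m, state s)
        #   T[s, a_i, a_j, s']  transition function
        #   O[s', a_i, a_j, o]  own observation function
        n_states, n_other_actions = T.shape[0], T.shape[2]
        new_belief = np.zeros_like(belief)
        for (s, m), p in np.ndenumerate(belief):
            if p == 0.0:
                continue
            for a_j in range(n_other_actions):
                p_aj = models[m][s, a_j]
                for s_next in range(n_states):
                    new_belief[s_next, m] += (p * p_aj
                                              * T[s, own_action, a_j, s_next]
                                              * O[s_next, own_action, a_j, obs])
        total = new_belief.sum()
        return new_belief / total if total > 0 else new_belief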
Scaling Reinforcement Learning toward RoboCup Soccer, 2001
"... RoboCup simulated soccer presents many challenges to reinforcement learning methods, including a large state space, hidden and uncertain state, multiple agents, and long and variable delays in the eects of actions. We describe our application of episodic SMDP Sarsa() with linear tile-coding funct ..."
Abstract - Cited by 120 (23 self)
RoboCup simulated soccer presents many challenges to reinforcement learning methods, including a large state space, hidden and uncertain state, multiple agents, and long and variable delays in the effects of actions. We describe our application of episodic SMDP Sarsa(λ) with linear tile-coding function approximation and variable λ to learning higher-level decisions in a keepaway subtask of RoboCup soccer. In keepaway, one team, “the keepers,” tries to keep control of the ball for as long as possible despite the efforts of “the takers.” The keepers learn individually when to hold the ball and when to pass to a teammate, while the takers learn when to charge the ball-holder and when to cover possible passing lanes. Our agents learned policies that significantly outperformed a range of benchmark policies. We demonstrate the generality of our approach by applying it to a number of task variations including different field sizes and different numbers of players on each team.
Anytime point-based approximations for large POMDPs - Journal of Artificial Intelligence Research, 2006
"... The Partially Observable Markov Decision Process has long been recognized as a rich framework for real-world planning and control problems, especially in robotics. However exact solutions in this framework are typically computationally intractable for all but the smallest problems. A well-known tech ..."
Abstract - Cited by 104 (7 self)
The Partially Observable Markov Decision Process has long been recognized as a rich framework for real-world planning and control problems, especially in robotics. However exact solutions in this framework are typically computationally intractable for all but the smallest problems. A well-known technique for speeding up POMDP solving involves performing value backups at specific belief points, rather than over the entire belief simplex. The efficiency of this approach, however, depends greatly on the selection of points. This paper presents a set of novel techniques for selecting informative belief points which work well in practice. The point selection procedure is combined with point-based value backups to form an effective anytime POMDP algorithm called Point-Based Value Iteration (PBVI). The first aim of this paper is to introduce this algorithm and present a theoretical analysis justifying the choice of belief selection technique. The second aim of this paper is to provide a thorough empirical comparison between PBVI and other state-of-the-art POMDP methods, in particular the Perseus algorithm, in an effort to highlight their similarities and differences. Evaluation is performed using both standard POMDP domains and realistic robotic tasks.
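The core operation the abstract refers to is a point-based value backup: at a chosen belief point, each existing alpha vector is projected through every action/observation pair and the best projections are combined into one new alpha vector. A small NumPy sketch follows, with an array layout assumed here for illustration rather than taken from the PBVI code.

    import numpy as np

    def point_based_backup(b, alphas, T, O, R, gamma):
        # One point-based backup at belief point b (sketch).  Assumed layout:
        # T[a, s, s'], O[a, s', o], R[s, a]; `alphas` is a non-empty list of
        # alpha vectors over states.
        n_actions, n_states, _ = T.shape
        n_obs = O.shape[2]
        best_alpha, best_value = None, -np.inf
        for a in range(n_actions):
            alpha_a = R[:, a].copy()
            for o in range(n_obs):
                # project every alpha vector through (a, o) ...
                projected = [gamma * T[a] @ (O[a, :, o] * alpha) for alpha in alphas]
                # ... and keep the projection that scores best at this belief point
                alpha_a += max(projected, key=lambda v: float(b @ v))
            if float(b @ alpha_a) > best_value:
                best_alpha, best_value = alpha_a, float(b @ alpha_a)
        return best_alpha    # new alpha vector for the point-based set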
Background to Qualitative Decision Theory - AI Magazine, 1999
"... This paper provides an overview of the field of qualitative decision theory: its motivating tasks and issues, its antecedents, and its prospects. Qualitative decision theory studies qualitative approaches to problems of decision making and their sound and effective reconciliation and integration ..."
Abstract - Cited by 95 (4 self)
This paper provides an overview of the field of qualitative decision theory: its motivating tasks and issues, its antecedents, and its prospects. Qualitative decision theory studies qualitative approaches to problems of decision making and their sound and effective reconciliation and integration with quantitative approaches. Though it inherits from a long tradition, the field offers a new focus on a number of important unanswered questions of common concern to artificial intelligence, economics, law, psychology, and management.
Discovering hierarchy in reinforcement learning with HEXQ - In Nineteenth International Conference on Machine Learning, 2002
"... An open problem in reinforcement learning is discovering hierarchical structure. HEXQ, an algorithm which automatically attempts to decompose and solve a model-free factored MDP hierarchically is described. By searching for aliased Markov sub-space regions based on the state variables the algorithm ..."
Abstract - Cited by 94 (5 self)
An open problem in reinforcement learning is discovering hierarchical structure. HEXQ, an algorithm which automatically attempts to decompose and solve a model-free factored MDP hierarchically, is described. By searching for aliased Markov sub-space regions based on the state variables, the algorithm uses temporal and state abstraction to construct a hierarchy of interlinked smaller MDPs.
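A first step of a HEXQ-style decomposition is to rank the state variables by how often their values change during exploration, with faster-changing variables assigned to lower levels of the hierarchy. The toy sketch below implements only that ordering heuristic; the trajectory format is assumed for illustration.

    from collections import Counter

    def order_variables_by_change_frequency(trajectory):
        # Rank state variables by how often their value changes along an
        # exploratory trajectory (most frequently changing first).
        # `trajectory` is assumed to be a list of state tuples; sketch only.
        changes = Counter()
        for prev, curr in zip(trajectory, trajectory[1:]):
            for i, (p, c) in enumerate(zip(prev, curr)):
                if p != c:
                    changes[i] += 1
        return sorted(range(len(trajectory[0])), key=lambda i: -changes[i])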
Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes - Journal of Machine Learning Research, 2006
"... This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework described in this paper include: (i) A general scheme for constructing representations or basis functions by d ..."
Abstract - Cited by 92 (10 self)
This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework described in this paper include: (i) a general scheme for constructing representations or basis functions by diagonalizing symmetric diffusion operators; (ii) a specific instantiation of this approach where global basis functions called proto-value functions (PVFs) are formed using the eigenvectors of the graph Laplacian on an undirected graph formed from state transitions induced by the MDP; (iii) a three-phased procedure called representation policy iteration, comprising a sample collection phase, a representation learning phase that constructs basis functions from samples, and a final parameter estimation phase that determines an (approximately) optimal policy within the (linear) subspace spanned by the (current) basis functions; (iv) a specific instantiation of the RPI framework using least-squares policy iteration (LSPI) as the parameter estimation method; (v) several strategies for scaling the proposed approach to large discrete and continuous state spaces, including the Nyström extension for out-of-sample interpolation of eigenfunctions, and the use of Kronecker sum factorization to construct compact eigenfunctions in product spaces such as factored MDPs; and (vi) a series of illustrative discrete and continuous control tasks, which both illustrate the concepts and provide a benchmark for evaluating the proposed approach. Many challenges remain to be addressed in scaling the proposed framework to large MDPs, and several elaborations of the proposed framework are briefly summarized at the end.
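Component (ii) above, the basis-construction step, amounts to taking the smoothest eigenvectors of a graph Laplacian built from observed state transitions. The NumPy sketch below does this for the combinatorial Laplacian of a small chain graph; the function name and the choice of Laplacian are simplifications made here, not the paper's implementation.

    import numpy as np

    def proto_value_functions(adjacency, k):
        # Build the combinatorial graph Laplacian L = D - A of the state graph
        # and return its k smoothest eigenvectors (smallest eigenvalues) as
        # basis functions over states.  Sketch only.
        degrees = adjacency.sum(axis=1)
        laplacian = np.diag(degrees) - adjacency
        eigvals, eigvecs = np.linalg.eigh(laplacian)   # ascending eigenvalues
        return eigvecs[:, :k]                          # columns = basis functions

    # Example: a 10-state chain graph, 4 proto-value functions per state
    A = np.zeros((10, 10))
    for i in range(9):
        A[i, i + 1] = A[i + 1, i] = 1.0
    phi = proto_value_functions(A, k=4)                # feature matrix, rows = states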