Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
- Artificial Intelligence, 1999
Cited by 342 (22 self)
Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key, longstanding challenges for AI. In this paper we consider how these challenges can be addressed within the mathematical framework of reinforcement learning and Markov decision processes (MDPs). We extend the usual notion of action in this framework to include options: closed-loop policies for taking action over a period of time. Examples of options include picking up an object, going to lunch, and traveling to a distant city, as well as primitive actions such as muscle twitches and joint torques. Overall, we show that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way. In particular, we show that options may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning.
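As a rough illustration of the idea in this abstract, the sketch below models an option as an initiation set, a closed-loop policy, and a termination condition, then applies an SMDP-style Q-learning update once the option finishes. The corridor environment, the names `Option`, `step`, `run_option`, and `smdp_q_update`, and all parameter values are assumptions made for this example; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Set, Tuple

GOAL = 5  # hypothetical corridor with states 0..5 (an assumption for this sketch)


@dataclass
class Option:
    initiation: Set[int]               # states in which the option may be started
    policy: Callable[[int], int]       # closed-loop policy: state -> primitive action
    terminates: Callable[[int], bool]  # termination condition (deterministic here)


def step(state: int, action: int) -> Tuple[int, float]:
    """Toy dynamics: move left/right, reward 1.0 on first reaching GOAL."""
    nxt = max(0, min(GOAL, state + action))
    reward = 1.0 if (nxt == GOAL and state != GOAL) else 0.0
    return nxt, reward


def run_option(state: int, option: Option, gamma: float) -> Tuple[int, float, int]:
    """Follow the option until it terminates.

    Returns (final state, discounted reward accumulated, number of steps k).
    """
    assert state in option.initiation
    total, discount, k = 0.0, 1.0, 0
    while True:
        state, reward = step(state, option.policy(state))
        total += discount * reward
        discount *= gamma
        k += 1
        if option.terminates(state):
            return state, total, k


def smdp_q_update(q: Dict[Tuple[int, str], float], s: int, o: str, r: float,
                  k: int, s_next: int, option_names: Iterable[str],
                  alpha: float, gamma: float) -> None:
    """One SMDP Q-learning update: the bootstrap term is discounted by gamma**k."""
    best_next = max(q.get((s_next, name), 0.0) for name in option_names)
    old = q.get((s, o), 0.0)
    q[(s, o)] = old + alpha * (r + gamma ** k * best_next - old)


go_right = Option(initiation={0, 1, 2, 3, 4},
                  policy=lambda s: +1,
                  terminates=lambda s: s == GOAL)

final_state, reward, k = run_option(2, go_right, gamma=0.9)
q: Dict[Tuple[int, str], float] = {}
smdp_q_update(q, 2, "go_right", reward, k, final_state, ["go_right"],
              alpha=0.5, gamma=0.9)
```

A primitive action fits the same interface as an option that always terminates after one step, which is one way to read the claim that options and primitive actions can be used interchangeably.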
Sold!: Auction Methods for Multirobot Coordination
- 2002
Cited by 193 (13 self)
The key to utilizing the potential of multirobot systems is cooperation. How can we achieve cooperation in systems composed of failure-prone autonomous robots operating in noisy, dynamic environments? In this paper, we present a novel method of dynamic task allocation for groups of such robots. We implemented and tested an auction-based task allocation system which we call MURDOCH, built upon a principled, resource-centric, publish/subscribe communication model. A variant of the Contract Net Protocol, MURDOCH produces a distributed approximation to a global optimum of resource usage. We validated MURDOCH in two very different domains: a tightly coupled multirobot physical manipulation task and a loosely coupled multirobot experiment in long-term autonomy. The primary contribution of this paper is to show empirically that distributed negotiation mechanisms such as MURDOCH are viable and effective for coordinating physical multirobot systems.
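The general shape of auction-based, resource-centric task allocation can be sketched as follows. This is not MURDOCH's implementation: the robot and task records, the distance-based bid, and the function names `auction_task` and `allocate` are hypothetical choices for illustration. Each task is announced with the resources it requires, every free robot that owns those resources bids its cost, and the lowest bid wins.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical robots: a 1-D position and a set of resources (sensors, effectors).
Robot = Dict[str, object]
# Hypothetical task record: (name, location, required resources).
Task = Tuple[str, float, Set[str]]


def auction_task(task: Task, robots: Dict[str, Robot], busy: Set[str]) -> Optional[str]:
    """Run one auction: collect bids from eligible free robots, return the winner."""
    _, location, required = task
    bids: Dict[str, float] = {}
    for name, robot in robots.items():
        if name in busy:
            continue  # a robot already committed to a task does not bid
        if not required <= robot["resources"]:
            continue  # resource-centric: only robots with the needed resources respond
        bids[name] = abs(robot["position"] - location)  # assumed cost: travel distance
    if not bids:
        return None  # nobody can do the task right now
    # Lowest cost wins; ties are broken by name so the outcome is deterministic.
    return min(bids, key=lambda n: (bids[n], n))


def allocate(tasks: List[Task], robots: Dict[str, Robot]) -> Dict[str, Optional[str]]:
    """Auction tasks one at a time, in arrival order (greedy, not globally optimal)."""
    busy: Set[str] = set()
    assignment: Dict[str, Optional[str]] = {}
    for task in tasks:
        winner = auction_task(task, robots, busy)
        assignment[task[0]] = winner
        if winner is not None:
            busy.add(winner)
    return assignment


robots = {
    "r1": {"position": 0.0, "resources": {"camera", "gripper"}},
    "r2": {"position": 5.0, "resources": {"camera"}},
    "r3": {"position": 9.0, "resources": {"camera", "gripper"}},
}
tasks = [
    ("push-box", 8.0, {"gripper"}),
    ("watch-door", 6.0, {"camera"}),
    ("lift", 1.0, {"gripper"}),
    ("scan", 0.0, {"laser"}),
]
assignment = allocate(tasks, robots)
```

Because each task is awarded as soon as it is announced, the result is a greedy approximation rather than a guaranteed global optimum, which matches the abstract's description only loosely.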
Sensory-Motor Primitives as a Basis for Imitation: Linking Perception to Action and Biology to Robotics
- Imitation in Animals and Artifacts, 2000
Cited by 72 (17 self)
Abstracting away from the specific coding of the spinal fields, the examples from neurobiology provide the framework for a motor control system based on a small number of additive primitives (or basis behaviors) sufficient for a rich output movement repertoire. Our previous work (Matarić 1995, Matarić 1997), inspired by the same biological results, has successfully applied the idea of basis behaviors to control of mobile robots by fitting it directly into the modular behavior-based control paradigm. Applications of schema theory (Arbib 1992) to behavior-based mobile robots (Arkin 1987) have employed a similar notion of composable behaviors, stemming from foundations in neuroscience (Arbib 1981, Arbib 1989). The idea of using such primitives for articulator control has been recently studied in robotics. Williamson (1996) and Marjanović, Scassellati & Williamson (1996) developed a 6 DOF (degrees of freedom) robot arm controller. While in the biological and mobile robotics work primitives c...
Automated derivation of primitives for movement classification
- In Proc. of First IEEE-RAS International Conference on Humanoid Robots, 2000
Cited by 72 (8 self)
We present a new method for representing human movement compactly, in terms of a linear superimposition of simpler movements termed primitives. This method is a part of a larger research project aimed at modeling motor control and imitation using the notion of perceptuo-motor primitives, a basis set of coupled perceptual and motor routines. In our model, the perceptual system is biased by the set of motor behaviors the agent can execute, so it automatically classifies observed movements into its executable repertoire. In this paper, we describe a method for automatically deriving a set of primitives directly from human movement data. We used data from a psychophysical experiment on human imitation to derive a set of primitives, and then used those primitives as a basis for superposition and sequencing to reconstruct the original movements. We performed principal component analysis on segments from these data, resulting in a set of basis vectors. Next we clustered in the space of projections of segments onto the eigenvectors, to obtain a set of frequently used movements. To validate the approach experimentally, we used the movement obtained by expanding the cluster points in terms of the eigenvectors as a sequence of via points to control a humanoid dynamic simulation. We also developed an error metric to measure the effectiveness of the process.
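The pipeline described in this abstract (principal component analysis on movement segments, clustering of the projections, then expansion of the cluster points back through the eigenvectors) might be sketched as below. The synthetic ramp data, the plain k-means step with a fixed initialization, and the function name `derive_primitives` are assumptions for illustration; the abstract does not specify the clustering algorithm or the data format.

```python
import numpy as np


def derive_primitives(segments, n_components, n_clusters, n_iters=20):
    """Sketch: PCA on segments, k-means on projections, expand cluster centers.

    segments: array of shape (n_segments, segment_length).
    Returns (basis, labels, primitives).
    """
    mean = segments.mean(axis=0)
    centered = segments - mean
    # PCA via SVD: the rows of vt are the eigenvectors of the covariance matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]             # (n_components, segment_length)
    projections = centered @ basis.T      # (n_segments, n_components)

    # Plain k-means; initial centers are the first n_clusters projections
    # (a deterministic choice made only to keep this example reproducible).
    centers = projections[:n_clusters].copy()
    for _ in range(n_iters):
        dists = np.linalg.norm(projections[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = projections[labels == c].mean(axis=0)

    # Expand each cluster point in terms of the eigenvectors to get a movement.
    primitives = centers @ basis + mean
    return basis, labels, primitives


# Synthetic stand-in for movement segments: noisy rising and falling ramps.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 8)
up, down = t, 1.0 - t
segments = np.stack([up, down] * 3) + rng.normal(0.0, 0.01, size=(6, 8))

basis, labels, primitives = derive_primitives(segments, n_components=2, n_clusters=2)
```

In this toy setting the two recovered primitives should resemble the rising and falling ramps; with real movement data the number of components and clusters would have to be chosen empirically.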
Intelligence by Design: Principles of Modularity and Coordination for Engineering Complex Adaptive Agents
- 2001
Cited by 62 (21 self)
All intelligence relies on search: for example, the search for an intelligent agent's next action. Search is only likely to succeed in resource-bounded agents if they have already been biased towards finding the right answer. In artificial agents, the primary source of bias is engineering. This dissertation
Multi-Robot Task Allocation: Analyzing the Complexity and Optimality of Key Architectures
- ICRA 2003, 2003
Cited by 62 (11 self)
Important theoretical aspects of multi-robot coordination mechanisms have, to date, been largely ignored. To address part of this negligence, we focus on the problem of multi-robot task allocation. We give a formal, domain-independent statement of the problem and show it to be an instance of another, well-studied optimization problem. In this light, we analyze several recently proposed approaches to multi-robot task allocation, describing their fundamental characteristics in such a way that they can be objectively studied, compared, and evaluated.
Coverage, Exploration and Deployment by a Mobile Robot and Communication Network
- Telecommunication Systems Journal, Special Issue on Wireless Sensor Networks, 2003
Cited by 56 (10 self)
We consider the problem of coverage and exploration of an unknown dynamic environment using a mobile robot(s). The environment is assumed to be large enough such that constant motion by the robot(s) is needed to cover the environment. We present an efficient minimalist algorithm which assumes that global information is not available (neither a map, nor GPS). Our algorithm deploys a network of radio beacons which assists the robot(s) in coverage. This network is also used for navigation. The deployed network can also be used for applications other than coverage. Simulation experiments are presented which show the collaboration between the deployed network and mobile robot(s) for the tasks of coverage/exploration, network deployment and maintenance (repair), and mobile robot(s) recovery (homing behavior). We present a theoretical basis for our algorithm on graphs and show the results of the simulated scenario experiments.
Spreading Out: A Local Approach to Multi-robot Coverage
- in Proc. of 6th International Symposium on Distributed Autonomous Robotic Systems, 2002
Cited by 55 (9 self)
The problem of coverage without a priori global information about the environment is a key element of the general exploration problem. Applications vary from exploration of the Mars surface to the urban search and rescue (USAR) domain, where neither a map nor a Global Positioning System (GPS) is available. We propose two algorithms for solving the 2D coverage problem using multiple mobile robots. The basic premise of both algorithms is that local dispersion is a natural way to achieve global coverage. Thus, both algorithms are based on local, mutually dispersive interaction between robots when they are within sensing range of each other. Simulations show that the proposed algorithms solve the problem to within 5-7% of the (manually generated) optimal solutions. We show that the nature of the interaction needed between robots is very simple; indeed anonymous interaction slightly outperforms a more complicated local technique based on ephemeral identification.
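The premise that local dispersion yields global coverage can be illustrated with a minimal sketch in which each robot moves directly away from any neighbors inside its sensing range. This is not either of the paper's two algorithms: the repulsion rule, the synchronous update, and the names `disperse_step` and `min_pairwise_distance` are assumptions for this example. The interaction is anonymous in the sense that a robot uses only neighbor positions, never identities.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def disperse_step(positions: List[Point], sensing_range: float,
                  step_size: float) -> List[Point]:
    """Move every robot one step away from the neighbors it can sense."""
    new_positions: List[Point] = []
    for i, (x, y) in enumerate(positions):
        push_x = push_y = 0.0
        for j, (ox, oy) in enumerate(positions):
            if i == j:
                continue
            dx, dy = x - ox, y - oy
            dist = math.hypot(dx, dy)
            if 0.0 < dist <= sensing_range:
                # Unit vector pointing away from the sensed neighbor.
                push_x += dx / dist
                push_y += dy / dist
        norm = math.hypot(push_x, push_y)
        if norm > 0.0:
            x += step_size * push_x / norm
            y += step_size * push_y / norm
        new_positions.append((x, y))  # robots with no neighbors in range stay put
    return new_positions


def min_pairwise_distance(positions: List[Point]) -> float:
    """Smallest distance between any two robots (a crude dispersion measure)."""
    return min(math.hypot(p[0] - q[0], p[1] - q[1])
               for i, p in enumerate(positions)
               for q in positions[i + 1:])


before = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
after = disperse_step(before, sensing_range=2.0, step_size=0.5)
```

Here the three clustered robots spread apart while the distant fourth robot, which senses no one, does not move.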
Learning and Interacting in Human-Robot Domains
- IEEE Transactions on Systems, Man, and Cybernetics, Part A, 2001
Cited by 53 (6 self)
Human-agent interaction is a growing area of research; there are many approaches that address significantly different aspects of agent social intelligence. In this paper, we focus on a robotic domain in which a human acts both as a teacher and a collaborator to a mobile robot. First, we present an approach that allows a robot to learn task representations from its own experiences of interacting with a human. While most approaches to learning from demonstration have focused on acquiring policies (i.e., collections of reactive rules), we demonstrate a mechanism that constructs high-level task representations based on the robot's underlying capabilities. Second, we describe a generalization of the framework to allow a robot to interact with humans in order to handle unexpected situations that can occur in its task execution. Without using explicit communication, the robot is able to engage a human to aid it during certain parts of task execution. We demonstrate our concepts with a mobile robot learning various tasks from a human, and, when needed, interacting with a human to get help performing them.
Between MDPs and semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales
- Journal of Artificial Intelligence Research, 1998
Cited by 51 (7 self)
Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key challenges for AI. In this paper we develop an approach to these problems based on the mathematical framework of reinforcement learning and Markov decision processes (MDPs). We extend the usual notion of action to include options—whole courses of behavior that may be temporally extended, stochastic, and contingent on events. Examples of options include picking up an object, going to lunch, and traveling to a distant city, as well as primitive actions such as muscle twitches and joint torques. Options may be given a priori, learned by experience, or both. They may be used interchangeably with actions in a variety of planning and learning methods. The theory of semi-Markov decision processes (SMDPs) can be applied to model the consequences of options and as a basis for planning and learning methods using them. In this paper we develop these connections, building on prior work by Bradtke and Duff (1995), Parr (in prep.) and others. Our main novel results concern the interface between the MDP and SMDP levels of analysis. We show how a set of options can be altered by changing only their termination conditions

