Learning from observation using primitives
In IEEE International Conference on Robotics and Automation, 2001
Cited by 46 (2 self)
This paper describes the use of task primitives in robot learning from observation. A framework has been developed that uses observed data to initially learn a task, after which the agent goes on to increase its performance through repeated task performance (learning from practice). Data that is collected while a human performs a task is parsed into small parts of the task called primitives. Modules are created for each primitive that encode the movements required during the performance of the primitive, and when and where the primitives are performed. The feasibility of this method is currently being tested with agents that learn to play a virtual and an actual air hockey game.
On Learning by Exchanging Advice
2003
Cited by 17 (6 self)
One of the main questions concerning learning in Multi-Agent Systems is: "(How) can agents benefit from mutual interaction during the learning process?". This paper describes the study of an interactive advice-exchange mechanism as a possible way to improve agents' learning performance. The advice-exchange technique discussed here uses supervised learning (backpropagation), where reinforcement does not come directly from the environment but is based on advice given by peers with a better performance score (higher confidence), to enhance the performance of a heterogeneous group of Learning Agents (LAs). The LAs face similar problems in an environment where only reinforcement information is available. Each LA applies a different, well-known learning technique: Random Walk, Simulated Annealing, Evolutionary Algorithms and Q-Learning. The problem used for evaluation is a simplified traffic-control simulation. In the following text the reader can find a description of the traffic simulation and Learning Agents (focused on the advice-exchange mechanism), a discussion of the first results obtained and suggested techniques to overcome the problems that have been observed. Initial results indicate that advice-exchange can improve learning speed, although "bad advice" and/or blind reliance can disturb the learning performance. The use of supervised learning to incorporate advice given by non-expert peers using different learning algorithms, in problems where no supervision information is available, is, to the best of the authors' knowledge, a new concept in the area of Multi-Agent Systems Learning.
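The advice-exchange idea in this abstract can be illustrated with a minimal Python sketch. It is only one plausible reading of the abstract, not the authors' implementation: the agent names, the score values, and the helpers `pick_advisor` and `advice_example` are hypothetical. An agent asks the best-scoring peer for advice only if that peer currently outperforms it, and the peer's suggested action becomes the target of a supervised training pair.

```python
def pick_advisor(my_name, scores):
    """Return the best-scoring peer if it beats my own score, else None.

    `scores` maps agent name -> current performance score (hypothetical values).
    """
    peers = {name: s for name, s in scores.items() if name != my_name}
    if not peers:
        return None
    best = max(peers, key=peers.get)
    return best if peers[best] > scores[my_name] else None


def advice_example(state, advisor_policy):
    """Build one supervised (input, target) pair from a peer's advice."""
    return state, advisor_policy(state)


# Hypothetical scores for three of the learner types named in the abstract.
scores = {"q_learning": 0.8, "random_walk": 0.3, "annealing": 0.5}
advisor = pick_advisor("random_walk", scores)
```

In this sketch the pair returned by `advice_example` would be fed to whatever supervised learner the advised agent uses (backpropagation in the abstract); that training step is left out here.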
A Bayesian Approach to Imitation in Reinforcement Learning
In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, 2003
Cited by 10 (1 self)
In multiagent environments, forms of social learning such as teaching and imitation have been shown to aid the transfer of knowledge from experts to learners in reinforcement learning (RL). We recast the problem of imitation in a Bayesian framework.
Learning How to Do Things with Imitation
2000
Cited by 9 (2 self)
In this paper we discuss how agents can learn to do things by imitating other agents. In particular, we look at how the use of different metrics and sub-goal granularity can affect the imitation results. We use a computer model of a chess world as a test-bed to also illustrate issues that arise when there is dissimilar embodiment between the demonstrator and the imitator agents.
Probabilistic Policy Reuse for Inter-Task Transfer Learning
Cited by 2 (2 self)
Policy Reuse is a reinforcement learning technique that efficiently learns a new policy by using similar, previously learned policies. The Policy Reuse learner improves its exploration by probabilistically including the exploitation of those past policies. Policy Reuse was previously introduced and shown to be effective in problems with different reward functions in the same state and action spaces. In this article, we contribute Policy Reuse as transfer learning among different domains. We introduce extended MDPs to include domains and tasks, where domains have different state and action spaces, and tasks are problems with different rewards within a domain. We show how Policy Reuse can be applied among domains by defining and using a mapping between their state and action spaces. We use several domains, as versions of a simulated RoboCup Keepaway problem, where we show that Policy Reuse can be used as a mechanism of transfer learning, significantly outperforming a basic policy learner.
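The phrase "probabilistically including the exploitation of those past policies" can be made concrete with a small Python sketch of action selection. This is an assumption-laden illustration rather than the paper's algorithm: the function `choose_action`, the reuse probability `psi`, the exploration rate `epsilon`, and the Q-table layout are all hypothetical names chosen for this example. With probability `psi` the learner follows a past policy; otherwise it acts epsilon-greedily on its own current value estimates.

```python
import random


def choose_action(state, past_policy, q_values, actions, psi, epsilon, rng):
    """Pick an action, reusing a past policy with probability psi.

    q_values maps (state, action) -> estimated value; missing entries count as 0.
    All parameter names here are illustrative assumptions.
    """
    if rng.random() < psi:
        # Exploit the previously learned policy.
        return past_policy(state)
    if rng.random() < epsilon:
        # Plain random exploration.
        return rng.choice(actions)
    # Exploit the new policy being learned.
    return max(actions, key=lambda a: q_values.get((state, a), 0.0))


rng = random.Random(0)
past_policy = lambda s: "left"      # hypothetical past policy
q_values = {("s0", "right"): 1.0}   # hypothetical current estimates
actions = ["left", "right"]
```

For transfer between domains, as the abstract describes, `past_policy` would additionally be wrapped in a mapping between the two domains' state and action spaces; that mapping is not sketched here.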
Rule Fusion for the Imitation of a Human Tutor
In virtual worlds, character credibility suffers from an increasing discrepancy between visual realism, physical modelling quality and behaviour simulation weakness. As behaviour credibility is firmly embedded in the eye of the human observer, it needs to be as close to human expectation as possible. In this study, we define a learning process able to build rule-based behaviour from the observation of a human tutor controlling a virtual agent and from a progressive fusion of rules. The ability of this imitation process to model human-controlled behaviour is assessed through experiments carried out on a flee-attack scenario for an RTS game. Its efficiency is examined in a game development context.
Reinforcement Learning Without Rewards
2010
Machine learning can be broadly defined as the study and design of algorithms that improve with experience. Reinforcement learning is a variety of machine learning that makes minimal assumptions about the information available for learning, and, in a sense, defines the problem of learning in the broadest possible terms. Reinforcement learning algorithms are usually applied to “interactive” problems, such as learning to drive a car, operate a robotic arm, or play a game. In reinforcement learning, an autonomous agent must learn how to behave in an unknown, uncertain, and possibly hostile environment, using only the sensory feedback that it receives from the environment. As the agent moves from one state of the environment to another, it receives only a reward signal; there is no human “in the loop” to tell the algorithm exactly what to do. The goal in reinforcement learning is to learn an optimal behavior that maximizes the total reward that the agent collects. Despite its generality, the reinforcement learning framework does make one strong assumption: that the reward signal can always be directly and unambiguously observed. In other words, the feedback a reinforcement learning algorithm receives is

