Evolutionary Learning of General Fuzzy Rules With Biased Evaluation Functions: Competition and Cooperation
Cited by 12 (5 self)
Fuzzy rules cooperate in a Fuzzy Logic Controller (FLC) to produce the best action for a given situation. If we have a population of fuzzy rules controlling a device, and we would like to evolve the population to obtain optimal performance by Reinforcement Learning, rules should compete with each other, since we would like to judge their proposals. Therefore, in this approach, cooperation and competition are two opposite, desired activities done by the population members. This may be a problem, if we consider that the evaluation function may be biased, as may happen, for instance, when we are designing a controlled device such as an Autonomous Agent. The problem becomes even harder if we would like to learn general rules, i.e., rules containing don't care symbols in their antecedents, thus competing with many groups of other rules, in many different situations. In the paper we discuss these issues, and we present our solution, implemented in ELF (Evolutionary Learning of Fuzzy rules). We...
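To make the idea of a general rule concrete, the following is a minimal sketch of how a don't care symbol in an antecedent could affect rule matching. It is not ELF itself: the triangular membership functions, the label names, the sensor names, and the use of `min` as the conjunction are all assumptions made for illustration.

```python
from typing import Dict, Optional


def triangle(x: float, left: float, peak: float, right: float) -> float:
    """Degree to which x belongs to a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy labels over a normalized sensor reading in [0, 1].
LABELS = {
    "near": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "far": (0.5, 1.0, 1.5),
}


def match_degree(antecedent: Dict[str, Optional[str]],
                 reading: Dict[str, float]) -> float:
    """Matching degree of a rule; None stands for a don't care symbol."""
    degree = 1.0
    for variable, label in antecedent.items():
        if label is None:
            continue  # don't care: this variable does not constrain the rule
        degree = min(degree, triangle(reading[variable], *LABELS[label]))
    return degree


# A specific rule and a general rule (illustrative sensor names).
specific = {"front": "near", "left": "far"}
general = {"front": "near", "left": None}
```

Because the general rule ignores the `left` sensor, it fires in every situation where `front` is near, so it shares situations with many more specific rules and has to be evaluated against each of those groups.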
Physically Embedded Genetic Algorithm Learning in Multi-Robot Scenarios: The PEGA algorithm
Cited by 10 (0 self)
We present experiments in which a group of autonomous mobile robots learn to perform fundamental sensor-motor tasks through a collaborative learning process. Behavioural strategies, i.e. motor responses to sensory stimuli, are encoded by means of genetic strings stored on the individual robots, and adapted through a genetic algorithm (Mitchell, 1998) executed by the entire robot collective: robots communicate their own strings and corresponding fitness to each other, and then execute a genetic algorithm to improve their individual behavioural strategy.
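The exchange described in this abstract can be sketched roughly as follows. This is an illustrative approximation, not the PEGA algorithm as published: the tournament choice of partner, one-point crossover, bit-flip mutation, the keep-if-no-worse rule, and the `count_ones` stand-in for a fitness measured on a real robot are all assumptions.

```python
import random
from typing import Callable, List

Genome = List[int]


def count_ones(genome: Genome) -> int:
    """Stand-in fitness; on real robots this would be measured behaviour."""
    return sum(genome)


def collective_ga_step(genomes: List[Genome],
                       fitness: Callable[[Genome], float],
                       rng: random.Random,
                       mutation_rate: float = 0.05) -> List[Genome]:
    """One round: every robot broadcasts (string, fitness), then updates itself."""
    # The shared pool models what each robot hears from the collective.
    pool = [(genome, fitness(genome)) for genome in genomes]
    updated = []
    for own, own_fitness in pool:
        # Pick a partner as the fitter of two randomly chosen broadcasts.
        first, second = rng.sample(pool, 2)
        mate = first[0] if first[1] >= second[1] else second[0]
        # One-point crossover between the robot's own string and the partner's.
        cut = rng.randrange(1, len(own))
        child = own[:cut] + mate[cut:]
        # Bit-flip mutation.
        child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
        # Keep the new strategy only if it is at least as fit as the old one.
        updated.append(child if fitness(child) >= own_fitness else own)
    return updated
```

Each robot keeps exactly one string, so the population is the robot collective itself, which is the point the abstract makes about executing the genetic algorithm across the group.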
Combining Manual Feedback with Subsequent MDP Reward Signals for Reinforcement Learning
Cited by 10 (1 self)
As learning agents move from research labs to the real world, it is increasingly important that human users, including those without programming skills, be able to teach agents desired behaviors. Recently, the TAMER framework was introduced for designing agents that can be interactively shaped by human trainers who give only positive and negative feedback signals. Past work on TAMER showed that shaping can greatly reduce the sample complexity required to learn a good policy, can enable lay users to teach agents the behaviors they desire, and can allow agents to learn within a Markov Decision Process (MDP) in the absence of a coded reward function. However, TAMER does not allow this human training to be combined with autonomous learning based on such a coded reward function. This paper
Robot Shaping - Principles, Methods and Architectures
- University of Sussex, UK, 1996
Cited by 8 (3 self)
In this paper, we contrast two seemingly opposing views on robot design: traditional engineering methods, and automated methods using learning and evolutionary techniques. We argue that while each has its advantages, it is likely that significant progress in robotics could be made using a suitable hybrid of the two philosophies. Of course many successful systems already do this, but they do so in rather ad hoc ways. In contrast, we are attempting to propose a principled way in which engineering design can be combined with evolution, which we call shaping. We present some general principles that we believe should underlie shaping, and follow this up with a set of methods that might be used to put those principles into practice. We then discuss and justify a novel neuro-evolutionary architecture that we believe to be particularly suitable for use in a shaping context. Finally we set out our goals for the future of this research.
Reinforcement Learning and Animat Emotions
- In
, 1996
Cited by 8 (2 self)
Emotional states, such as happiness or sadness, pose particular problems for information processing theories of mind. Hedonic components of states, unlike cognitive components, lack representational content. Research within Artificial Life, in particular the investigation of adaptive agent architectures, provides insights into the dynamic relationship between motivation, the ability of control sub-states to gain access to limited processing resources, and prototype emotional states. Holland's learning classifier system provides a concrete example of this relationship, demonstrating simple 'emotion-like' states, much as a thermostat demonstrates simple 'belief-like' and 'desire-like' states. This leads to the conclusion that valency, a particular form of pleasure or displeasure, is a self-monitored process of credit-assignment. The importance of the movement of a domain-independent representation of utility within adaptive architectures is stressed. Existing information processing theor...
Reinforcement learning and shaping: Encouraging intended behaviors
- Proceedings of the Nineteenth International Conference on Machine Learning, 2002
Cited by 7 (1 self)
We explore dynamic shaping to integrate our prior beliefs of the final policy into a conventional reinforcement learning system. Shaping provides a positive or negative artificial increment to the native task rewards in order to encourage or discourage behaviors. Previously, shaping functions have been static: the additional rewards do not vary with experience. But some prior knowledge cannot be expressed as static shaping. We take an explanation-based approach in which the specific shaping function emerges from initial experiences with the world. We compare no shaping, static shaping, and dynamic shaping in the task of learning bipedal walking on a simulator. We empirically evaluate the convergence rate and final performance among these conditions while varying the accuracy of the prior knowledge. We conclude that in the appropriate context, dynamic shaping can greatly improve the learning of action policies.
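As a rough illustration of shaping as an increment added to the native task reward, the sketch below runs tabular Q-learning on a toy corridor with either no shaping or a fixed (static) shaping function. The corridor task, the bonus values, and every function name are assumptions chosen for illustration; the paper's bipedal-walking task and its dynamic, explanation-based shaping are not reproduced here.

```python
import random
from typing import Callable, List

N_STATES = 5        # corridor states 0..4; state 4 is the goal
ACTIONS = (-1, +1)  # move left or move right


def step(state: int, action: int):
    """Deterministic corridor dynamics with a native reward of 1 at the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    native = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, native


def no_shaping(state: int, action: int, nxt: int) -> float:
    return 0.0


def static_shaping(state: int, action: int, nxt: int) -> float:
    """Fixed prior belief: progress to the right is good, anything else is bad."""
    return 0.1 if nxt > state else -0.1


def q_learning(shaping: Callable[[int, int, int], float],
               episodes: int = 300, alpha: float = 0.5,
               gamma: float = 0.9, seed: int = 0) -> List[List[float]]:
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # cap on episode length
            a = rng.randrange(2)  # uniformly random behaviour policy
            nxt, native = step(state, ACTIONS[a])
            # Shaping adds an artificial increment to the native reward.
            reward = native + shaping(state, ACTIONS[a], nxt)
            done = nxt == N_STATES - 1
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if done:
                break
    return q


def greedy_policy(q: List[List[float]]) -> List[int]:
    """Greedy action for every non-goal state."""
    return [ACTIONS[0] if row[0] > row[1] else ACTIONS[1] for row in q[:-1]]
```

A dynamic variant would replace `static_shaping` with a function whose output changes as experience accumulates, which is the case the abstract argues static shaping cannot express.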
The influence of reward on the speed of reinforcement learning: An analysis of shaping
- In Proc. 20th Int'l Conf. on Machine Learning, 2003
Cited by 7 (1 self)
Shaping can be an effective method for improving the learning rate in reinforcement systems. Previously, shaping has been heuristically motivated and implemented. We provide a formal structure with which to interpret the improvement afforded by shaping rewards. Central to our model is the idea of a reward horizon, which focuses exploration on an MDP's critical region, a subset of states with the property that any policy that performs well on the critical region also performs well on the MDP. We provide a simple algorithm and prove that its learning time is polynomial in the size of the critical region and, crucially, independent of the size of the MDP. This identifies low reward horizons with easy-to-learn MDPs. Shaping rewards, which encode our prior knowledge about the relative merits of decisions, can be seen as artificially reducing the MDP's natural reward horizon. We demonstrate empirically the effects of using shaping to reduce the reward horizon.
Some Methodological Issues About Designing Autonomous Agents Which Learn Their Behaviors: The Elf Experience
- In R. Trappl (Ed.), Cybernetics and Systems Research '94, World Scientific Publishing, Singapore, 1994
Cited by 6 (6 self)
In this paper, we present our experience in developing behavior-based autonomous agents. We focus on methodological aspects of what we call Behavior Engineering, a new branch of engineering whose aim is to design and implement behavior-based agents. Once a behavior is defined, we have to decide whether to implement it by programming the agent, or to make the agent learn its behavior. In this paper, we focus on this second aspect, and we report some results we obtained with a new machine learning algorithm (ELF - Evolutionary Learning of Fuzzy Systems) learning behaviors described as sets of fuzzy rules. The characteristic features of ELF are that it can learn only the rules it needs to achieve its task, it is resistant to biased evaluation functions, and it can learn generalizations.
Behavior chaining: incremental behavioral integration for evolutionary robotics
- In Artificial Life XI: Proceedings of the Eleventh International Conference on the Simulation and Synthesis of Living Systems, 2008
Cited by 6 (6 self)
One of the open problems in autonomous robotics is how to consistently and scalably integrate new behaviors into a robot with an existing behavioral repertoire. In this work a new technique called behavior chaining is introduced, which allows for gradually expanding the behavioral repertoire of a dynamically behaving robot. The approach relies heavily on scaffolding: gradually restructuring the robot’s environment such that selection pressure favors the incorporation of a new behavior. This method teaches a robot a compound behavior not yet reported in the literature: dynamic legged locomotion toward an object followed by grasping, lifting and holding of that object in a physically-realistic three-dimensional environment. The method assumes that success is dependent on the order in which behaviors are learned. This is justified by results which show that if a robot is forced to learn lifting first and then incorporate locomotion, it eventually succeeds at both more often than a robot forced to learn locomotion first and then lifting.

