Reinforcement learning: a survey
Journal of Artificial Intelligence Research, 1996
Cited by 1134 (21 self)
This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
Classifier Fitness Based on Accuracy
1995
Cited by 239 (14 self)
In many classifier systems, the classifier strength parameter serves as a predictor of future payoff and as the classifier's fitness for the genetic algorithm. We investigate a classifier system, XCS, in which each classifier maintains a prediction of expected payoff, but the classifier's fitness is given by a measure of the prediction's accuracy. The system executes the genetic algorithm in niches defined by the match sets, instead of panmictically. These aspects of XCS result in its population tending to form a complete and accurate mapping X x A => P from inputs and actions to payoff predictions. Further, XCS tends to evolve classifiers that are maximally general subject to an accuracy criterion. Besides introducing a new direction for classifier system research, these properties of XCS make it suitable for a wide range of reinforcement learning situations where generalization over states is desirable. Key words Classifier systems, strength, fitness, accuracy, mapping, generalizati...
Generalization in the XCS Classifier System
1998
Cited by 62 (10 self)
This paper studies two changes to XCS, a classifier system in which fitness is based on prediction accuracy and the genetic algorithm takes place in environmental niches. The changes were aimed at increasing XCS's tendency to evolve accurate, maximally general classifiers and were tested on previously employed "woods" and multiplexer tasks. Together the changes bring XCS close to evolving populations whose high-fitness classifiers form a near-minimal, accurate, maximally general cover of the input and action product space. In addition, results on the multiplexer, a difficult categorization task, suggest that XCS's learning complexity is polynomial in the input length and thus may avoid the "curse of dimensionality", a notorious barrier to scale-up. A comparison between XCS and genetic programming in solving the 6-multiplexer suggests that XCS's learning rate is about three orders of magnitude faster in terms of the number of input instances processed.
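For readers unfamiliar with the multiplexer task named in this abstract, the sketch below shows the Boolean function itself, not XCS or the paper's experimental setup. It assumes the common convention that the leading k bits form an address selecting one of the following 2^k data bits (k = 2 gives the 6-multiplexer). The function name `multiplexer` and its parameters are illustrative and do not come from the paper.

```python
def multiplexer(bits: str, address_bits: int = 2) -> int:
    """Return the data bit selected by the leading address bits.

    Assumed convention (not taken from the abstract): the first
    `address_bits` characters are a binary address, and the remaining
    2**address_bits characters are the data bits.
    """
    expected_length = address_bits + 2 ** address_bits
    if len(bits) != expected_length or set(bits) - {"0", "1"}:
        raise ValueError(f"expected a binary string of length {expected_length}")
    # Interpret the address prefix as an unsigned binary number.
    address = int(bits[:address_bits], 2)
    # The answer is the data bit at that offset after the address prefix.
    return int(bits[address_bits + address])


if __name__ == "__main__":
    # Address "10" (= 2) selects the third data bit of "0010".
    print(multiplexer("100010"))
```

A learner is shown such strings one at a time and must predict the selected bit, so the number of distinct inputs grows as 2^(k + 2^k) with the number of address bits k.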
An Incremental Approach to Developing Intelligent Neural Network Controllers for Robots
IEEE Transactions on Systems, Man, and Cybernetics, 1995
Cited by 46 (7 self)
By beginning with simple reactive behaviors and gradually building up to more memory-dependent behaviors, it may be possible for connectionist systems to eventually achieve the level of planning. This paper focuses on an intermediate step in this incremental process, where the appropriate means of providing guidance to adapting controllers is explored. A local and a global method of reinforcement learning are contrasted---a special form of back-propagation and an evolutionary algorithm. These methods are applied to a neural network controller for a simple robot. A number of experiments are described where the presence of explicit goals and the immediacy of reinforcement are varied. These experiments reveal how various types of guidance can affect the final control behavior. The results show that the respective advantages and disadvantages of these two adaptation methods are complementary, suggesting that some hybrid of the two may be the most effective method. Concluding remarks discus...
Alecsys and the AutonoMouse: Learning to Control a Real Robot by Distributed Classifier Systems
Machine Learning, 1995
Cited by 41 (16 self)
In this article we investigate the feasibility of using learning classifier systems as a tool for building adaptive control systems for real robots. Their use on real robots imposes efficiency constraints which are addressed by three main tools: parallelism, distributed architecture, and training. Parallelism is useful to speed up computation and to increase the flexibility of the learning system design. Distributed architecture helps in making it possible to decompose the overall task into a set of simpler learning tasks. Finally, training provides guidance to the system while learning, shortening the number of cycles required to learn. These tools and the issues they raise are first studied in simulation, and then the experience gained with simulations is used to implement the learning system on the real robot. Results have shown that with this approach it is possible to let the AutonoMouse, a small real robot, learn to approach a light source under a number of different noise and lesion conditions. Keywords: learning classifier systems, reinforcement learning, genetic algorithms, animat problem
Evolving Optimal Populations with XCS Classifier Systems
1996
Cited by 39 (9 self)
This work investigates some uses of self-monitoring in classifier systems (CS) using Wilson's recent XCS system as a framework. XCS is a significant advance in classifier systems technology which shifts the basis of fitness evaluation for the Genetic Algorithm (GA) from the strength of payoff prediction to the accuracy of payoff prediction. Initial work consisted of implementing an XCS system in Pop11 and replicating published XCS multiplexer experiments from (Wilson 1995, 1996a). In subsequent original work, the XCS Optimality Hypothesis, which suggests that under certain conditions XCS systems can reliably evolve optimal populations (solutions), is proposed. An optimal population is one which accurately maps inputs to actions to reward predictions using the smallest possible set of classifiers. An optimal XCS population forms a complete mapping of the payoff environment in the reinforcement learning tradition, in contrast to traditional classifier systems which only seek to maximise ...
Emotional Agents
1997
Cited by 30 (2 self)
this document. 9.5.2 A comparison of CUE and libido
Individual Learning of Coordination Knowledge
Journal of Experimental & Theoretical Artificial Intelligence, 1998
Cited by 16 (0 self)
Social agents, both human and computational, inhabiting a world containing multiple active agents, need to coordinate their activities. This is because agents share resources, and without proper coordination or "rules of the road", everybody will be interfering with the plans of others. As such, we need coordination schemes that allow agents to effectively achieve local goals without adversely affecting the problem-solving capabilities of other agents. Researchers in the field of Distributed Artificial Intelligence (DAI) have developed a variety of coordination schemes under different assumptions about agent capabilities and relationships. Whereas some of this research has been motivated by human cognitive biases, other work has approached it as an engineering problem of designing the most effective coordination architecture or protocol. We evaluate individual and concurrent learning by multiple, autonomous agents as a means for acquiring coordination knowledge. We show that a uniform r...
Robot Shaping - Principles, Methods and Architectures
University of Sussex, UK, 1996
Cited by 8 (3 self)
In this paper, we contrast two seemingly opposing views on robot design: traditional engineering methods, and automated methods using learning and evolutionary techniques. We argue that while each has its advantages, it is likely that significant progress in robotics could be made using a suitable hybrid of the two philosophies. Of course many successful systems already do this, but they do so in rather ad hoc ways. In contrast, we are attempting to propose a principled way in which engineering design can be combined with evolution, which we call shaping. We present some general principles that we believe should underlie shaping, and follow this up with a set of methods that might be used to put those principles into practice. We then discuss and justify a novel neuro-evolutionary architecture that we believe to be particularly suitable for use in a shaping context. Finally we set out our goals for the future of this research. 1 Introduction --- Engineering vs. Evolution In the be...
Methodological Issues for Designing Multi-Agent Systems with Machine Learning Techniques: Capitalizing Experiences from the RoboCup Challenge
1998
Cited by 6 (0 self)
This paper deals with one of the most challenging and, in our opinion, little-addressed questions in Distributed Artificial Intelligence today: the methodological design of a learning multi-agent system (MAS). In previous work, to solve the current software engineering problem of having the ingredients (MAS techniques) but not the recipes (the methodology), we defined Cassiopeia, an agent-oriented, role-based method for the design of MAS. It relies on three important notions: (1) independence from the implementation techniques; (2) definition of an agent as a set of three different levels of roles; (3) specification of a methodological process that reconciles both the bottom-up and the top-down approaches to the problem of organization. In this paper we show how this method enables Machine Learning (ML) techniques to be clearly classified and integrated at first hand in the design process of an MAS, by carefully considering the ...

