Reinforcement Learning I: Introduction
, 1998
Cited by 2827 (76 self)
Abstract
In which we try to give a basic intuitive sense of what reinforcement learning is and how it differs from and relates to other fields, e.g., supervised learning and neural networks, genetic algorithms and artificial life, control theory. Intuitively, RL is trial and error (variation and selection, search) plus learning (association, memory). We argue that RL is the only field that seriously addresses the special features of the problem of learning from interaction to achieve long-term goals.
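A minimal sketch of "trial and error plus learning", assuming a hypothetical three-armed bandit with Gaussian payoffs: ε-greedy exploration supplies the variation and selection, and the incrementally averaged value estimates supply the memory. The arm means, `epsilon`, and the function name `run_bandit` are illustrative assumptions, not taken from the paper.

```python
import random

def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent with incremental sample-average value estimates."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms   # learned action values: the "memory"
    counts = [0] * n_arms
    for _ in range(steps):
        # Variation and selection: mostly exploit, occasionally try a random arm.
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)   # noisy payoff from the environment
        counts[arm] += 1
        # Association: incremental average, Q <- Q + (R - Q) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

estimates, counts = run_bandit([0.2, 1.0, 0.5])
print("pulls per arm:", counts)
```

The agent is never told which arm is best; it finds out only by trying arms and remembering what they paid, which is the distinction from supervised learning that the abstract draws.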
The Artificial Life Roots of Artificial Intelligence
, 1993
Cited by 98 (5 self)
Abstract
Behavior-oriented AI is a scientific discipline that studies how behavior of agents emerges and becomes intelligent and adaptive. Success of the field is defined in terms of success in building physical agents that are capable of maximising their own self-preservation in interaction with a dynamically changing environment. The paper addresses this artificial life route towards artificial intelligence and reviews some of the results obtained so far.
Official reference: Steels, L. (1994) The artificial life roots of artificial intelligence. Artificial Life Journal, Vol 1,1. MIT Press, Cambridge.
1 Introduction
For several decades, the field of Artificial Intelligence has been pursuing the study of intelligent behavior using the methodology of the artificial [104]. But the focus of this field, and hence the successes, have mostly been on higher order cognitive activities such as expert problem solving. The inspiration for AI theories has mostly come from logic and the cognitive...
30 years of adaptive neural networks
, 1990
Cited by 50 (2 self)
Abstract
Fundamental developments in feedforward artificial neural networks from the past thirty years are reviewed. The central theme of this paper is a description of the history, origination, operating characteristics, and basic theory of several supervised neural network training algorithms including the Perceptron rule, the LMS algorithm, three Madaline rules, and the backpropagation technique. These methods were developed independently, but with the perspective of history they can all be related to each other. The concept underlying these algorithms is the “minimal disturbance principle,” which suggests that during training it is advisable to inject new information into a network in a manner that disturbs stored information to the smallest extent possible.
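To make the minimal disturbance principle concrete, here is a sketch of a normalized (α-)LMS update for a single linear unit, assuming a bias input and a hypothetical noise-free linear target. The correction is applied only along the current input vector x: Δw = α·ε·x/‖x‖² is the smallest-norm weight change that reduces the error on the current pattern by the fraction α. The function names and data are illustrative assumptions, not code from the paper.

```python
import numpy as np

def alpha_lms_step(w, x, target, alpha):
    """One normalized (alpha-)LMS update; returns a new weight vector."""
    err = target - w @ x                  # linear error before adaptation
    # Move only along x, scaled so the error on this pattern shrinks by alpha.
    return w + alpha * err * x / (x @ x)

def train_alpha_lms(X, d, alpha=0.1, epochs=50, seed=0):
    """Present the training patterns in random order for a number of epochs."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for k in rng.permutation(len(X)):
            w = alpha_lms_step(w, X[k], d[k], alpha)
    return w

rng = np.random.default_rng(1)
t = rng.uniform(-1.0, 1.0, 100)
X = np.column_stack([np.ones(100), t])   # bias input plus one feature
d = X @ np.array([-1.0, 2.0])            # hypothetical noise-free linear target
print(train_alpha_lms(X, d))
```

After one step the new error on the presented pattern is exactly (1 - α) times the old one, while the weight change is orthogonal to every direction the pattern does not involve.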
Adaptive Critic Designs
- IEEE Transactions on Neural Networks
, 1997
Cited by 44 (6 self)
Abstract
We discuss a variety of Adaptive Critic Designs (ACDs) for neurocontrol. These are suitable for learning in noisy, nonlinear, and nonstationary environments. They have common roots as generalizations of dynamic programming for neural reinforcement learning approaches. Our discussion of these origins leads to an explanation of three design families: Heuristic Dynamic Programming (HDP), Dual Heuristic Programming (DHP), and Globalized Dual Heuristic Programming (GDHP). The main emphasis is on DHP and GDHP as advanced ACDs. We suggest two new modifications of the original GDHP design that are currently the only working implementations of GDHP. They promise to be useful for many engineering applications in the areas of optimization and optimal control. Based on one of these modifications, we present a unified approach to all ACDs. This leads to a generalized training procedure for ACDs.
(Footnote 1: The authors gratefully acknowledge support from the Texas Tech Center for Applied Research, Ford Moto...)
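In HDP the critic approximates the cost-to-go J(t) = U(t) + γU(t+1) + γ²U(t+2) + … and is trained toward the target U(t) + γJ(t+1). The sketch below shows only that critic update, assuming a linear critic J(x) ≈ w·x² on a hypothetical scalar plant x(t+1) = a·x(t) under a fixed controller with utility U = x². The plant, utility, discount γ, and the name `train_hdp_critic` are illustrative assumptions; the action network and the derivative-trained critics of DHP and GDHP are not shown.

```python
import random

def train_hdp_critic(a=0.9, gamma=0.95, lr=0.5, steps=2000, seed=0):
    """HDP-style critic: fit J(x) ~= w * x**2 to the target U(x) + gamma * J(x_next)."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)     # sampled plant state
        x_next = a * x                 # hypothetical closed-loop plant
        utility = x * x                # hypothetical quadratic utility U(t)
        # Bootstrapped target, held fixed during the update (semi-gradient step).
        target = utility + gamma * w * x_next * x_next
        feature = x * x
        w += lr * (target - w * feature) * feature
    return w

w = train_hdp_critic()
print(w, 1.0 / (1.0 - 0.95 * 0.9 ** 2))
```

For this plant the exact cost-to-go is J(x) = x²/(1 - γa²), so the learned weight can be checked against the analytic value printed alongside it.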
Associative search network: a reinforcement learning associative memory
- Biological Cybernetics
, 1981
Cited by 41 (3 self)
Abstract
An associative memory system is presented which does not require a "teacher" to provide the desired associations. For each input key it conducts a search for the output pattern which optimizes an external payoff or reinforcement signal. The associative search network (ASN) combines pattern recognition and function optimization capabilities in a simple and effective way. We define the associative search problem, discuss conditions under which the associative search network is capable of solving it, and present results from computer simulations. The synthesis of sensory-motor control surfaces is discussed as an example of the associative search problem.
Numerous reports have appeared in the literature describing associative memory systems in which information is distributed across large areas of the physical memory structure (e.g., Amari, 1977; Anderson et al.,
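The associative search problem can be illustrated with a generic learner in the same spirit: for each input key, stochastic output units search for the pattern that maximizes a scalar payoff, with no teacher supplying the target. This is not the ASN learning rule from the paper; it assumes Bernoulli-logistic output units trained by reinforcement comparison (payoff minus a per-key running average), and the keys, hidden payoff function, rates, and function names are hypothetical.

```python
import math
import random

def train_associative_search(keys, preferred, trials=6000, lr=0.5, seed=0):
    """Bernoulli-logistic output units trained by reinforcement comparison.

    keys: input key vectors. preferred: the output pattern per key that the
    environment rewards; the learner only ever sees the scalar payoff.
    """
    rng = random.Random(seed)
    n_in, n_out = len(keys[0]), len(preferred[0])
    W = [[0.0] * n_in for _ in range(n_out)]   # one weight row per output unit
    baseline = [0.0] * len(keys)               # per-key running payoff average
    for _ in range(trials):
        k = rng.randrange(len(keys))
        x = keys[k]
        probs = [1.0 / (1.0 + math.exp(-sum(w * xi for w, xi in zip(row, x))))
                 for row in W]
        y = [1 if rng.random() < p else 0 for p in probs]   # stochastic search
        # Payoff: fraction of output bits matching the environment's hidden pattern.
        r = sum(int(yj == bj) for yj, bj in zip(y, preferred[k])) / n_out
        for j, row in enumerate(W):
            for i in range(n_in):
                row[i] += lr * (r - baseline[k]) * (y[j] - probs[j]) * x[i]
        baseline[k] += 0.1 * (r - baseline[k])
    return W

def recall(W, x):
    """Deterministic recall: output the more probable value of each unit."""
    return [1 if sum(w * xi for w, xi in zip(row, x)) > 0 else 0 for row in W]

keys = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
W = train_associative_search(keys, [[1, 0, 1], [0, 1, 1], [1, 1, 0]])
print([recall(W, k) for k in keys])
```

Subtracting the running payoff average means an output is reinforced only when it does better than usual for that key, which keeps the search moving even when every payoff is positive.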
Automated Learning of Load-Balancing Strategies For A Distributed Computer System
, 1992
Cited by 17 (4 self)
Abstract
(or derived) decision metrics are exemplified by MinLoad, which denotes the least among all the Load values.
Figure 3. The load-balancing policy considered in this thesis:
    SENDER-SIDE RULES (s)
        Possible-destinations = { site : Load(site) - Reference(s) < d(s) }
        Destination = Random(Possible-destinations)
        IF Load(s) - Reference(s) > q1(s) THEN Send
    RECEIVER-SIDE RULES (r)
        IF Load(r) < q2(r) THEN Receive
The sender-side rules are applied by the load-balancing software at the site of arrival (s) of a task. Reference can be either 0 or MinLoad; the other parameters (d, q1, and q2) take non-negative floating-point values. A remote destination (r) is chosen randomly from Destinations, a set of sites whose load index falls within a small neighborhood of Reference. If Destinations is the empty set, or if the rule for sending fails, then the task is executed locally at s, its site of arrival; ot...
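The rules of Figure 3 translate almost directly into code. The sketch below assumes loads and per-site parameters are held in dictionaries keyed by site name, that the remote destination excludes the arrival site itself, and, since the excerpt is truncated before saying so, that a task refused by the receiver-side rule also runs locally at s. The function name `place_task` and the parameter layout are hypothetical.

```python
import random

def place_task(s, load, reference, d, q1, q2, rng):
    """Return the site that executes a task arriving at site s.

    load: site -> current load index
    reference: site -> "zero" or "minload" (choice of Reference per site)
    d, q1, q2: site -> non-negative float parameters
    """
    min_load = min(load.values())                          # MinLoad metric
    ref_s = 0.0 if reference[s] == "zero" else min_load    # Reference(s)

    # Sender-side rule: attempt to send only if s is loaded enough.
    if not (load[s] - ref_s > q1[s]):
        return s
    # Possible destinations: remote sites within d(s) of Reference(s).
    destinations = [site for site in load
                    if site != s and load[site] - ref_s < d[s]]
    if not destinations:
        return s                                           # empty set: run locally
    r = rng.choice(destinations)                           # Random(Possible-destinations)
    # Receiver-side rule (assumption: a refused task runs at its site of arrival).
    if load[r] < q2[r]:
        return r
    return s

loads = {"A": 5.0, "B": 1.0, "C": 4.0}
params = {site: 2.0 for site in loads}
print(place_task("A", loads, {site: "minload" for site in loads},
                 {site: 1.5 for site in loads}, params, params, random.Random(0)))
```

Passing the random generator in explicitly keeps runs reproducible, which matters if the parameters d, q1, and q2 are being tuned automatically from simulated workloads.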
An Investigation of Feedforward Neural Networks with Respect to the Detection of Spurious Patterns
, 1995
Cited by 7 (1 self)
Abstract
This thesis investigates feedforward neural networks in the context of classification tasks with respect to the detection of patterns that do not belong to the same categories of patterns used to train the network. This refers to the problem of the detection and/or rejection of spurious or novel patterns. In particular, the multilayer perceptron network (MLP) trained with the backpropagation algorithm is examined in this respect and different strategies for improving its performance in the detection of spurious patterns are considered. The problem is investigated from different points of view that vary from the modification of the multilayer perceptron network with different configurations that make it more intrinsically able to detect spurious information, to the introduction of novel auxiliary mechanisms which, when integrated with the MLP network, can provide an overall enhancement in the system's rejection capabilities. These different network configurations are examined with respe...
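The specific MLP configurations and auxiliary mechanisms studied in the thesis are not detailed in this abstract. As a point of reference only, the simplest rejection scheme thresholds the network's top output and refuses to classify when no class is confident enough. The softmax normalization, the 0.7 threshold, and the function name below are assumptions, and this is not presented as one of the thesis's methods.

```python
import numpy as np

def classify_or_reject(logits, threshold=0.7):
    """Return a class index per row, or -1 when the top softmax probability < threshold."""
    z = logits - logits.max(axis=1, keepdims=True)    # subtract row max for numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    labels = probs.argmax(axis=1)
    return np.where(probs.max(axis=1) >= threshold, labels, -1)

outputs = np.array([[4.0, 0.0, 0.0],     # one output clearly dominates
                    [0.1, 0.0, 0.05]])   # near-uniform outputs: a possible spurious pattern
print(classify_or_reject(outputs))
```

A known weakness of this baseline is that an MLP can produce confident outputs for inputs far from the training data, which is one motivation for the modified configurations and auxiliary mechanisms the thesis considers.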
Adaptive Critic Based Approximate Dynamic Programming for Tuning Fuzzy Controllers
- in Proceedings of IEEE-FUZZ 2000, IEEE
, 2000
Cited by 5 (3 self)
Abstract
In this paper we show the applicability of the Dual Heuristic Programming (DHP) method of Approximate Dynamic Programming to parameter tuning of a fuzzy control system. DHP and related techniques have been developed in the neurocontrol context but can be equally productive when used with fuzzy controllers or neuro-fuzzy hybrids. We demonstrate this technique on a highly nonlinear 2nd-order plant proposed by Sanner and Slotine. Throughout our example application, we take advantage of the Takagi-Sugeno (TS) model framework to initialize our tunable parameters with reasonable problem-specific values, a practice that is difficult to perform when applying DHP to neurocontrol.
Reinforcement Learning Architectures
- Proceedings ISKIT'92 International Symposium on Neural Information Processing
, 1992
Cited by 4 (0 self)
Abstract
Reinforcement learning is the learning of a mapping from situations to actions so as to maximize a scalar reward or reinforcement signal. The learner is not told which action to take, as in most forms of learning, but instead must discover which actions yield the highest reward by trying them. In the most interesting and challenging cases, actions affect not only the immediate reward, but also the next situation, and through that all subsequent rewards. These two characteristics, trial-and-error search and delayed reward, are the two most important distinguishing features of reinforcement learning. In this paper I present a brief overview of the development of reinforcement learning architectures over the past decade, including reinforcement-comparison, actor-critic, and Q-learning architectures. Finally, I present Dyna, a class of architectures based on reinforcement learning but which go beyond trial-and-error learning to include a learned internal model of the world. By ...
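The idea of adding a learned world model to trial-and-error learning can be sketched with tabular Dyna-Q: each real step performs a one-step Q-learning update, records the observed outcome in a model, and then replays several simulated transitions drawn from that model. This is a generic sketch under stated assumptions, not the specific architecture presented in the paper; the corridor task (start at state 0, reward 1 on reaching the last state), the parameter values, and the function name are hypothetical.

```python
import random

def dyna_q(n_states=6, episodes=30, planning_steps=10,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q on a hypothetical deterministic corridor."""
    rng = random.Random(seed)
    actions = (-1, +1)                    # move left / move right
    goal = n_states - 1
    Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    model = {}                            # learned model: (s, a) -> (reward, next_state)

    def step(s, a):
        s2 = min(max(s + a, 0), goal)     # walls at both ends
        return (1.0 if s2 == goal else 0.0), s2

    def greedy(s):
        best = max(Q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if Q[(s, a)] == best])  # random tie-break

    def update(s, a, r, s2):
        # One-step Q-learning target; the goal state is terminal.
        target = r if s2 == goal else r + gamma * max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])

    for _ in range(episodes):
        s = 0
        while s != goal:
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s2 = step(s, a)
            update(s, a, r, s2)               # direct RL from real experience
            model[(s, a)] = (r, s2)           # model learning
            for _ in range(planning_steps):   # planning from simulated experience
                ps, pa = rng.choice(list(model))
                pr, ps2 = model[(ps, pa)]
                update(ps, pa, pr, ps2)
            s = s2
    return Q

Q = dyna_q()
print([round(Q[(s, +1)], 3) for s in range(5)])
```

The planning loop is what lets a single real discovery of the reward spread back along the corridor without the agent having to walk it again, which is the sense in which Dyna goes beyond pure trial-and-error learning.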

