Results 1 - 10 of 13
Reinforcement Learning I: Introduction, 1998
Cited by 2827 (76 self)
In which we try to give a basic intuitive sense of what reinforcement learning is and how it differs and relates to other fields, e.g., supervised learning and neural networks, genetic algorithms and artificial life, control theory. Intuitively, RL is trial and error (variation and selection, search) plus learning (association, memory). We argue that RL is the only field that seriously addresses the special features of the problem of learning from interaction to achieve long-term goals.
Steps toward artificial intelligence - Computers and Thought, 1961
Cited by 144 (0 self)
Harvard University. The work toward attaining "artificial intelligence" is the center of considerable computer research, design, and application. The field is in its starting transient, characterized by many varied and independent efforts. Marvin Minsky has been requested to draw this work together into a coherent summary, supplement it with appropriate explanatory or theoretical noncomputer information, and introduce his assessment of the state of the art. This paper emphasizes the class of activities in which a general-purpose computer, complete with a library of basic programs, is further programmed to perform operations leading to ever higher-level information processing functions such as learning and problem solving. This informative article will be of real interest to both the general Proceedings reader and the computer specialist. -- The Guest Editor.
Learning and Problem Solving with Multilayer Connectionist Systems, 1986
Cited by 49 (1 self)
Charles William Anderson (B.S., University of Nebraska; M.S., University of Massachusetts; Ph.D., University of Massachusetts), September 1986. Directed by: Professor Andrew G. Barto.
The difficulties of learning in multilayered networks of computational units have limited the use of connectionist systems in complex domains. This dissertation elucidates the issues of learning in a network's hidden units, and reviews methods for addressing these issues that have been developed through the years. Issues of learning in hidden units are shown to be analogous to learning issues for multilayer systems employing symbolic representations.
Reinforcement Learning And Its Application To Control, 1992
Cited by 49 (2 self)
Learning control involves modifying a controller's behavior to improve its performance as measured by some predefined index of performance (IP). If control actions that improve performance as measured by the IP are known, supervised learning methods, or methods for learning from examples, can be used to train the controller. But when such control actions are not known a priori, appropriate control behavior has to be inferred from observations of the IP. One can distinguish between two classes of methods for training controllers under such circumstances. Indirect methods involve constructing a model of the problem's IP and using the model to obtain training information for the controller. On the other hand, direct, or model-free,...
Learning as search optimization: Approximate large margin methods for structured prediction - In ICML, 2005
Cited by 39 (0 self)
Mappings to structured output spaces (strings, trees, partitions, etc.) are typically learned using extensions of classification algorithms to simple graphical structures (e.g., linear chains) in which search and parameter estimation can be performed exactly. Unfortunately, in many complex problems, it is rare that exact search or parameter estimation is tractable. Instead of learning exact models and searching via heuristic means, we embrace this difficulty and treat the structured output problem in terms of approximate search. We present a framework for learning as search optimization, and two parameter updates with convergence theorems and bounds. Empirical evidence shows that our integrated approach to learning and decoding can outperform exact models at smaller computational cost.
Learning to Trade via Direct Reinforcement, 2001
Cited by 32 (1 self)
We present methods for optimizing portfolios, asset allocations, and trading systems based on direct reinforcement (DR). In this approach, investment decision making is viewed as a stochastic control problem, and strategies are discovered directly. We present an adaptive algorithm called recurrent reinforcement learning (RRL) for discovering investment policies. The need to build forecasting models is eliminated, and better trading performance is obtained. The direct reinforcement approach differs from dynamic programming and reinforcement algorithms such as TD-learning and Q-learning, which attempt to estimate a value function for the control problem. We find that the RRL direct reinforcement framework enables a simpler problem representation, avoids Bellman's curse of dimensionality and offers compelling advantages in efficiency. We demonstrate how direct reinforcement can be used to optimize risk-adjusted investment returns (including the differential Sharpe ratio), while accounting for the effects of transaction costs. In extensive simulation work using real financial data, we find that our approach based on RRL produces better trading strategies than systems utilizing Q-Learning (a value function method). Real-world applications include an intra-daily currency trader and a monthly asset allocation system for the S&P 500 Stock Index and T-Bills.
Reinforcement Learning Architectures - Proceedings ISKIT'92 International Symposium on Neural Information Processing, 1992
Cited by 4 (0 self)
Reinforcement learning is the learning of a mapping from situations to actions so as to maximize a scalar reward or reinforcement signal. The learner is not told which action to take, as in most forms of learning, but instead must discover which actions yield the highest reward by trying them. In the most interesting and challenging cases, actions affect not only the immediate reward, but also the next situation, and through that all subsequent rewards. These two characteristics -- trial-and-error search and delayed reward -- are the two most important distinguishing features of reinforcement learning. In this paper I present a brief overview of the development of reinforcement learning architectures over the past decade, including reinforcement-comparison, actor-critic, and Q-learning architectures. Finally, I present Dyna, a class of architectures based on reinforcement learning but which go beyond trial-and-error learning to include a learned internal model of the world. By ...
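The two features named in the abstract above, trial-and-error search and delayed reward, can be illustrated with a small tabular Q-learning sketch. This is not code from the paper: the five-state chain environment, the function name `q_learning_chain`, and all parameter values are assumptions chosen only to keep the example short.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain (illustrative only).

    States are 0..n_states-1. Action 0 moves left, action 1 moves right.
    The only reward (1.0) arrives on entering the last state, so the
    learner has to propagate a delayed reward back along the chain.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Trial and error: explore with probability epsilon (or on ties),
            # otherwise take the action with the higher current estimate.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # The goal is terminal, so its future value is taken as zero.
            future = 0.0 if s_next == goal else gamma * max(q[s_next])
            q[s][a] += alpha * (r + future - q[s][a])
            s = s_next
    return q


q = q_learning_chain()
# Greedy policy for the non-terminal states (1 means "move right").
policy = [0 if left > right else 1 for left, right in q[:-1]]
print(policy)
```

In this toy setting the greedy policy should end up preferring "move right" in every non-terminal state, even though only the final step is rewarded directly.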
Anatomy of a Cortical Simulator
Cited by 4 (1 self)
Insights into the brain’s high-level computational principles will lead to novel cognitive systems, computing architectures, programming paradigms, and numerous practical applications. An important step towards this end is the study of large networks of cortical spiking neurons. We have built a cortical simulator, C2, incorporating several algorithmic enhancements to optimize the simulation scale and time, through: computationally efficient simulation of neurons in a clock-driven and synapses in an event-driven fashion; memory efficient representation of simulation state; and communication efficient message exchanges. Using phenomenological, single-compartment models of spiking neurons and synapses with spike-timing dependent plasticity, we represented a rat-scale cortical model (55 million neurons, 442 billion synapses) in 8 TB of memory of a 32,768-processor BlueGene/L. With 1 millisecond resolution for neuronal dynamics and 1-20 milliseconds axonal delays, C2 can simulate 1 second of model time in 9 seconds per Hertz of average neuronal firing rate. In summary, by combining state-of-the-art hardware with innovative algorithms and software design, we simultaneously achieved unprecedented time-to-solution on an unprecedented problem size.
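The figures quoted in the abstract above can be turned into a quick back-of-the-envelope calculation. The sketch below is not part of C2. The function name `wall_clock_seconds` is hypothetical, and it assumes that run time scales linearly with both model time and average firing rate, which is one reading of "9 seconds per Hertz".

```python
def wall_clock_seconds(model_seconds, mean_rate_hz, seconds_per_hz=9.0):
    """Estimate run time from the abstract's '9 seconds per Hertz' figure.

    Assumes linear scaling in model time and in average firing rate;
    the abstract states the figure for 1 second of model time only.
    """
    return seconds_per_hz * mean_rate_hz * model_seconds


# Model size quoted in the abstract.
neurons = 55e6
synapses = 442e9
synapses_per_neuron = synapses / neurons

print(wall_clock_seconds(1.0, 1.0))   # 1 s of model time at 1 Hz
print(wall_clock_seconds(2.0, 5.0))   # 2 s of model time at 5 Hz
print(round(synapses_per_neuron))     # average synapses per neuron
```

Under these assumptions, 2 seconds of model time at an average rate of 5 Hz would take about 90 seconds of wall-clock time, and the quoted model size works out to roughly 8,000 synapses per neuron.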
Learning to Maximize Rewards:
Introduction" by R. S. Sutton and A. G. Barto
Rajesh P.N. Rao, The Salk Institute, Computational Neurobiology Laboratory and Sloan Center for Theoretical Neurobiology, 10010 N. Torrey Pines Road, La Jolla, CA 92037, USA. E-mail: rao@salk.edu
Neural Networks, Vol. 13(1), pp. 135-137, 2000
Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur; those which are accompanied or closely followed by discomfort to the animal will, other things being equal, have their connections with that situation weakened, so that, when it recurs, they will be less likely to occur. The greater the satisfaction or discomfort, the greater the strengthening or weakening of the bond. [Thorndike, 1911]
The idea of learning to make appropriate responses based on rein

