Reinforcement Learning I: Introduction
, 1998
Cited by 2829 (76 self)
In which we try to give a basic intuitive sense of what reinforcement learning is and how it differs and relates to other fields, e.g., supervised learning and neural networks, genetic algorithms and artificial life, control theory. Intuitively, RL is trial and error (variation and selection, search) plus learning (association, memory). We argue that RL is the only field that seriously addresses the special features of the problem of learning from interaction to achieve long-term goals.
Predicting How People Play Games: Reinforcement Learning . . .
- AMERICAN ECONOMIC REVIEW
, 1998
Learning and Sequential Decision Making
- LEARNING AND COMPUTATIONAL NEUROSCIENCE
, 1989
Cited by 185 (10 self)
In this report we show how the class of adaptive prediction methods that Sutton called "temporal difference," or TD, methods are related to the theory of sequential decision making. TD methods have been used as "adaptive critics" in connectionist learning systems, and have been proposed as models of animal learning in classical conditioning experiments. Here we relate TD methods to decision tasks formulated in terms of a stochastic dynamical system whose behavior unfolds over time under the influence of a decision maker's actions. Strategies are sought for selecting actions so as to maximize a measure of long-term payoff gain. Mathematically, tasks such as this can be formulated as Markovian decision problems, and numerous methods have been proposed for learning how to solve such problems. We show how a TD method can be understood as a novel synthesis of concepts from the theory of stochastic dynamic programming, which comprises the standard method for solving such tasks when a model of the dynamical system is available, and the theory of parameter estimation, which provides the appropriate context for studying learning rules in the form of equations for updating associative strengths in behavioral models, or connection weights in connectionist networks. Because this report is oriented primarily toward the non-engineer interested in animal learning, it presents tutorials on stochastic sequential decision tasks, stochastic dynamic programming, and parameter estimation.
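As a rough illustration of the idea in this abstract, the sketch below shows a tabular TD(0) value update, the simplest member of the TD family. The abstract does not say which TD variant the report covers, so the choice of TD(0), the function name `td0_update`, and the parameters `alpha` (step size) and `gamma` (discount factor) are assumptions made for this example only.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=1.0):
    """Apply one tabular TD(0) update in place and return the TD error.

    values:     list of current value estimates, indexed by state
    state:      index of the state just left
    reward:     payoff observed on the transition
    next_state: index of the state just entered
    alpha:      step size (the parameter-estimation part)
    gamma:      discount factor (the dynamic-programming part)
    """
    # Bootstrapped target: observed reward plus discounted estimate of the successor.
    td_error = reward + gamma * values[next_state] - values[state]
    # Move the current estimate a fraction alpha toward the target.
    values[state] += alpha * td_error
    return td_error


# Hypothetical two-state example: state 0 always leads to state 1 with reward 1.
values = [0.0, 0.0]
first_error = td0_update(values, 0, 1.0, 1, alpha=0.5, gamma=0.9)
second_error = td0_update(values, 0, 1.0, 1, alpha=0.5, gamma=0.9)
```

The target `reward + gamma * values[next_state]` plays the role of a one-step dynamic programming backup, while the step toward it is an incremental estimate built from sampled experience, so no model of the dynamical system is needed.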
Learning in Extensive-Form Games: Experimental Data and Simple Dynamic Models in the Intermediate Term
- GAMES AND ECONOMIC BEHAVIOR 8, 164--212 (1995)
, 1995
Cited by 163 (9 self)
We use simple learning models to track the behavior observed in experiments concerning three extensive form games with similar perfect equilibria. In only two of the games does observed behavior approach the perfect equilibrium as players gain experience. We examine a family of learning models which possess some of the robust properties of learning noted in the psychology literature. The intermediate term predictions of these models track well the observed behavior in all three games, even though the models considered differ in their very long term predictions. We argue that for predicting observed behavior the intermediate term predictions of dynamic learning models may be even more important than their asymptotic properties.
Experience-weighted Attraction Learning in Normal Form Games
- Econometrica
, 1999
Cited by 99 (13 self)
We describe a general model, 'experience-weighted attraction' (EWA) learning, which includes reinforcement learning and a class of weighted fictitious play belief models as special cases. In EWA, strategies have attractions which reflect prior predispositions, are updated based on payoff experience, and determine choice probabilities according to some rule (e.g., logit). A key feature is a parameter delta which weights the strength of hypothetical reinforcement of strategies which were not chosen, according to the payoff they would have yielded. When delta = 0, choice reinforcement results. When delta = 1, levels of reinforcement of strategies are proportional to expected payoffs given beliefs based on past history. Another key feature is the growth rates of attractions. The EWA model controls the growth rates by two decay parameters, phi and rho, which depreciate attractions and amount of experience separately. When phi = rho, belief-based models result; when rho = 0 choice reinforcement results. Using three data se...
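The roles of delta, phi, and rho described in this abstract can be sketched in code. The abstract does not state the update equations, so the functional form below (an experience weight N updated as rho * N + 1, and attractions updated as a decayed, experience-weighted average of payoffs) is an assumption based on the commonly cited form of the EWA rule. The function names and the logit sensitivity parameter `lam` are hypothetical.

```python
import math


def ewa_update(attractions, experience, chosen, payoffs, delta, phi, rho):
    """One assumed EWA step for a single player.

    attractions: attraction of each strategy before the update
    experience:  experience weight N before the update
    chosen:      index of the strategy actually played
    payoffs:     payoffs[j] is what strategy j earned, or would have earned,
                 against the opponents' actual play this period
    delta:       weight on hypothetical payoffs of unchosen strategies
    phi, rho:    decay rates for attractions and for experience
    """
    new_experience = rho * experience + 1.0
    new_attractions = []
    for j, (attraction, payoff) in enumerate(zip(attractions, payoffs)):
        # The chosen strategy is reinforced fully; unchosen ones by delta.
        weight = 1.0 if j == chosen else delta
        updated = (phi * experience * attraction + weight * payoff) / new_experience
        new_attractions.append(updated)
    return new_attractions, new_experience


def logit_choice(attractions, lam=1.0):
    """Map attractions to choice probabilities with a logit rule."""
    top = max(attractions)  # subtract the max for numerical stability
    weights = [math.exp(lam * (a - top)) for a in attractions]
    total = sum(weights)
    return [w / total for w in weights]


# delta = 0: only the chosen strategy is reinforced.
reinforcement, _ = ewa_update([0.0, 0.0], 1.0, 0, [2.0, 4.0], delta=0.0, phi=1.0, rho=0.0)
# delta = 1: every strategy is updated by the payoff it would have yielded.
belief_like, _ = ewa_update([0.0, 0.0], 1.0, 0, [2.0, 4.0], delta=1.0, phi=1.0, rho=0.0)
probabilities = logit_choice(belief_like, lam=1.0)
```

With delta = 0 the unchosen strategy's attraction is left unreinforced, which matches the choice reinforcement case in the abstract. With delta = 1 the unchosen strategy is credited with its forgone payoff, which is what lets the rule mimic belief-based updating.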
End Behavior in Sequences of Finite Prisoner’s Dilemma Supergames
, 1986
Cited by 71 (2 self)
A learning theory is proposed which models the influence of experience on end behavior in finite Prisoner's Dilemma supergames. The theory is compared with experimental results. In the experiment 35 subjects participated in 25 Prisoner's Dilemma supergames of ten periods each against anonymous opponents, changing from supergame to supergame. The typical behavior of experienced subjects involves cooperation until shortly before the end of the supergame. The theory explains shifts in the intended deviation period. On the basis of parameter estimates for each subject derived from the first 20 supergames, successful predictions could be obtained for the last five supergames.
Learning to Solve Markovian Decision Processes
, 1994
Cited by 43 (3 self)
This dissertation is about building learning control architectures for agents embedded in finite, stationary, and Markovian environments. Such architectures give embedded agents the ability to improve autonomously the efficiency with which they can achieve goals. Machine learning researchers have developed reinforcement learning (RL) algorithms based on dynamic programming (DP) that use the agent's experience in its environment to improve its decision policy incrementally. This is achieved by adapting an evaluation function in such a way that the decision policy that is "greedy" with respect to it improves with experience. This dissertation focuses on finite, stationary and Markovian environments for two reasons: it allows the develop...
Boundedly Rational Rule Learning in a Guessing Game
- GAMES AND ECONOMIC BEHAVIOR 16, 303–330 (1996)
, 1996
Cited by 40 (7 self)
We combine Nagel’s “step-k” model of boundedly rational players with a “law of effect” learning model. Players begin with a disposition to use one of the step-k rules of behavior, and over time the players learn how the available rules perform and switch to better performing rules. We offer an econometric specification of this dynamic process and fit it to Nagel’s experimental data. We find that the rule of learning model vastly outperforms other nested and nonnested learning models. We find strong evidence for diverse dispositions and reject the Bayesian rule-learning model.
Evolving market structure: an ACE model of price dispersion and loyalty
- Journal of Economic Dynamics and Control
Cited by 38 (1 self)
We present an agent-based computational economics (ACE) model of the wholesale fish market in Marseille. Two of the stylized facts of that market are high loyalty of buyers to sellers, and persistent price dispersion, although it is every day the same population of sellers and buyers that meets in the same market hall. In our ACE model, sellers decide on quantities to supply, prices to ask, and how to treat loyal customers, while buyers decide which sellers to visit, and which prices to accept. Learning takes place through reinforcement. The model explains both stylized facts, price dispersion and high loyalty. In a coevolutionary process, buyers learn to become loyal as sellers learn to offer higher utility to loyal buyers, while these sellers, in turn, learn to offer higher utility to loyal buyers as they happen to realize higher gross revenues from loyal buyers. The model also explains the effect of heterogeneity of the buyers. We analyze how this leads to subtle differences in the shopping patterns of the different types of buyers, and how this is

