Results 1–10 of 54
Reinforcement Learning I: Introduction
, 1998
Abstract

Cited by 5500 (118 self)
In which we try to give a basic intuitive sense of what reinforcement learning is and how it differs from and relates to other fields, e.g., supervised learning and neural networks, genetic algorithms and artificial life, and control theory. Intuitively, RL is trial and error (variation and selection, search) plus learning (association, memory). We argue that RL is the only field that seriously addresses the special features of the problem of learning from interaction to achieve long-term goals.
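The "trial and error plus learning" view can be made concrete with a minimal sketch (ours, not from the paper): an ε-greedy bandit agent, where random exploration supplies the variation and selection, and incremental mean estimates supply the association and memory.

```python
import random

def run_bandit(true_means, steps=5000, eps=0.1, seed=0):
    """Epsilon-greedy bandit: trial and error (random exploration)
    plus learning (incremental estimates of each arm's mean reward)."""
    rng = random.Random(seed)
    n = len(true_means)
    estimates = [0.0] * n   # learned value of each arm (memory)
    counts = [0] * n
    for _ in range(steps):
        # variation/selection: explore with prob. eps, else exploit
        if rng.random() < eps:
            arm = rng.randrange(n)
        else:
            arm = max(range(n), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)
        # association: incremental running-mean update
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates

est = run_bandit([0.2, 0.8, 0.5])
```

After enough interaction, the agent's estimates identify the arm with the highest true mean, illustrating learning purely from interaction rather than from labeled examples.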
Reinforcement Learning Methods for Continuous-Time Markov Decision Problems
 Advances in Neural Information Processing Systems
, 1994
Abstract

Cited by 134 (0 self)
Semi-Markov Decision Problems are continuous-time generalizations of discrete-time Markov Decision Problems. A number of reinforcement learning algorithms have been developed recently for the solution of Markov Decision Problems, based on the ideas of asynchronous dynamic programming and stochastic approximation. Among these are TD(λ), Q-learning, and Real-time Dynamic Programming. After reviewing semi-Markov Decision Problems and Bellman's optimality equation in that context, we propose algorithms similar to those named above, adapted to the solution of semi-Markov Decision Problems. We demonstrate these algorithms by applying them to the problem of determining the optimal control for a simple queueing system. We conclude with a discussion of circumstances under which these algorithms may be usefully applied.
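A minimal sketch of the kind of SMDP adaptation described (in the spirit of this paper's update rule; the variable names and the toy single-state setup are our own): the discrete discount factor γ is replaced by e^(−βτ), where β is a continuous-time discount rate and τ the random sojourn time in a state.

```python
import math
import random

def smdp_q_update(Q, s, a, r_rate, tau, s_next, alpha, beta):
    """One SMDP Q-learning backup: reward accrues at rate r_rate during
    the sojourn time tau, discounted continuously at rate beta."""
    discount = math.exp(-beta * tau)
    lump = (1.0 - discount) / beta * r_rate   # integrated discounted reward
    target = lump + discount * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])

# Toy check: a single state earning reward at rate 1 has value r/beta = 2.0
rng = random.Random(1)
Q = {0: [0.0]}
for n in range(1, 20001):
    tau = rng.expovariate(1.0)                # random exponential sojourn
    smdp_q_update(Q, 0, 0, r_rate=1.0, tau=tau, s_next=0,
                  alpha=1.0 / n, beta=0.5)
```

Note that at the fixed point Q = r/β the target equals Q for every τ, so the iteration converges here regardless of the sojourn-time distribution.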
An analysis of stochastic shortest path problems
 Mathematics of Operations Research
, 1991
Computable Markov-perfect industry dynamics, Working paper
, 2008
Abstract

Cited by 102 (12 self)
We provide a general model of dynamic competition in an oligopolistic industry with investment, entry, and exit. To ensure that there exists a computationally tractable Markov perfect equilibrium, we introduce firm heterogeneity in the form of randomly drawn, privately known scrap values and setup costs into the model. Our game of incomplete information always has an equilibrium in cutoff entry/exit strategies. In contrast, the existence of an equilibrium in the Ericson & Pakes (1995) model of industry dynamics requires admissibility of mixed entry/exit strategies, contrary to the assertion in their paper; existing algorithms cannot cope with such mixed strategies. In addition, we provide a condition on the model’s primitives that ensures that the equilibrium is in pure investment strategies. Building on this basic existence result, we first show that a symmetric equilibrium exists under appropriate assumptions on the model’s primitives. Second, we show that, as the distribution of the random scrap values/setup costs becomes degenerate, equilibria in cutoff entry/exit strategies converge to equilibria in mixed entry/exit strategies of the game of complete information.
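The cutoff logic can be sketched in a few lines (our illustration; the uniform distribution and the numbers are hypothetical, not from the paper): each firm exits iff its privately drawn scrap value exceeds its continuation value, so from the rivals' point of view exit occurs with probability 1 − F(V).

```python
def exit_probability(continuation_value, lo, hi):
    """Cutoff exit rule with scrap values drawn uniformly on [lo, hi]:
    a firm exits iff its private scrap value exceeds the continuation
    value V, which rivals perceive as exit probability 1 - F(V)."""
    if continuation_value <= lo:
        return 1.0
    if continuation_value >= hi:
        return 0.0
    return (hi - continuation_value) / (hi - lo)
```

As the support collapses toward a point (hi − lo → 0), this smooth probability approaches a step function, which is the sense in which cutoff equilibria converge to mixed-strategy equilibria of the complete-information game.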
Foundations of Markov-perfect industry dynamics: Existence, purification, and multiplicity, Working paper
, 2003
Abstract

Cited by 35 (6 self)
In this paper we show that existence of a Markov perfect equilibrium (MPE) in the Ericson & Pakes (1995) model of dynamic competition in an oligopolistic industry with investment, entry, and exit requires admissibility of mixed entry/exit strategies, contrary to Ericson & Pakes’s (1995) assertion. This is problematic because the existing algorithms cannot cope with mixed strategies. To establish a firm basis for computing dynamic industry equilibria, we introduce firm heterogeneity in the form of randomly drawn, privately known scrap values and setup costs into the model. We show that the resulting game of incomplete information always has an MPE in cutoff entry/exit strategies and is computationally no more demanding than the original game of complete information. Building on our basic existence result, we first show that a symmetric and anonymous MPE exists under appropriate assumptions on the model’s primitives. Second, we show that, as the distribution of the random scrap values/setup costs becomes degenerate, MPEs in cutoff entry/exit strategies converge to MPEs in mixed entry/exit strategies of the game of complete information. Next, we provide a condition on the model’s primitives that ensures the existence of an MPE in pure investment strategies. Finally, we provide the first example of multiple symmetric and anonymous MPEs in this literature.
Multi-criteria Reinforcement Learning
, 1998
Abstract

Cited by 34 (0 self)
We consider multi-criteria sequential decision making problems where the vector-valued evaluations are compared by a given, fixed total ordering. Conditions for the optimality of stationary policies and the Bellman optimality equation are given. The analysis requires special care as the topology introduced by pointwise convergence and the order topology introduced by the preference order are in general incompatible. Reinforcement learning algorithms are proposed and analyzed. Preliminary computer experiments confirm the validity of the derived algorithms. It is observed that in the medium term multi-criteria RL often converges to better solutions (measured by the first criterion) than its single-criterion counterparts. These types of multi-criteria problems are most useful when there are several optimal solutions to a problem and one wants to choose the one among these which is optimal according to another fixed criterion. Example applications include alternating games, when in addition...
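For concreteness, one common fixed total ordering on vector-valued evaluations is the lexicographic one: the first criterion decides, and later criteria only break ties. A minimal sketch (our own notation, not the paper's):

```python
def lex_better(u, v, eps=1e-9):
    """Lexicographic total order on vector-valued evaluations:
    u beats v iff the first criterion on which they differ
    is strictly larger in u."""
    for a, b in zip(u, v):
        if a > b + eps:
            return True
        if a < b - eps:
            return False
    return False   # equal on all criteria

def lex_argmax(values):
    """Index of the lexicographically largest vector evaluation."""
    best = 0
    for i in range(1, len(values)):
        if lex_better(values[i], values[best]):
            best = i
    return best
```

Under this ordering, candidates tied on the first criterion (several "optimal solutions") are ranked by the second, matching the use case sketched in the abstract.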
Incremental Dynamic Programming for On-Line Adaptive Optimal Control
, 1994
Abstract

Cited by 24 (2 self)
Reinforcement learning algorithms based on the principles of Dynamic Programming (DP) have enjoyed a great deal of recent attention both empirically and theoretically. These algorithms have been referred to generically as Incremental Dynamic Programming (IDP) algorithms. IDP algorithms are intended for use in situations where the information or computational resources needed by traditional dynamic programming algorithms are not available. IDP algorithms attempt to find a global solution to a DP problem by incrementally improving local constraint satisfaction properties as experience is gained through interaction with the environment. This class of algorithms is not new, going back at least as far as Samuel's adaptive checkers-playing programs,...
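The local, incremental flavor of IDP can be contrasted with a full synchronous DP sweep. A minimal sketch (the toy MDP encoding is our own, not from the paper): each step performs a Bellman backup at one randomly chosen state, incrementally improving local constraint satisfaction.

```python
import random

def async_value_iteration(P, R, gamma=0.9, steps=400, seed=0):
    """Asynchronous (incremental) value iteration: each step backs up
    a single randomly chosen state instead of sweeping all states.
    P[s][a] is a list of (next_state, probability); R[s][a] a reward."""
    rng = random.Random(seed)
    n = len(R)
    V = [0.0] * n
    for _ in range(steps):
        s = rng.randrange(n)          # pick one state to back up locally
        V[s] = max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                   for a in range(len(P[s])))
    return V

# Two-state deterministic MDP: in each state, action 0 goes to state 0
# and action 1 goes to state 1, with rewards R[s][a].
P = [[[(0, 1.0)], [(1, 1.0)]],
     [[(0, 1.0)], [(1, 1.0)]]]
R = [[0.0, 1.0],
     [2.0, 0.0]]
V = async_value_iteration(P, R)
```

Because each backup is a contraction at the chosen state, the values converge to the DP fixed point as long as every state keeps being visited, which is the sense in which local updates yield a global solution.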