Results 1  10
of
26,001
Learning to predict by the methods of temporal differences
 MACHINE LEARNING
, 1988
"... This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior. Whereas conventional predictionlearning methods assign credit by means of the difference between predi ..."
Abstract

Cited by 1521 (56 self)
 Add to MetaCart
predicted and actual outcomes, the new methods assign credit by means of the difference between temporally successive predictions. Although such temporaldifference methods have been used in Samuel's checker player, Holland's bucket brigade, and the author's Adaptive Heuristic Critic
Practical Issues in Temporal Difference Learning
 Machine Learning
, 1992
"... This paper examines whether temporal difference methods for training connectionist networks, such as Suttons's TD(lambda) algorithm can be successfully applied to complex realworld problems. A number of important practical issues are identified and discussed from a general theoretical perspect ..."
Abstract

Cited by 415 (2 self)
 Add to MetaCart
This paper examines whether temporal difference methods for training connectionist networks, such as Suttons's TD(lambda) algorithm can be successfully applied to complex realworld problems. A number of important practical issues are identified and discussed from a general theoretical
An analysis of temporaldifference learning with function approximation
 IEEE Transactions on Automatic Control
, 1997
"... We discuss the temporaldifference learning algorithm, as applied to approximating the costtogo function of an infinitehorizon discounted Markov chain. The algorithm weanalyze updates parameters of a linear function approximator online, duringasingle endless trajectory of an irreducible aperiodi ..."
Abstract

Cited by 313 (8 self)
 Add to MetaCart
We discuss the temporaldifference learning algorithm, as applied to approximating the costtogo function of an infinitehorizon discounted Markov chain. The algorithm weanalyze updates parameters of a linear function approximator online, duringasingle endless trajectory of an irreducible
Linear leastsquares algorithms for temporal difference learning
 Machine Learning
, 1996
"... Abstract. We introduce two new temporal difference (TD) algorithms based on the theory of linear leastsquares function approximation. We define an algorithm we call LeastSquares TD (LS TD) for which we prove probabilityone convergence when it is used with a function approximator linear in the adju ..."
Abstract

Cited by 260 (1 self)
 Add to MetaCart
Abstract. We introduce two new temporal difference (TD) algorithms based on the theory of linear leastsquares function approximation. We define an algorithm we call LeastSquares TD (LS TD) for which we prove probabilityone convergence when it is used with a function approximator linear
Temporaldifference networks
 In Advances in Neural Information Processing Systems 17
, 2005
"... We introduce a generalization of temporaldifference (TD) learning to networks of interrelated predictions. Rather than relating a single prediction to itself at a later time, as in conventional TD methods, a TD network relates each prediction in a set of predictions to other predictions in the set ..."
Abstract

Cited by 44 (8 self)
 Add to MetaCart
We introduce a generalization of temporaldifference (TD) learning to networks of interrelated predictions. Rather than relating a single prediction to itself at a later time, as in conventional TD methods, a TD network relates each prediction in a set of predictions to other predictions in the set
Unsupervised learning of human action categories using spatialtemporal words
 In Proc. BMVC
, 2006
"... Imagine a video taken on a sunny beach, can a computer automatically tell what is happening in the scene? Can it identify different human activities in the video, such as water surfing, people walking and lying on the beach? To automatically classify or localize different actions in video sequences ..."
Abstract

Cited by 494 (8 self)
 Add to MetaCart
Imagine a video taken on a sunny beach, can a computer automatically tell what is happening in the scene? Can it identify different human activities in the video, such as water surfing, people walking and lying on the beach? To automatically classify or localize different actions in video sequences
Dual Temporal Difference Learning
"... Recently, researchers have investigated novel dual representations as a basis for dynamic programming and reinforcement learning algorithms. Although the convergence properties of classical dynamic programming algorithms have been established for dual representations, temporal difference learning al ..."
Abstract
 Add to MetaCart
Recently, researchers have investigated novel dual representations as a basis for dynamic programming and reinforcement learning algorithms. Although the convergence properties of classical dynamic programming algorithms have been established for dual representations, temporal difference learning
Relational temporal difference learning
 In ICML
, 2006
"... We introduce relational temporal difference learning as an effective approach to solving multiagent Markov decision problems with large state spaces. Our algorithm uses temporal difference reinforcement to learn a distributed value function represented over a conceptual hierarchy of relational pred ..."
Abstract

Cited by 8 (2 self)
 Add to MetaCart
We introduce relational temporal difference learning as an effective approach to solving multiagent Markov decision problems with large state spaces. Our algorithm uses temporal difference reinforcement to learn a distributed value function represented over a conceptual hierarchy of relational
Preconditioned Temporal Difference Learning
"... This paper extends many of the recent popular policy evaluation algorithms to a generalized framework that includes leastsquares temporal difference (LSTD) learning, leastsquares policy evaluation (LSPE) and a variant of incremental LSTD (iLSTD). The basis of this extension is a preconditioning tec ..."
Abstract

Cited by 5 (0 self)
 Add to MetaCart
This paper extends many of the recent popular policy evaluation algorithms to a generalized framework that includes leastsquares temporal difference (LSTD) learning, leastsquares policy evaluation (LSPE) and a variant of incremental LSTD (iLSTD). The basis of this extension is a preconditioning
Kalman Temporal Differences
 Journal of Artificial Intelligence Research (JAIR
, 2010
"... Because reinforcement learning suffers from a lack of scalability, online value (and Q) function approximation has received increasing interest this last decade. This contribution introduces a novel approximation scheme, namely the Kalman Temporal Differences (KTD) framework, that exhibits the foll ..."
Abstract

Cited by 25 (18 self)
 Add to MetaCart
Because reinforcement learning suffers from a lack of scalability, online value (and Q) function approximation has received increasing interest this last decade. This contribution introduces a novel approximation scheme, namely the Kalman Temporal Differences (KTD) framework, that exhibits
Results 1  10
of
26,001