| Gerald Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3):58--68, 1995. |
....calls upon another member when such an action leads to higher reinforcement. 3 Experimental Results 3.1 Experiment Setup One area of research in artificial intelligence is programming a computer that can play board games. Board game domains such as Chess [5], Checkers [4], Go [3], and Backgammon [17] have been popular since they have finite state spaces with well-defined rules. Since it is usually impossible to search the state space exhaustively, artificial intelligence research in game domains has primarily worked on solutions that can play a game comparable to or better than a human ....
....trained on backgammon use an expanded scheme to encode the local information. For a player's checkers, a truncated unary encoding with five units is used to encode each checker's position (1-24, on the bar, and off the board). For encoding the opponent's information, TD-Gammon's encoding scheme [17] is used. For each checker's position, a truncated unary encoding with four units is used. The first three units encode three cases: one checker, two checkers, and three checkers, while the fourth unit encodes the number of checkers beyond 3. A total of 96 units is used to encode the ....
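The four-unit truncated unary encoding described in the excerpt above can be sketched as follows. This is a minimal illustration, not the cited authors' code: the names `encode_point` and `encode_board` are hypothetical, the first three units are assumed to be cumulative thresholds (at least one, two, or three checkers), and the fourth unit is assumed to hold the raw count beyond three with no scaling. With 24 board points and four units per point, the feature vector has 96 units, which matches the total quoted in the excerpt.

```python
from typing import List

NUM_POINTS = 24        # board points in backgammon
UNITS_PER_POINT = 4    # truncated unary units per point (from the excerpt)


def encode_point(n: int) -> List[float]:
    """Encode the number of checkers on one point with four units.

    Assumption: units 1-3 are thresholds (n >= 1, n >= 2, n >= 3) and
    unit 4 is the unscaled count of checkers beyond three.
    """
    if n < 0:
        raise ValueError("checker count must be non-negative")
    return [
        1.0 if n >= 1 else 0.0,
        1.0 if n >= 2 else 0.0,
        1.0 if n >= 3 else 0.0,
        float(max(n - 3, 0)),
    ]


def encode_board(counts: List[int]) -> List[float]:
    """Concatenate the per-point encodings for all 24 points."""
    if len(counts) != NUM_POINTS:
        raise ValueError("expected one checker count per board point")
    features: List[float] = []
    for n in counts:
        features.extend(encode_point(n))
    return features
```

For example, under these assumptions a point holding five checkers is encoded as `[1.0, 1.0, 1.0, 2.0]`, and an empty point as four zeros.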
G. Tesauro. Temporal difference learning and td-gammon. Communications of ACM, 38(3):58--67, 1995.
....Two different approaches are typically taken. In the first, the value function is represented using, for example, a backpropagation network. The neural network is then updated using Q-learning or value iteration. This approach has been used very successfully for learning to play Backgammon [109], among other applications. Other approximations, such as the use of linear function approximators [111], have also been proposed. More recently, approaches that approximate the value function based on states of known (or estimated) value close to the current state, such as locally weighted ....
Gerald Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, pages 58-67, March 1995.
....a mine(m) a safe move cannot be done. Given the difficulty of handcrafting playing strategies for this and other games, AI researchers have always been interested in the possibility of automatically learning such strategies from experience. However, with the exception of reinforcement learning [Tesauro, 1995], most of the playing strategies and heuristics used in game-playing programs are coded and tuned by hand instead of automatically learned. In this work, we use a general-purpose ILP system, Mio, to learn a playing strategy for Minesweeper. Multirelational learning or ILP consists in learning ....
Gerald Tesauro. Temporal-difference learning and TD-Gammon. Communications of the ACM, 38(3):58--68, 1995.
....games with enumerable state and action spaces. Historically, though, a number of landmark results in reinforcement learning have looked at learning in particular stochastic games that are not small, nor are their states easily enumerated. Samuel's Checkers-playing program [15] and Tesauro's TD-Gammon [21] are successful applications of learning in games with very large state spaces. Both of these results made generous use of generalization and approximation, which have not been used in the more recent work. On the other hand, both TD-Gammon and Samuel's Checkers player only used deterministic ....
Gerald J. Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38:58--68, 1995.
....known as combinatorial games [9]. These games are classified as two-player games, with no hidden information, no chance moves, a restricted outcome (win, lose and draw) and with each player moving alternately. This is different to games such as poker [10], [11], [12], backgammon [13], [14], [15], [16] or bridge [17], [18], [19], [20], where there is hidden information, a chance element and, possibly, more than two players. A recent survey of computers and game playing [21] covers those games above, as well as others. In this work we look at a combinatorial game called (amongst other ....
G. Tesauro, "Temporal Difference Learning and TD-Gammon." Communications of the ACM, 38(3), pp 58-68, 1995
....are based on whole-board evaluation. The relative urgency of those moves is best determined on a case-by-case basis by the outcome of the game. Reinforcement learning can address the problem of action (move) selection based on delayed rewards and has been successfully used in Backgammon [Tesauro, 1995]. For longer games such as Go, one has to cope with increased spatio-temporal decision complexity in addition to space complexity, as is the case for Backgammon. The claim of this paper is that for a large decision search space, defined in terms of different classes of input patterns, like that ....
Tesauro, G, Temporal Difference Learning and TD-Gammon, Communications of the ACM, Vol. 38, No. 3, 1995.
....trouble a single time. While Logistello can be regarded a classic two-person game program, all of its move decision components are automatically tuned by machine learning techniques. This sets it apart from other programs that mostly rely on manual tuning and, after TD-Gammon reaching master level [24], marks the second breakthrough of machine learning applied to games. In this article the evaluation, search, and opening book learning techniques pioneered in Logistello are surveyed. We first describe a novel evaluation function model and its application to Othello. The resulting evaluation ....
G. Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3):58--68, 1995.
....the optimal actions for the current state to maximize its cumulative reward. For this purpose, the agent is given a reward by an external trainer for each action taken. The reward can be immediate or delayed. Sample RL applications are learning to play board games (e.g. Tesauro's backgammon [7]) or learning to control mobile robots, see e.g. [6]. In the robot example, the robot is the agent that can sense its environment such that it knows its location, i.e. its state. The robot can decide which action to choose, e.g. to move ahead or to turn from one state to the next. The goal may ....
Tesauro, G., "Temporal difference learning and TD-Gammon," Communications of the ACM, 38(3), pp. 58-68, 1995.
....correct, for example due to noise or an error in the data. or with many fewer rewards than decisions by the learner. An example of a reinforcement learner is TD-Gammon, a backgammon-playing program that learns to play the game by simply getting an appropriate reward at the end of each game [Tes95]. 2.2 Data Classification One of the main application areas of machine learning is data classification. Given an input, the learner's task is to assign it to one of a set of categories (or classes). As far as the learner is concerned, there need not be any meaning associated with the ....
G. Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3):58--68, 1995.
....by the model used in computer Chess because of the huge space generated from the random process of dice rolling. Applying TD learning in Backgammon gives very good results: computer Backgammon programs using this approach are able to play at a level very close to the human world champion [28] [29]. The success of this application is comparable to the one in computer Chess, as in both cases the best programs reach the world champion level. TD learning has been applied to Go following its success in Backgammon [24]. Some preliminary results show that applying TD learning in Go can obtain ....
....train neural networks to do the alive-or-dead analysis, which is an important sub-problem in designing an evaluation function. The methodology of the applications will also be discussed in the next chapter. Chapter 3 Application of TD in Game Playing 3.1 Introduction As illustrated in [28] [29], temporal difference (TD) learning has been applied successfully in the game Backgammon via learning by neural networks. In this thesis, we apply neural network learning, especially the TD learning method, to solve problems in computer Go. In this chapter, we first give a background of the ....
G. Tesauro. Temporal difference learning and TD-Gammon. Communications of ACM, 38(3), March 1995.
....classical Reinforcement Learning which also provides for incremental update of an evaluation function, although in this case it is represented as a table of values. Since Samuel's work, however, Reinforcement Learning techniques were not used again in an adversarial setting until quite recently. [Tesauro, 1995, Thrun, 1995] have both used neural nets in a Reinforcement Learning paradigm. [Tesauro, 1995]'s work in the game of checkers 1 was successful, but required hand-tuned features being fed to the algorithm for high-quality play. [Thrun, 1995] was moderately successful in using similar techniques ....
....function, although in this case it is represented as a table of values. Since Samuel's work, however, Reinforcement Learning techniques were not used again in an adversarial setting until quite recently. [Tesauro, 1995, Thrun, 1995] have both used neural nets in a Reinforcement Learning paradigm. [Tesauro, 1995]'s work in the game of checkers 1 was successful, but required hand-tuned features being fed to the algorithm for high-quality play. [Thrun, 1995] was moderately successful in using similar techniques in chess, but these techniques were not as successful as they had been in the checkers domain. ....
G Tesauro. Temporal difference learning and td-gammon. Communications of the ACM, 38(3):58--67, 1995.
....robotics situations, where you might hope for a robot to learn how to accomplish a certain task in the same way a dog learns a trick, without explicit programming. Probably the most famous success story from reinforcement learning is TD-Gammon, a program that learns to play the game of Backgammon [Tes95]. TD-Gammon uses a neural network coupled with reinforcement learning techniques. After playing against itself for 1.5 million games, the program learned enough to rank among the world's best backgammon players (including both humans and computers). Reinforcement learning is much more challenging ....
....getting around this difficulty of tackling huge state spaces. Results have been mixed. We'll look at two of the more significant successes. 3.6.1 TD-Gammon The first big reinforcement learning success is certainly TD-Gammon, a backgammon-playing system developed by Tesauro in the early 1990s [Tes95]. Although we're not going to look at the specifics of backgammon here, the important thing to know is that there is a set of pieces you are trying to move to goal positions. Each turn, the player rolls a pair of dice and must choose from a set of moves dictated by that roll. Conventional ....
G. Tesauro. Temporal difference learning and td-gammon. Communications of the ACM, 38(3), 1995.
....a sigmoidal activation function can be used to learn the value function. This approach can solve larger problems than table lookup but it is not guaranteed to converge. Although there have been some impressive successes using neural networks as function approximators for reinforcement learning [12], the neural network approach often performs poorly even on relatively simple problems [2]. The problems associated with using standard backpropagation neural networks in reinforcement learning stem from the fact that these networks perform non-local changes to the value function, while ....
Gerald Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3):58--68, March 1995.
No context found.
Gerald Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3):58--68, 1995.
No context found.
G Tesauro. Temporal difference learning and td-gammon. Communications of the ACM, 38(3):58--67, 1995.
No context found.
G. Tesauro. Temporal Difference Learning and TD-Gammon. Communications of the ACM, 38(3):58-68.
No context found.
G. Tesauro, "Temporal difference learning and TD-gammon," Communications of the ACM, vol. 38, no. 3, pp. 58--68, 1995.
No context found.
Tesauro, Gerald. (1995). Temporal Difference Learning and TD-Gammon. Communications of the ACM. Vol 38:3, 58-68.
No context found.
G. Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3), March 1995.
No context found.
G. Tesauro, Temporal difference learning and TD-Gammon, Comm. ACM 38 (3) (1995) 58--68.
No context found.
G. Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3), March 1995.
No context found.
G. Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3), March 1995.
No context found.
G Tesauro. Temporal difference learning and td-gammon. Communications of the ACM, 38(3):58--67, 1995.
No context found.
G. Tesauro, Temporal difference learning and TD-GAMMON, Comm. ACM 38 (3) (1995) 58--68.
No context found.
G. J. Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3):58--67, 1995.