Gerald Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2):215--219, 1994.
....0.36 Table 2. Members in the worst team vs. the worst team. All data are winning percentages of a member playing against its team in 50 games. 4.2 Comparison with Other Methods. There has been a great deal of work on the game of backgammon. The best machine player so far is Tesauro's TD-Gammon [15] [16] [17]. TD-Gammon used the TD reinforcement learning algorithm [14] to learn from self-play. TD-Gammon started from random initial weights but achieved a very strong level of play. Tesauro's 1992 TD-Gammon beat Sun Microsystems' Gammontool and his own Neurogammon 1.0, which was trained on expert knowledge. ....
G. Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2):215--219, 1994.
....sources. For instance, the two-module system is based on two co-evolving modules. Co-evolution of competing strategies, however, is nothing new. See, for example, [7,19] for interesting cases. Also, the idea of improving a learner by letting it play against itself is ancient. See, for example, [20,41]. Even the idea of unsupervised learning through co-evolution of predictors and modules trying to escape the predictions is nothing new: it has been used extensively in our previous work on unsupervised sensory coding with neural networks [25,36,33,32,37]. Finally, co-evolutionary methods ....
G. Tesauro. TD-gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2):215-219, 1994.
....no reference to the DP literature existing at that time. Other early RL research was explicitly motivated by animal behavior and its neural basis [45, 33, 34, 71]. Much of the current interest is attributable to Werbos [85, 86, 87], Watkins [82], and Tesauro's backgammon-playing system TD-Gammon [75, 76]. Additional information about RL can be found in several references (e.g., [2, 5, 32, 72]). Despite the utility of RL methods in many applications, the amount of time they can take to form acceptable approximate solutions can still be unacceptable. As a result, RL researchers are investigating ....
G. J. Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2):215--219, 1994.
....typically used to generate a policy in a greedy fashion by choosing in each state the control (or action) with the highest value as given by the approximate value function. This approach has yielded some remarkable empirical successes in learning to play games, including checkers [21], backgammon [29, 30], and chess [4]. Successes outside of the games domain include job-shop scheduling [35] and dynamic channel allocation [24]. While there are many algorithms for training approximate value functions (see [8, 27] for comprehensive treatments) with varying degrees of convergence guarantees, all ....
G. Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6:215--219, 1994.
....and action) values. Most algorithms then seek to minimize some form of error between the approximate value function and the true value function, usually by simulation (see [13] and [4] for comprehensive overviews). While there have been a multitude of empirical successes for this approach (see, e.g., [10, 14, 15, 3, 18, 11], to name but a few), it lacks any fundamental theoretical guarantees on the performance of the policy generated by the approximate value function (see [2, Section 1] for further discussion). Motivated by these difficulties, in [2] we introduced GPOMDP, a new algorithm for computing arbitrarily ....
G. Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6:215--219, 1994.
....Perkins and Barto, 2001b; Barto and Perkins, 2001]. In problems where safety is not a concern or is easily achieved, reinforcement learning (RL) techniques have generated impressive solutions to difficult, large-scale control problems. Examples include such diverse problems as playing backgammon [Tesauro, 1994], elevator dispatching [Crites and Barto, 1998], option pricing [Tsitsiklis and Roy, 1999], and job-shop scheduling [Zhang and Dietterich, 1996]. However, there are obstacles to applying RL to problems where it is important to ensure reasonable system performance and/or respect safety ....
G. J. Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2):215--219, 1994.
....be solved by the model used in computer Chess because of the huge space generated by the random process of dice rolling. Applying TD learning in Backgammon gives very good results: computer Backgammon programs using this approach are able to play at a level very close to that of the human world champion [28] [29]. The success of this application is comparable to that in computer Chess, as in both cases the best programs reach world-champion level. TD learning has been applied to Go following its success in Backgammon [24]. Some preliminary results show that applying TD learning in Go can ....
....We also train neural networks to do the alive-or-dead analysis, which is an important subproblem in designing an evaluation function. The methodology of the applications will also be discussed in the next chapter. Chapter 3: Application of TD in Game Playing. 3.1 Introduction. As illustrated in [28] [29], temporal difference (TD) learning has been applied successfully to the game of Backgammon using neural networks. In this thesis, we apply neural network learning, especially the TD learning method, to solve problems in computer Go. In this chapter, we first give a background of the ....
G. Tesauro. TD-gammon, a self-teaching Backgammon program, achieves master-level play. Neural Computation, 6(2), 1994.
....in advance. Even if the real world was quantizable into a discrete state space, however, for all practical purposes this space will be inaccessible and remain unknown. Current proofs do not cover apparently minor deviations from the basic principle, such as the world-class RL backgammon player [47], which uses a nonlinear function approximator to deal with a large but finite number of discrete states and, for the moment at least, seems a bit like a miracle without full theoretical foundation. Prior knowledge about the topology of a network connecting discrete states is also required by ....
G. Tesauro. TD-gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2):215--219, 1994.
....state-action pairs. Let's step back to Q-Learning with a parameterized function approximator. The objective of this approach is to get a compact representation of the value function (e.g., a neural network) and to utilize the generalization of the function approximator for continuous learning tasks [21]. The first variant we started with is the simplest variant of Q-Learning that required a complete model of the world J (s; a 0 ), which is a function that maps a situation and an action onto a situation reached by using action a 0 in situation s. The equation to calculate the Q-value for a ....
G. Tesauro. A self-teaching backgammon program, 1994.
....move chosen randomly. After 2,000 attempts, it ranks the moves in the test database correctly 87 percent of the time. Golem has learnt to beat Many Faces of Go at the latter's low level. 9.2. Temporal Differences (TD) Learning. TD learning has been successfully applied to Backgammon by Tesauro [Tesauro 94]. It might also be applied successfully to other games. Two fundamentally different abilities can be defined in games. The first one is to foresee the likely continuation of a game, either by tree search, or by reasoning. The second one is the ability to assess a position accurately, using ....
G. Tesauro, TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6 (2), (1994), pp. 215-219.
....an apparently less intellectual game than chess) makes the branching ratio of backgammon much higher than that of chess (several hundred as opposed to 30-40). Thus, a solution cannot (yet) rely on brute-force computing power as its primary technique. Gerald Tesauro has developed TD-Gammon [51], a champion-level backgammon-playing program that relies on a form of reinforcement learning known as temporal difference learning, or TD learning. TD-Gammon explores how a TD learning machine trains a multi-layer neural network to learn complex non-linear functions. In this case, these functions ....
G. Tesauro. TD-Gammon, A Self-Teaching Backgammon Program, Achieves Master-Level Play. Neural Computation, 6:215--219, 1994.
....to Elsevier Preprint, 26 February 2001. The standard answer to the second question, "How to evaluate?", is to use a scalar evaluation such as an integer or real number, which is computed as the weighted sum of feature values Σ w_i v_i. Nonlinear feature combinators such as neural networks [4,6,39,41] and a generalized linear evaluation model [9] have also gained some popularity. Individual evaluation features are usually designed by a programmer in interaction with an expert player. Feature weights are carefully tuned using a combination of engineering and automatic parameter tuning ....
G. Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6:215-219, 1994.
....problem. 1 Introduction. For complex reinforcement learning problems, TD(λ) with function approximation [Sutton, 1988] has proved empirically successful. Its origins go back as far as Samuel's Checkers Program [Samuel, 1959], while perhaps its most famous success has been Tesauro's TD-Gammon [Tesauro, 1992; 1994]. A variant of TD(λ) for minimax search has also been successful in learning to play chess [Baxter et al. 2000]. For linear approximation, TD(λ) has been shown to minimise the squared error between the approximate value of each state and the true value [Tsitsiklis and Van Roy, 1997; Dayan, ....
....A geometric interpretation of TD(1)'s behaviour in the two-state system. affects not only artificial examples, but is also evident in a real domain: backgammon, where TD(λ) has had its most famous success. Our backgammon-playing program has been created along the lines of Tesauro's TD-Gammon [Tesauro, 1992; 1994]. Its neural network function approximator has 209 input nodes, no hidden layer, and 1 output node, which is a squashed linear function of the inputs. The input vector consists of 200 elements directly representing the board, 8 elements representing hand-coded features extracted from the board ....
Gerald Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6:215--219, 1994.
.... of the value function tend to go by the name Reinforcement Learning, and have been extensively studied in the Machine Learning literature [6,22]. This approach has yielded some remarkable empirical successes in a number of different domains, including learning to play checkers [19], backgammon [23,24], and chess [4], job-shop scheduling [27], and dynamic channel allocation [20]. Despite this success, most algorithms for training approximate value functions suffer from the same flaw: the performance of the greedy policy derived from the approximate value function is not guaranteed to improve on ....
G. Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6:215-219, 1994.
....of this paper could be used to prove much tighter bounds. Results similar to the ones presented here were developed independently in [6]. 2 The algorithms. The SARSA(0) algorithm was first suggested in [7]. The V(0) algorithm was popularized by its use in the TD-Gammon backgammon-playing program [8]. Fix a Markov decision process M, with a finite set S of states, a finite set A of actions, a terminal state T, an initial distribution S_0 over S, a one-step reward function r : S × A → R, and a transition function : S × A → S ∪ {T}. M may also have a discount factor specifying how to ....
G. Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6:215-219, 1994.
....spectacularly far ahead of its time, was Samuel's checkers-playing system (Samuel, 1959). This learned a value function represented by a linear function approximator, and employed a training scheme similar to the updates used in value iteration, temporal differences and Q-learning. More recently, Tesauro (1992, 1994, 1995) applied the temporal difference algorithm to backgammon. Backgammon has approximately 10^20 states, making table-based reinforcement learning impossible. Instead, Tesauro used a backpropagation-based three-layer ....
Tesauro, G. (1994). TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2), 215-219.
....considered in continuous, real-valued domains. For instance, we have not discussed linear dynamics and quadratic cost functions, often used in control theory [29], or the use of neural network representations of value functions, as frequently adopted within the reinforcement learning community [12, 132]. However, even these techniques can be cast within the framework described here; for example, the use of piecewise linear value functions can be seen as a form of abstraction, where different linear components are applied to different regions or clusters of state space. Although we have ....
Gerald J. Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6:215--219, 1994.
....on the well-known acrobot problem. 1 Introduction. For complex reinforcement learning problems, TD(λ) with function approximation [2] has proved empirically successful. Its origins go back as far as Samuel's Checkers Program [3], while perhaps its most famous success has been Tesauro's TD-Gammon [4, 5]. A variant of TD(λ) for minimax search has also been successful in learning to play chess [6]. Successes outside of the games domain include job-shop scheduling [7] and dynamic channel allocation [8]. Technical Report, Department of Computer Science, Australian National University, May 1999. ....
....of the system being observed. We now show that this behaviour affects not only artificial examples, but is also evident in a real domain: backgammon, where TD(λ) has had possibly its most famous success. Our backgammon-playing program has been created along the lines of Tesauro's TD-Gammon [4, 5]. Its neural network function approximator has 209 input nodes, no hidden layer, and 1 output node, which is a squashed linear function of the inputs. The input vector consists of 200 elements directly representing the board, 8 elements representing hand-coded features extracted from the board ....
Gerald Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6:215--219, 1994.
No context found.
Gerald Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2):215--219, 1994.
No context found.
G. Tesauro. TD-Gammon, a self-teaching Backgammon program, achieves master-level play. Neural Computation, vol. 6, no. 2, pp. 215-219, 1994.
No context found.
G. J. Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2):215--219, 1994.
No context found.
G. J. Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2):215--219, 1994.
No context found.
Gerald Tesauro. TD-Gammon, A Self-Teaching Backgammon Program, Achieves Master-Level Play. Technical report, IBM Corp., 1993.
No context found.
G. Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2):215--219, 1994.
No context found.
G. J. Tesauro, TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play. Neural Computation, 6(2):215--219, 1994.