"Some studies in machine learning using the game of checkers", A. L. Samuel, IBM Journal of Research and Development, Vol. 3, 1959, 211-229.
....information about the value of the last action it took. In the general case, reinforcement can be arbitrarily delayed, and the problem of assigning reward or punishment to a state based on delayed reinforcement is termed temporal credit assignment. The first statement of the problem is due to [Samuel 59], whose checkers-learning program dealt with deciding which moves to reward for eventually leading to a triple jump. Temporal credit can be assigned in two ways: either the reward is appropriated to all of the state-action pairs after it is received, or an expected value of the future reward is ....