Policy Gradient Methods for Reinforcement Learning with Function Approximation (1999)

by R S Sutton, D A McAllester, S P Singh, Y Mansour
Venue: In Neural Information Processing Systems (NIPS)