An analysis of temporal-difference learning with function approximation (1997)

by John N. Tsitsiklis, Benjamin Van Roy
Venue: Eötvös Loránd University, Department of Information Systems