Gradient-Based Optimization of Markov Reward Processes: Practical Variants (2000)

by Peter Marbach, John N. Tsitsiklis
Venue: Machine Learning