R-MAX—a general polynomial time algorithm for near-optimal reinforcement learning. (2002)

by R I Brafman, M Tennenholtz
Venue: Journal of Machine Learning Research