R-max - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning (2003)

by R Brafman, M Tennenholtz
Venue: The Journal of Machine Learning Research