Sample mean based index policies with O(log(n)) regret for the multi-armed bandit problem (1995)

by R Agrawal
Venue: Advances in Applied Probability