Adaptive Bandits: Towards the best history-dependent strategy (2011)

by Odalric-Ambrym Maillard, Rémi Munos
Venue: In 24th Conf. on Learning Theory (COLT