Ervin Tanczos

Towards a better understanding of best arm identification in bounded multi-armed bandits

Video: https://vimeo.com/202266237

We present ongoing work on best arm identification in multi-armed bandit problems where the reward distributions are bounded. Although this is a standard assumption in this context, state-of-the-art methods (such as the lil-UCB algorithm) use sub-Gaussian concentration bounds for the mean rewards. However, one can …
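
To make the sub-Gaussian treatment of bounded rewards mentioned above concrete, here is a minimal sketch of a confidence radius for an empirical mean. It is not taken from the talk and it is not lil-UCB itself. It assumes i.i.d. rewards in a known interval [lo, hi] and applies the fixed-sample, two-sided Hoeffding inequality, which uses only the range of the rewards (bounded rewards are sub-Gaussian with parameter (hi - lo)/2). The function name `hoeffding_radius` and its parameters are hypothetical, chosen for illustration. lil-UCB uses an anytime bound based on the law of the iterated logarithm rather than this fixed-sample form.

```python
import math


def hoeffding_radius(n, delta, lo=0.0, hi=1.0):
    """Two-sided Hoeffding confidence radius for the mean of n i.i.d.
    rewards known to lie in [lo, hi].

    With probability at least 1 - delta, the empirical mean is within
    this radius of the true mean. The bound holds for a fixed sample
    size n (it is not valid uniformly over time), and it depends only
    on the range hi - lo, not on the actual variance of the rewards.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return (hi - lo) * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


if __name__ == "__main__":
    # The radius shrinks at rate 1/sqrt(n) regardless of how
    # concentrated the rewards actually are within [0, 1].
    for n in (10, 100, 1000):
        print(n, hoeffding_radius(n, delta=0.05))
```

Because the radius depends only on the range of the rewards, an arm whose rewards barely vary gets the same interval width as one whose rewards spread over the whole interval.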