Multi-armed bandit problems with history-dependent rewards
The multi-armed bandit problem is a common sequential decision-making framework in which, at each time step, a player selects an action and receives a reward for that action. The aim is to select actions so as to maximize the total reward. Commonly it is assumed that the (expected) reward of each action …