Systems | Information | Learning | Optimization
 

Double Bandits

Regular bandit problems assume a single fixed reward vector, and the goal is to find the optimal arm for that one vector. In practice, people may have different taste profiles, and websites propose more than one arm (an ad, a sale item, etc.) at a time for a …