Linear Dueling Bandits

Dueling bandits is a variant of the multi-armed bandit problem where instead of playing an arm and observing the reward at each instant, you duel two arms (pairwise comparison) and observe the winner among the two. Linear bandits is a special case of contextual bandits where the reward of an arm is equal to its inner product with an unknown vector. Here we look at a setting which is the combination of the two – you have structure in the form of linearity and inner products, but you also observe limited information (1 bit) at every time instant. We are interested in the complexity of finding the best arm under this setting.

This is a work in progress. We performed some simulations and observed very interesting results. In this talk, I’ll formulate the problem, show the simulation results, and talk about ideas we have towards proving them.

June 18 @ 16:00
4:00 pm (1h)

Discovery Building, Orchard View Room

Sumeet Katariya