Abstract: Dynamic pricing is one of the most common examples of an online decision problem. With the growth of e-commerce and the massive volume of real-time data on today's online platforms, feature-based pricing has become increasingly important. A semi-parametric feedback structure, which arises when the market noise distribution is unknown, is a natural formulation for such problems and benefits from tools from non-parametric statistical estimation.
In this work, we study feature-based dynamic pricing with a semi-parametric feedback structure (unknown market noise distribution). We propose a dynamic learning and decision algorithm built on the classical tradeoff between exploration (statistical estimation) and exploitation (reward optimization). Under mild conditions, the proposed algorithm achieves near-optimal regret in terms of its dependence on the time horizon. This result offers a new perspective on combining statistical learning and decision making in online settings.
Orchard View Room, Virtual