Beyond worst-case: instance-dependent optimality in reinforcement learning

Recent years have witnessed great progress in reinforcement learning, both on the applied and theoretical fronts. However, there remain a number of major gaps between theory and practice. One fact is that algorithms used in practice almost always perform far better than the guarantees provided by existing worst-case bounds. How to close this gap between theory and practice? Instance-dependent minimax bounds allow one to formalize the notion that “not all problems are equal”—some are easier than others. In this talk, we describe some recent progress on obtaining non-asymptotic, instance-dependent, and optimal bounds in reinforcement learning, including various types of TD algorithms for policy evaluation, as well as Q-learning for policy optimization.

April 28, 2021

12:30 pm (1h)

Remote

Martin Wainwright

Video