Systems | Information | Learning | Optimization
 

Beyond worst-case: instance-dependent optimality in reinforcement learning

Recent years have witnessed great progress in reinforcement learning, on both the applied and theoretical fronts. However, a number of major gaps between theory and practice remain. One is that algorithms used in practice almost always perform far better than existing worst-case bounds guarantee. How can this gap be closed? Instance-dependent minimax bounds allow one to formalize the notion that “not all problems are equal”: some are easier than others. In this talk, we describe some recent progress on obtaining non-asymptotic, instance-dependent, and optimal bounds in reinforcement learning, covering various types of TD algorithms for policy evaluation as well as Q-learning for policy optimization.
April 28 @ 12:30 pm (1h)

Remote

Martin Wainwright

Video