Provably Efficient Exploration in Reinforcement Learning: An Optimistic Approach

Modern Reinforcement Learning (RL) is commonly applied to practical problems with an enormous number of states, where function approximators such as deep neural networks must be deployed to approximate either the value function or the policy. The introduction of function approximation raises a fundamental set of challenges involving computational and statistical efficiency, especially in the online setting with active data acquisition. As a result, a core RL question remains open: how can we design provably efficient RL algorithms that incorporate possibly nonlinear function approximation?

In this talk, I will introduce the first generation of efficient value-based and policy-based RL algorithms for the setting where both the value function and the policy are represented by powerful function approximators such as kernel functions and neural networks. The proposed algorithms systematically integrate the “optimism in the face of uncertainty” principle into algorithm design and are shown to enjoy both polynomial runtime and polynomial sample complexity. Finally, as an initial attempt to study multi-agent reinforcement learning, I will show how the value-based algorithm can be modified to solve zero-sum stochastic games efficiently.

Bio: Zhuoran Yang is a PhD candidate in the Department of Operations Research and Financial Engineering at Princeton University, advised by Professor Jianqing Fan and Professor Han Liu. Prior to attending Princeton, he obtained his bachelor’s degree from Tsinghua University. Zhuoran’s research interests lie at the interface of machine learning, statistics, and optimization. Specifically, the primary goal of his research is to design learning algorithms that are both statistically and computationally efficient for large-scale decision-making problems arising in reinforcement learning and stochastic games.

September 30, 2020

12:30 pm (1h)

Remote

Zhuoran Yang

Video