In this talk, I will introduce the first generation of efficient value-based and policy-based RL algorithms for the setting where both the value function and the policy are represented by powerful function approximators such as kernel and neural network functions. The proposed algorithms systematically integrate the “optimism in the face of uncertainty” principle into algorithm design and are shown to enjoy both polynomial runtime and polynomial sample complexity. Finally, as an initial attempt to study multi-agent reinforcement learning, I will show how the value-based algorithm can be modified to solve zero-sum stochastic games efficiently.
Bio: Zhuoran Yang is a PhD candidate in the Department of Operations Research and Financial Engineering at Princeton University, advised by Professor Jianqing Fan and Professor Han Liu. Prior to attending Princeton, he obtained his bachelor’s degree from Tsinghua University. Zhuoran’s research interests lie at the interface of machine learning, statistics, and optimization. Specifically, the primary goal of his research is to design learning algorithms that are both statistically and computationally efficient for large-scale decision-making problems arising in reinforcement learning and stochastic games.