Data Re-weighting for Data Efficient Reinforcement Learning

Learning from interaction with the environment (trying untested actions, observing successes and failures, and tying effects back to causes) is one of the first capabilities that comes to mind when considering intelligent agents. Reinforcement learning is the area of artificial intelligence research that aims to enable autonomous agents to learn in this way. Despite many recent empirical successes, most modern reinforcement learning algorithms still require large amounts of experience before they learn useful skills. Making reinforcement learning more data-efficient would allow computers to autonomously solve complex tasks in dynamic environments such as those found in robotics, traffic management, or healthcare.

In this talk I will describe recent work to increase the data efficiency of reinforcement learning (RL) algorithms via data re-weighting. I will introduce a novel re-weighting technique that allows RL agents to use a finite set of samples more efficiently. The key idea behind this technique is to use importance sampling to re-weight the empirical distribution of samples so that it matches the expected distribution, thus reducing sampling error in the observed data. In the first part of the talk I will describe how this technique leads to more efficient batch policy evaluation and mini-batch policy gradient reinforcement learning. In the second part I will describe the extension of this work to batch value function learning and introduce a more data-efficient version of the fundamental temporal difference learning algorithm.
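To make the key idea concrete, here is a minimal sketch of re-weighting for batch policy evaluation in a toy one-step (bandit) setting. It is an illustration under stated assumptions, not the method presented in the talk: the function names, the `policy` dictionary, and the toy data are all hypothetical, and the empirical distribution is estimated with simple action counts. Each sample is weighted by the probability the policy assigns to its action divided by the frequency with which that action actually appears in the batch.

```python
from collections import Counter


def monte_carlo_estimate(rewards):
    """Plain average of observed rewards (no re-weighting)."""
    return sum(rewards) / len(rewards)


def reweighted_estimate(actions, rewards, policy):
    """Average of rewards after importance-weighting each sample.

    Weight = policy probability of the action / empirical frequency of
    the action in this batch. This is a hypothetical toy version: it
    assumes discrete actions and that every action in the batch has
    non-zero probability under `policy`. Actions the policy could take
    but that never appear in the batch contribute nothing.
    """
    n = len(actions)
    counts = Counter(actions)
    total = 0.0
    for action, reward in zip(actions, rewards):
        empirical_prob = counts[action] / n
        weight = policy[action] / empirical_prob
        total += weight * reward
    return total / n


# Hypothetical toy data: the policy picks each action half the time,
# but this small batch happens to contain "left" three times out of four.
policy = {"left": 0.5, "right": 0.5}
actions = ["left", "left", "left", "right"]
rewards = [1.0, 1.0, 1.0, 0.0]

plain = monte_carlo_estimate(rewards)
reweighted = reweighted_estimate(actions, rewards, policy)
print(plain, reweighted)
```

In this toy batch the plain average is 0.75 because "left" is over-represented, while the re-weighted estimate is about 0.5, which is the policy's expected reward when "left" always pays 1 and "right" always pays 0.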

November 11, 2020

12:30 pm (1h)

Remote

Josiah Hanna
