Reinforcement learning (RL) is a central problem in machine learning in which an agent learns optimal behaviors through interactions with an unknown environment. While existing research predominantly focuses on fully observable environments within the Markov decision process (MDP) framework, real-world RL scenarios often involve crucial yet unobservable information, e.g., recommendation systems with imperfect user information, communication over channels in varying weather conditions, and medical treatments for unidentified diseases.
In the first part of the talk, I consider reinforcement learning in partially observable systems through the proposed framework of the Latent Markov Decision Process (LMDP). In an LMDP, an MDP is randomly drawn from a set of possible MDPs at the beginning of the interaction, but the context, i.e., the hidden factors of the chosen MDP, is never revealed to the agent. This talk presents several of our recent results on this class of problems, in which the explicit gathering and use of higher-order information yields new, sample-efficient RL algorithms for partially observed environments.
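To fix ideas, one standard way to formalize this setup (the notation below is illustrative and follows the usual LMDP definition, not necessarily the talk's): an LMDP is a collection of MDPs sharing state and action spaces,
\[
\mathcal{M} = \{(\mathcal{S}, \mathcal{A}, T_m, R_m, \nu_m)\}_{m=1}^{M}, \qquad w \in \Delta([M]),
\]
where at the start of each episode a latent context $m \sim w$ is drawn and held fixed over the horizon $H$. The agent observes a trajectory $(s_1, a_1, r_1, \dots, s_H)$ generated under $(T_m, R_m, \nu_m)$ but never observes $m$ itself, so an optimal policy must map full histories, rather than single states, to actions.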
We then pivot to discuss the fundamental role of future prediction in addressing the challenges of partially observable systems. Inspired by this insight, we propose a practical solution that aims to bridge RL theory with real-world applications. Our approach completely decouples representation learning over histories from policy optimization, yielding more stable training and improved performance compared to popular end-to-end training in partially observable systems with long-term history dependencies. I will conclude the talk with a few other ongoing and future research projects aimed at conquering real-world reinforcement learning.
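As a rough illustration of such a decoupled pipeline (a minimal sketch under simplifying assumptions, not the actual method from the talk; all module names, dimensions, and the synthetic data are placeholders), a first phase trains a recurrent history encoder with a future-prediction objective, and a second phase freezes it and optimizes a policy on the learned representation:

# Phase 1: representation learning via future prediction; Phase 2: policy on frozen encoder.
import torch
import torch.nn as nn

obs_dim, act_dim, latent_dim, horizon, batch = 8, 4, 32, 20, 64

class HistoryEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + act_dim, latent_dim, batch_first=True)
        self.predict_next = nn.Linear(latent_dim, obs_dim)  # future-prediction head

    def forward(self, obs, acts):
        z, _ = self.rnn(torch.cat([obs, acts], dim=-1))  # (B, H, latent_dim)
        return z

encoder = HistoryEncoder()
opt_enc = torch.optim.Adam(encoder.parameters(), lr=1e-3)

# --- Phase 1: train the encoder to predict the next observation from the history ---
for _ in range(100):
    obs = torch.randn(batch, horizon, obs_dim)   # synthetic stand-in trajectories
    acts = torch.randn(batch, horizon, act_dim)
    z = encoder(obs, acts)
    pred = encoder.predict_next(z[:, :-1])       # predict o_{t+1} from history summary z_t
    loss = ((pred - obs[:, 1:]) ** 2).mean()
    opt_enc.zero_grad(); loss.backward(); opt_enc.step()

# --- Phase 2: freeze the encoder and optimize a policy head on its representation ---
for p in encoder.parameters():
    p.requires_grad_(False)
policy = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))
# ...train `policy` with any standard RL algorithm on z = encoder(history).

The point of the sketch is that the encoder's gradients come only from the prediction objective, so policy gradients never need to propagate through long histories, which is one plausible source of the training stability mentioned above.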
Discovery Building, Orchard View Room
Jeongyeol Kwon