Systems | Information | Learning | Optimization
 

Model-Predictive Policy Learning with Uncertainty Regularisation for Driving in Dense Traffic

Learning a policy using only observational data is challenging because the distribution of states it induces at execution time may differ from the distribution observed during training. In this work, we propose to train a policy while explicitly penalising the mismatch between these two distributions over a fixed time horizon. …