Policy Gradient Method for Reinforcement Learning beyond Cumulative Rewards
Mengdi Wang
Princeton University
Assistant Professor, Department of Operations Research and Financial Engineering, Princeton University.
Video: https://vimeo.com/191080400
Classical stochastic optimization models usually involve expected-value objective functions. However, they do not apply to minimizing a composition of two or more expected-value functions, i.e., the stochastic nested composition optimization problem. Stochastic composition optimization has wide applications in estimation, risk-averse optimization, dimension reduction, and reinforcement learning. …