Paper Title
Robust Reinforcement Learning with Distributional Risk-averse formulation
Paper Authors
Paper Abstract
Robust Reinforcement Learning tries to make predictions more robust to changes in the dynamics or rewards of the system. This problem is particularly important when the dynamics and rewards of the environment are estimated from data. In this paper, we approximate the Robust Reinforcement Learning problem constrained with a $\phi$-divergence using an approximate Risk-Averse formulation. We show that the classical Reinforcement Learning formulation can be robustified by penalizing the objective with the standard deviation of the return. Two algorithms based on Distributional Reinforcement Learning, one for discrete and one for continuous action spaces, are proposed and tested in a classical Gym environment to demonstrate the robustness of the algorithms.
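As a rough illustration of the penalization described in the abstract (the notation below is ours, not the paper's): writing $Z^{\pi}$ for the random return under a policy $\pi$ and $\lambda \ge 0$ for a risk-aversion coefficient standing in for the size of the $\phi$-divergence uncertainty set, a standard-deviation-penalized surrogate objective takes the form
$$\max_{\pi} \; \mathbb{E}\!\left[ Z^{\pi} \right] - \lambda \, \sigma\!\left( Z^{\pi} \right),$$
where $\sigma(Z^{\pi})$ is the standard deviation of the return. A distributional critic makes both terms directly estimable from its predicted return distribution, which is presumably why the proposed algorithms build on Distributional Reinforcement Learning.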