Paper Title
Lagrangian Duality in Reinforcement Learning
Paper Authors
Abstract
Although duality is used extensively in certain fields, such as supervised learning in machine learning, it has been much less explored in others, such as reinforcement learning (RL). In this paper, we show how duality is involved in a variety of RL work, from that which spearheaded the field, such as Richard Bellman's value iteration, to that which was done within just the past few years yet has already had significant impact, such as TRPO, A3C, and GAIL. We show that duality is not uncommon in reinforcement learning, especially when value iteration (i.e., dynamic programming) is used, or when first- or second-order approximations are made to transform initially intractable problems into tractable convex programs.