The Paradox of Choice: Using Attention in Hierarchical Reinforcement Learning

Paper Title

The Paradox of Choice: Using Attention in Hierarchical Reinforcement Learning

Authors

Andrei Nica, Khimya Khetarpal, Doina Precup

Abstract

Decision-making AI agents are often faced with two important challenges: the depth of the planning horizon, and the branching factor due to having many choices. Hierarchical reinforcement learning methods aim to solve the first problem, by providing shortcuts that skip over multiple time steps. To cope with the breadth, it is desirable to restrict the agent's attention at each step to a reasonable number of possible choices. The concept of affordances (Gibson, 1977) suggests that only certain actions are feasible in certain states. In this work, we model "affordances" through an attention mechanism that limits the available choices of temporally extended options. We present an online, model-free algorithm to learn affordances that can be used to further learn subgoal options. We investigate the role of hard versus soft attention in training data collection, abstract value learning in long-horizon tasks, and handling a growing number of choices. We identify and empirically illustrate the settings in which the paradox of choice arises, i.e. when having fewer but more meaningful choices improves the learning speed and performance of a reinforcement learning agent.
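
The distinction between hard and soft attention over options can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the fixed threshold, and the hand-set affordance scores are all assumptions made for illustration, whereas the abstract describes affordances that are learned online. The sketch only shows how a hard mask removes options from the choice set, while a soft weighting merely down-weights them.

```python
import math


def soft_attention_policy(option_values, affordance_scores):
    """Soft attention (illustrative): scale softmax weights by scores in [0, 1].

    Every option with a positive score keeps some probability mass.
    """
    top = max(option_values)  # subtract the max for numerical stability
    weights = [a * math.exp(q - top)
               for q, a in zip(option_values, affordance_scores)]
    total = sum(weights)
    return [w / total for w in weights]


def hard_attention_policy(option_values, affordance_scores, threshold=0.5):
    """Hard attention (illustrative): drop options scoring below the threshold.

    The threshold value is a hypothetical choice, not taken from the paper.
    """
    mask = [1.0 if a >= threshold else 0.0 for a in affordance_scores]
    if not any(mask):
        # Fall back to all options if the mask would leave nothing to choose.
        mask = [1.0] * len(mask)
    return soft_attention_policy(option_values, mask)


# Hypothetical values for four options in one state.
option_values = [1.0, 2.0, 0.5, 2.0]
affordance_scores = [0.9, 0.1, 0.8, 0.7]  # hand-set here; learned in the paper

soft_probs = soft_attention_policy(option_values, affordance_scores)
hard_probs = hard_attention_policy(option_values, affordance_scores)
```

Under these assumed numbers, the second option has a high value but a low affordance score. Hard attention assigns it zero probability, so the agent never collects data with it, while soft attention still samples it occasionally.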
