Paper Title

Safe Reinforcement Learning through Meta-learned Instincts

Authors

Djordje Grbic, Sebastian Risi

Abstract

An important goal in reinforcement learning is to create agents that can quickly adapt to new goals while avoiding situations that might cause damage to themselves or their environments. One way agents learn is through exploration mechanisms, which are needed to discover new policies. However, in deep reinforcement learning, exploration is normally done by injecting noise in the action space. While performing well in many domains, this setup has the inherent risk that the noisy actions performed by the agent lead to unsafe states in the environment. Here we introduce a novel approach called Meta-Learned Instinctual Networks (MLIN) that allows agents to safely learn during their lifetime while avoiding potentially hazardous states. At the core of the approach is a plastic network trained through reinforcement learning and an evolved "instinctual" network, which does not change during the agent's lifetime but can modulate the noisy output of the plastic network. We test our idea on a simple 2D navigation task with no-go zones, in which the agent has to learn to approach new targets during deployment. MLIN outperforms standard meta-trained networks and allows agents to learn to navigate to new targets without colliding with any of the no-go zones. These results suggest that meta-learning augmented with an instinctual network is a promising new approach for safe AI, which may enable progress in this area on a variety of different domains.
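As a rough illustration of the modulation idea described in the abstract, the sketch below gates a noisy exploratory action with a signal from a fixed instinct module. The abstract does not give the combination rule, so the convex blend used here, the function name `modulated_action`, and the parameters `gate` and `noise_std` are all assumptions made for illustration and should not be read as the authors' formulation.

```python
import random


def modulated_action(policy_action, instinct_action, gate, noise_std, rng):
    """Blend a noisy plastic-policy action with a fixed instinct action.

    gate lies in [0, 1]: 1 passes the noisy policy action through unchanged,
    0 suppresses it entirely so that only the instinct action remains.
    This blending rule is a hypothetical stand-in, not the paper's equation.
    """
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    # Exploration noise is injected in action space, as the abstract describes.
    noisy = [a + rng.gauss(0.0, noise_std) for a in policy_action]
    # The instinct module scales down the noisy action near hazards.
    return [gate * n + (1.0 - gate) * i for n, i in zip(noisy, instinct_action)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Far from a no-go zone: the instinct leaves exploration mostly untouched.
    print(modulated_action([1.0, -1.0], [0.0, 0.5], 0.9, 0.3, rng))
    # Close to a no-go zone: the instinct fully overrides the noisy action.
    print(modulated_action([1.0, -1.0], [0.0, 0.5], 0.0, 0.3, rng))
```

In this sketch the instinct output and the gate are passed in as plain numbers. In the approach the abstract describes, they would come from an evolved network that stays fixed during the agent's lifetime, while the policy producing `policy_action` keeps learning.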
