Paper Title

Language Conditioned Imitation Learning over Unstructured Data

Authors

Lynch, Corey, Sermanet, Pierre

Abstract

Natural language is perhaps the most flexible and intuitive way for humans to communicate tasks to a robot. Prior work in imitation learning typically requires that each task be specified with a task id or goal image -- something that is often impractical in open-world environments. On the other hand, previous approaches in instruction following allow agent behavior to be guided by language, but typically assume structure in the observations, actuators, or language that limits their applicability to complex settings like robotics. In this work, we present a method for incorporating free-form natural language conditioning into imitation learning. Our approach learns perception from pixels, natural language understanding, and multitask continuous control end-to-end as a single neural network. Unlike prior work in imitation learning, our method is able to incorporate unlabeled and unstructured demonstration data (i.e. no task or language labels). We show this dramatically improves language conditioned performance, while reducing the cost of language annotation to less than 1% of total data. At test time, a single language conditioned visuomotor policy trained with our method can perform a wide variety of robotic manipulation skills in a 3D environment, specified only with natural language descriptions of each task (e.g. "open the drawer...now pick up the block...now press the green button..."). To scale up the number of instructions an agent can follow, we propose combining text conditioned policies with large pretrained neural language models. We find this allows a policy to be robust to many out-of-distribution synonym instructions, without requiring new demonstrations. See videos of a human typing live text commands to our agent at language-play.github.io.
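The core idea of a language-conditioned visuomotor policy is that a single network consumes both perceptual features and an embedding of a free-form instruction, then emits a continuous action. The following is a minimal sketch of that conditioning pattern only, not the paper's architecture: the "pretrained language model" here is just a fixed random bag-of-words projection standing in for a real frozen sentence encoder, and all names (`encode_instruction`, `LanguageConditionedPolicy`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen "language encoder": a fixed random projection from a
# bag-of-words vector to an embedding. A real system would use a pretrained
# sentence encoder here; this stand-in only illustrates the interface.
VOCAB = {"open": 0, "the": 1, "drawer": 2, "pick": 3, "up": 4, "block": 5,
         "press": 6, "green": 7, "button": 8, "now": 9}
EMBED_DIM = 16
W_lang = rng.standard_normal((len(VOCAB), EMBED_DIM))

def encode_instruction(text):
    """Map a free-form instruction string to a fixed-size embedding."""
    bow = np.zeros(len(VOCAB))
    for tok in text.lower().split():
        if tok in VOCAB:
            bow[VOCAB[tok]] += 1.0
    return bow @ W_lang  # shape: (EMBED_DIM,)

class LanguageConditionedPolicy:
    """Tiny MLP mapping (visual features, instruction embedding) -> action."""
    def __init__(self, obs_dim, act_dim, hidden=32):
        in_dim = obs_dim + EMBED_DIM
        self.W1 = rng.standard_normal((in_dim, hidden)) * 0.1
        self.W2 = rng.standard_normal((hidden, act_dim)) * 0.1

    def act(self, obs, instruction):
        # Concatenate perception and language, then predict a continuous action.
        x = np.concatenate([obs, encode_instruction(instruction)])
        h = np.tanh(x @ self.W1)
        return h @ self.W2

policy = LanguageConditionedPolicy(obs_dim=8, act_dim=4)
obs = rng.standard_normal(8)  # stand-in for visual features learned from pixels
action = policy.act(obs, "open the drawer")
print(action.shape)  # (4,)
```

Because the instruction enters only through its embedding, swapping in a large pretrained encoder lets the same policy generalize to out-of-distribution synonym instructions that map to nearby embeddings, which is the mechanism the abstract proposes for scaling the instruction set.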
