Blocks Assemble! Learning to Assemble with Large-Scale Structured Reinforcement Learning

Paper Title

Blocks Assemble! Learning to Assemble with Large-Scale Structured Reinforcement Learning

Authors

Seyed Kamyar Seyed Ghasemipour, Daniel Freeman, Byron David, Shixiang Shane Gu, Satoshi Kataoka, Igor Mordatch

Abstract

Assembly of multi-part physical structures is both a valuable end product for autonomous robotics, as well as a valuable diagnostic task for open-ended training of embodied intelligent agents. We introduce a naturalistic physics-based environment with a set of connectable magnet blocks inspired by children's toy kits. The objective is to assemble blocks into a succession of target blueprints. Despite the simplicity of this objective, the compositional nature of building diverse blueprints from a set of blocks leads to an explosion of complexity in structures that agents encounter. Furthermore, assembly stresses agents' multi-step planning, physical reasoning, and bimanual coordination. We find that the combination of large-scale reinforcement learning and graph-based policies -- surprisingly without any additional complexity -- is an effective recipe for training agents that not only generalize to complex unseen blueprints in a zero-shot manner, but even operate in a reset-free setting without being trained to do so. Through extensive experiments, we highlight the importance of large-scale training, structured representations, contributions of multi-task vs. single-task learning, as well as the effects of curriculums, and discuss qualitative behaviors of trained agents.
