Co-design of Embodied Neural Intelligence via Constrained Evolution

Paper title

Co-design of Embodied Neural Intelligence via Constrained Evolution

Authors

Zhiquan Wang, Bedrich Benes, Ahmed H. Qureshi, Christos Mousas

Abstract

We introduce a novel co-design method for the shape attributes and locomotion of autonomous moving agents that combines deep reinforcement learning and evolution with user control. Our main inspiration comes from evolution, which has led to wide variability and adaptation in nature and has the potential to significantly improve design and behavior simultaneously. Our method takes an input agent with optional simple constraints, such as leg parts that should not evolve or allowed ranges of change. It uses physics-based simulation to determine the agent's locomotion and finds a behavior policy for the input design, which is later used as a baseline for comparison. The agent is then randomly modified within the allowed ranges, creating a new generation of several hundred agents. This generation is trained by transferring the previous policy, which significantly speeds up training. The best-performing agents are selected, and a new generation is formed from them by crossover and mutation. Subsequent generations are trained in the same way until satisfactory results are reached. We show a wide variety of evolved agents, and our results show that even with only 10% of changes, the overall performance of the evolved agents improves by 50%. If more significant changes to the initial design are allowed, the performance in our experiments improves even more, to 150%. Contrary to related work, our co-design works on a single GPU and provides satisfactory results by training thousands of agents within one hour.
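
The generate, select, and recombine loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the design is reduced to a flat list of numeric shape parameters, and the `fitness` callable is a hypothetical stand-in for the expensive step of training a policy in a physics simulator (policy transfer between generations is not modeled). All function names, parameters, and defaults below are assumptions chosen for illustration.

```python
import random


def bounds(base, max_change, frozen):
    """Allowed interval per parameter; a frozen parameter collapses to one value."""
    return [
        (b, b) if f else (b * (1 - max_change), b * (1 + max_change))
        for b, f in zip(base, frozen)
    ]


def clamp(value, low, high):
    return max(low, min(high, value))


def mutate(design, limits, rng, sigma=0.05):
    """Gaussian perturbation scaled to each interval, clamped back into range."""
    return [
        clamp(x + rng.gauss(0, sigma) * (hi - lo), lo, hi)
        for x, (lo, hi) in zip(design, limits)
    ]


def crossover(a, b, rng):
    """Uniform crossover: each parameter is copied from one of the two parents."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def evolve(base, frozen, max_change, fitness,
           generations=20, pop_size=50, elite=10, seed=0):
    """Constrained evolution of a design vector (toy version).

    `fitness` is an assumed placeholder for "train a locomotion policy for
    this design and return its reward".
    """
    rng = random.Random(seed)
    limits = bounds(base, max_change, frozen)
    # First generation: the input design plus random variants within range.
    population = [list(base)] + [
        [rng.uniform(lo, hi) for lo, hi in limits] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:elite]  # keep the best-performing designs
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng),
                   limits, rng)
            for _ in range(pop_size - elite)
        ]
        population = parents + children
    return max(population, key=fitness)


if __name__ == "__main__":
    base = [1.0, 1.0, 0.5]          # hypothetical limb-length parameters
    frozen = [False, True, False]   # the second part must not evolve
    target = [1.2, 2.0, 0.4]        # made-up optimum for the toy fitness
    toy_fitness = lambda d: -sum((x - t) ** 2 for x, t in zip(d, target))
    print(evolve(base, frozen, max_change=0.1, fitness=toy_fitness))
```

Because the input design is part of the first generation and the best designs are carried over unchanged, the returned design never scores worse than the baseline under the given `fitness`, and frozen parameters keep their original values.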
