Learning Long-term Visual Dynamics with Region Proposal Interaction Networks

Paper Title

Learning Long-term Visual Dynamics with Region Proposal Interaction Networks

Authors

Haozhi Qi, Xiaolong Wang, Deepak Pathak, Yi Ma, Jitendra Malik

Abstract

Learning long-term dynamics models is key to understanding physical common sense. Most existing approaches to learning dynamics from visual input sidestep long-term prediction by resorting to rapid re-planning with short-term models. This not only requires such models to be extremely accurate but also limits them to tasks where an agent can continuously obtain feedback and take action at each step until completion. In this paper, we aim to leverage ideas from success stories in visual recognition tasks to build object representations that can capture inter-object and object-environment interactions over a long range. To this end, we propose Region Proposal Interaction Networks (RPIN), which reason about each object's trajectory in a latent region-proposal feature space. Thanks to this simple yet effective object representation, our approach outperforms prior methods by a significant margin both in prediction quality and in its ability to plan for downstream tasks, and it also generalizes well to novel environments. Code, pre-trained models, and more visualization results are available at https://haozhi.io/RPIN.
