Paper Title

Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers

Authors

Miguel Saavedra-Ruiz, Sacha Morin, Liam Paull

Abstract

In this work, we consider the problem of learning a perception model for monocular robot navigation using few annotated images. Using a Vision Transformer (ViT) pretrained with a label-free self-supervised method, we successfully train a coarse image segmentation model for the Duckietown environment using 70 training images. Our model performs coarse image segmentation at the 8x8 patch level, and the inference resolution can be adjusted to balance prediction granularity and real-time perception constraints. We study how best to adapt a ViT to our task and environment, and find that some lightweight architectures can yield good single-image segmentation at a usable frame rate, even on CPU. The resulting perception model is used as the backbone for a simple yet robust visual servoing agent, which we deploy on a differential drive mobile robot to perform two tasks: lane following and obstacle avoidance.
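The core idea of coarse 8x8 patch-level segmentation can be sketched as follows: the ViT encodes the image as a grid of patch tokens, a lightweight head scores each token per class, and the argmax over classes yields one label per patch. The sketch below is a minimal numpy illustration, not the authors' implementation; the random "patch tokens" stand in for frozen ViT features, and the image size, embedding dimension, and class count are illustrative assumptions.

```python
import numpy as np

# Hypothetical shapes: a 480x640 image tiled into 8x8 patches
H, W, P = 480, 640, 8
n_h, n_w = H // P, W // P          # 60 x 80 patch grid
D, C = 384, 3                      # assumed embedding dim and class count

rng = np.random.default_rng(0)
# Stand-in for frozen self-supervised ViT patch features
patch_tokens = rng.standard_normal((n_h * n_w, D))

# Lightweight linear head: one score per class for each patch token
W_head = rng.standard_normal((D, C)) * 0.01
logits = patch_tokens @ W_head      # (n_patches, C)

# Coarse segmentation: argmax per patch, reshaped onto the patch grid
seg = logits.argmax(axis=1).reshape(n_h, n_w)   # one label per 8x8 patch
print(seg.shape)  # (60, 80)
```

Adjusting the inference resolution, as the abstract describes, amounts to feeding the ViT a downscaled image so the patch grid (and hence the label map) shrinks, trading prediction granularity for speed.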
