Learning Semantics for Visual Place Recognition through Multi-Scale Attention

Paper Title

Learning Semantics for Visual Place Recognition through Multi-Scale Attention

Authors

Valerio Paolicelli, Antonio Tavera, Carlo Masone, Gabriele Berton, Barbara Caputo

Abstract

In this paper, we address the task of visual place recognition (VPR), where the goal is to retrieve the correct GPS coordinates of a given query image against a huge geotagged gallery. While recent works have shown that building descriptors incorporating semantic and appearance information is beneficial, current state-of-the-art methods opt for a top-down definition of the significant semantic content. Here we present the first VPR algorithm that learns robust global embeddings from both the visual appearance and the semantic content of the data, with the segmentation process being dynamically guided by the recognition of places through a multi-scale attention module. Experiments on various scenarios validate this new approach and demonstrate its performance against state-of-the-art methods. Finally, we propose the first synthetic-world dataset suited for both place recognition and segmentation tasks.
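
The retrieval formulation in the abstract (match a query image against a geotagged gallery and return the GPS tag of the best match) can be illustrated with a minimal sketch. This is not the authors' implementation: the descriptors below are made-up three-dimensional toy vectors standing in for learned global embeddings, the coordinates are invented, and the names `cosine_similarity` and `retrieve_gps` are hypothetical. Cosine similarity is assumed as the matching score purely for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_gps(query_descriptor, gallery):
    """Return the GPS tag of the gallery entry most similar to the query.

    gallery is a list of (descriptor, (latitude, longitude)) pairs.
    """
    _, best_gps = max(
        gallery,
        key=lambda entry: cosine_similarity(query_descriptor, entry[0]),
    )
    return best_gps


# Toy data (assumption): in a real system each descriptor would be a
# global embedding produced by a trained network, not a hand-written vector.
gallery = [
    ([1.0, 0.0, 0.0], (45.0628, 7.6624)),
    ([0.0, 1.0, 0.0], (45.0703, 7.6869)),
    ([0.0, 0.0, 1.0], (45.0522, 7.6780)),
]
query = [0.1, 0.9, 0.2]

predicted_gps = retrieve_gps(query, gallery)
print(predicted_gps)
```

A linear scan like this is only practical for small galleries; a large geotagged gallery would normally need an approximate nearest-neighbor index, which is outside the scope of this sketch.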
