Paper Title
Semantic Pose Verification for Outdoor Visual Localization with Self-supervised Contrastive Learning
Paper Authors
Paper Abstract
Any city-scale visual localization system has to overcome long-term appearance changes, such as varying illumination conditions or seasonal changes between query and database images. Since semantic content is more robust to such changes, we exploit semantic information to improve visual localization. In our scenario, the database consists of gnomonic views generated from panoramic images (e.g., Google Street View), and query images are collected with a standard field-of-view camera at a different time. To improve localization, we check the semantic similarity between query and database images, which is not trivial since the positions and viewpoints of the cameras do not exactly match. To learn this similarity, we propose training a CNN in a self-supervised fashion with contrastive learning on a dataset of semantically segmented images. Our experiments show that this semantic similarity estimation approach works better than measuring similarity at the pixel level. Finally, we use the semantic similarity scores to verify the retrievals obtained by a state-of-the-art visual localization method and observe that contrastive learning-based pose verification increases the top-1 recall to 0.90, which corresponds to a 2% improvement.
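The abstract gives no implementation details, so the following is only an illustrative PyTorch-style sketch of the general idea it describes: embed semantic segmentation maps with a small CNN, train the embedding with a contrastive loss on matching vs. non-matching view pairs, and use the learned similarity score to re-rank (verify) the top-k candidates returned by a retrieval-based localizer. The architecture, the margin-based loss, the class count N_CLASSES, and all function names below are assumptions for illustration, not the authors' actual method.

# Illustrative sketch only; not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_CLASSES = 19  # hypothetical number of semantic classes (e.g., Cityscapes-style labels)

class SemanticSimilarityNet(nn.Module):
    """Small CNN that embeds a one-hot semantic segmentation map into a unit-length vector."""
    def __init__(self, n_classes=N_CLASSES, embed_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(n_classes, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(128, embed_dim)

    def forward(self, seg_onehot):
        feat = self.backbone(seg_onehot).flatten(1)
        return F.normalize(self.head(feat), dim=1)

def contrastive_loss(z1, z2, same_place, margin=0.5):
    """Pull embeddings of matching views together, push non-matching ones apart."""
    d = 1.0 - (z1 * z2).sum(dim=1)                 # cosine distance of normalized embeddings
    pos = same_place * d.pow(2)                    # penalize distance for matching pairs
    neg = (1 - same_place) * F.relu(margin - d).pow(2)  # penalize closeness for non-matching pairs
    return (pos + neg).mean()

def rerank_by_semantic_similarity(query_seg, candidate_segs, model):
    """Verification step: re-rank top-k database candidates by semantic similarity to the query."""
    with torch.no_grad():
        zq = model(query_seg.unsqueeze(0))         # (1, D)
        zc = model(candidate_segs)                 # (k, D)
        scores = (zc @ zq.t()).squeeze(1)          # cosine similarity per candidate
    order = torch.argsort(scores, descending=True)
    return order, scores                           # best-matching candidate first

# Toy usage with random "segmentation maps" just to show the expected shapes.
model = SemanticSimilarityNet()
seg_q = F.one_hot(torch.randint(0, N_CLASSES, (4, 64, 64)), N_CLASSES).permute(0, 3, 1, 2).float()
seg_db = F.one_hot(torch.randint(0, N_CLASSES, (4, 64, 64)), N_CLASSES).permute(0, 3, 1, 2).float()
labels = torch.tensor([1., 0., 1., 0.])            # 1 = same place, 0 = different place
loss = contrastive_loss(model(seg_q), model(seg_db), labels)
loss.backward()

In this sketch, the contrastive objective stands in for whatever self-supervised loss the paper actually uses, and the re-ranking function shows how a semantic similarity score could verify retrievals from an existing visual localization pipeline.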