Learning Cross-Image Object Semantic Relation in Transformer for Few-Shot Fine-Grained Image Classification

Paper Title

Learning Cross-Image Object Semantic Relation in Transformer for Few-Shot Fine-Grained Image Classification

Authors

Bo Zhang, Jiakang Yuan, Baopu Li, Tao Chen, Jiayuan Fan, Botian Shi

Abstract

Few-shot fine-grained learning aims to classify a query image into one of a set of support categories with fine-grained differences. Although learning different objects' local differences via Deep Neural Networks has achieved success, how to exploit the query-support cross-image object semantic relations in Transformer-based architecture remains under-explored in the few-shot fine-grained scenario. In this work, we propose a Transformer-based double-helix model, namely HelixFormer, to achieve the cross-image object semantic relation mining in a bidirectional and symmetrical manner. The HelixFormer consists of two steps: 1) Relation Mining Process (RMP) across different branches, and 2) Representation Enhancement Process (REP) within each individual branch. By the designed RMP, each branch can extract fine-grained object-level Cross-image Semantic Relation Maps (CSRMs) using information from the other branch, ensuring better cross-image interaction in semantically related local object regions. Further, with the aid of CSRMs, the developed REP can strengthen the extracted features for those discovered semantically-related local regions in each branch, boosting the model's ability to distinguish subtle feature differences of fine-grained objects. Extensive experiments conducted on five public fine-grained benchmarks demonstrate that HelixFormer can effectively enhance the cross-image object semantic relation matching for recognizing fine-grained objects, achieving much better performance over most state-of-the-art methods under 1-shot and 5-shot scenarios. Our code is available at: https://github.com/JiakangYuan/HelixFormer
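
The two-step structure described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the operators, so everything here is an assumption made for illustration: `relation_map` stands in for the RMP using plain scaled dot-product cross-attention, `enhance` stands in for the REP using a sigmoid gate with a residual connection, and all function names are hypothetical. This is not the authors' implementation, which is available in the linked repository.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def relation_map(own, other):
    """Assumed stand-in for the Relation Mining Process (RMP).

    own, other: lists of local feature vectors, one per spatial position.
    For each position in `own`, attend over all positions in `other`
    (scaled dot-product attention) and return the weighted summary.
    """
    dim = len(own[0])
    rel = []
    for vec in own:
        weights = softmax([dot(vec, o) / math.sqrt(dim) for o in other])
        rel.append([sum(w * o[d] for w, o in zip(weights, other))
                    for d in range(dim)])
    return rel


def enhance(own, rel):
    """Assumed stand-in for the Representation Enhancement Process (REP).

    Gates each feature by a sigmoid of its relation value and adds the
    result back to the original feature (residual re-weighting).
    """
    return [[x * (1.0 + 1.0 / (1.0 + math.exp(-r)))
             for x, r in zip(vec, rvec)]
            for vec, rvec in zip(own, rel)]


def helix_sketch(query, support):
    """Bidirectional, symmetric pass: each branch mines relations from the
    other branch, then enhances its own features with them."""
    q_rel = relation_map(query, support)
    s_rel = relation_map(support, query)
    return enhance(query, q_rel), enhance(support, s_rel)


# Toy inputs: 4 local positions with 8-dimensional features per image.
rng = random.Random(0)
query = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
support = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
q_out, s_out = helix_sketch(query, support)
print(len(q_out), len(q_out[0]), len(s_out), len(s_out[0]))
```

Because the same two functions are applied to both branches with the roles swapped, exchanging the query and support inputs simply exchanges the outputs, which is one simple reading of "bidirectional and symmetrical" in the abstract.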
