Paper Title
A Novel Plug-in Module for Fine-Grained Visual Classification
Authors
Abstract
Visual classification can be divided into coarse-grained and fine-grained classification. Coarse-grained classification distinguishes categories that differ substantially, such as cats and dogs, while fine-grained classification distinguishes categories that are highly similar, such as cat species, bird species, and the makes or models of vehicles. Unlike coarse-grained visual classification, fine-grained visual classification often requires domain experts to label the data, which makes the data more expensive. To meet this challenge, many approaches automatically locate the most discriminative regions and use local features to obtain more precise representations. These approaches require only image-level annotations, thereby reducing the cost of annotation. However, most of these methods require two-stage or multi-stage architectures and cannot be trained end-to-end. Therefore, we propose a novel plug-in module that can be integrated into many common backbones, including CNN-based and Transformer-based networks, to provide strongly discriminative regions. The plug-in module can output pixel-level feature maps and fuse filtered features to enhance fine-grained visual classification. Experimental results show that the proposed plug-in module outperforms state-of-the-art approaches and significantly improves the accuracy to 92.77% and 92.83% on CUB200-2011 and NABirds, respectively. We have released our source code on GitHub: https://github.com/chou141253/FGVC-PIM.git.