Paper Title
Learning Attentive Pairwise Interaction for Fine-Grained Classification
Paper Authors
Paper Abstract
Fine-grained classification is a challenging problem because the differences among highly confused categories are subtle. Most approaches address this difficulty by learning a discriminative representation of each individual input image. Humans, on the other hand, can effectively identify contrastive clues by comparing image pairs. Inspired by this fact, this paper proposes a simple but effective Attentive Pairwise Interaction Network (API-Net), which progressively recognizes a pair of fine-grained images through interaction. Specifically, API-Net first learns a mutual feature vector that captures the semantic differences in the input pair. It then compares this mutual vector with the individual vectors to generate a gate for each input image. These distinct gate vectors inherit the mutual context on semantic differences, which allows API-Net to attentively capture contrastive clues through pairwise interaction between the two images. Additionally, API-Net is trained end to end with a score ranking regularization, which further improves its generalization by taking feature priorities into account. Extensive experiments are conducted on five popular fine-grained classification benchmarks. API-Net outperforms recent state-of-the-art methods, reaching 90.0% on CUB-200-2011, 93.9% on Aircraft, 95.3% on Stanford Cars, 90.3% on Stanford Dogs, and 88.1% on NABirds.
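The gating steps described in the abstract (mutual vector, per-image gates, pairwise interaction, score ranking) can be illustrated with a minimal pure-Python sketch. The abstract does not give the exact formulas, so everything concrete here is an assumption made for illustration: the single dense layer that produces the mutual vector, the sigmoid of an elementwise product used as the gate, the residual form of the attended features, the hinge form of the ranking penalty, and the margin value. All function names are hypothetical and are not taken from the authors' code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mutual_vector(x1, x2, weights, bias):
    """Map the concatenated pair to a mutual vector.

    Assumption: one dense layer; `weights` holds one row per output unit.
    """
    pair = x1 + x2  # list concatenation
    return [sum(w * v for w, v in zip(row, pair)) + b
            for row, b in zip(weights, bias)]


def gate(mutual, x):
    """Compare the mutual vector with one image's vector.

    Assumption: sigmoid of the elementwise product.
    """
    return [sigmoid(m * v) for m, v in zip(mutual, x)]


def attend(x, g):
    """Assumption: residual attention, x + x * g."""
    return [v + v * gv for v, gv in zip(x, g)]


def pairwise_interaction(x1, x2, weights, bias):
    """Return each image's features attended by its own and the other's gate."""
    m = mutual_vector(x1, x2, weights, bias)
    g1, g2 = gate(m, x1), gate(m, x2)
    return {
        "x1_self": attend(x1, g1),
        "x1_other": attend(x1, g2),
        "x2_self": attend(x2, g2),
        "x2_other": attend(x2, g1),
    }


def score_ranking_loss(p_self, p_other, label, margin=0.05):
    """Hinge penalty that is zero only when the true-class score from the
    self-gated features exceeds the other-gated score by at least `margin`.

    Assumption: this hinge form and the margin value are illustrative.
    """
    return max(0.0, p_other[label] - p_self[label] + margin)


# Example with hand-picked 4-dimensional features and a fixed dense layer.
x1 = [0.2, -1.0, 0.5, 0.3]
x2 = [0.1, 0.4, -0.7, 0.9]
weights = [[0.1] * 8 for _ in range(4)]
bias = [0.0] * 4
features = pairwise_interaction(x1, x2, weights, bias)
```

In a full model, the four attended vectors would be passed to a classifier, and the ranking penalty would be added to the classification loss. That wiring is omitted here because the abstract does not specify it.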