Paper Title

Exploring Hierarchical Graph Representation for Large-Scale Zero-Shot Image Classification

Authors

Yi, Kai, Shen, Xiaoqian, Gou, Yunhao, Elhoseiny, Mohamed

Abstract

The main question we address in this paper is how to scale up visual recognition of unseen classes, also known as zero-shot learning, to tens of thousands of categories as in the ImageNet-21K benchmark. At this scale, especially with many fine-grained categories included in ImageNet-21K, it is critical to learn quality visual semantic representations that are discriminative enough to recognize unseen classes and distinguish them from seen ones. We propose a \emph{H}ierarchical \emph{G}raphical knowledge \emph{R}epresentation framework for the confidence-based classification method, dubbed as HGR-Net. Our experimental results demonstrate that HGR-Net can grasp class inheritance relations by utilizing hierarchical conceptual knowledge. Our method significantly outperformed all existing techniques, boosting the performance by 7\% compared to the runner-up approach on the ImageNet-21K benchmark. We show that HGR-Net is learning-efficient in few-shot scenarios. We also analyzed our method on smaller datasets like ImageNet-21K-P, 2-hops and 3-hops, demonstrating its generalization ability. Our benchmark and code are available at https://kaiyi.me/p/hgrnet.html.
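To make the general idea of confidence-based classification over a class hierarchy concrete, here is a minimal sketch. It is not the paper's HGR-Net implementation: the toy hierarchy, the per-node confidence scores, and the multiplicative aggregation rule are all illustrative assumptions; the key intuition it shows is that a prediction is scored by how consistent it is with every ancestor level of the hierarchy, which is what lets hierarchical knowledge separate fine-grained unseen classes.

```python
# Hedged sketch of confidence-based classification over a class hierarchy.
# The hierarchy, scores, and aggregation rule below are assumptions for
# illustration, not the actual HGR-Net method.

# Toy hierarchy: child -> parent (the root's parent is None).
PARENT = {
    "animal": None,
    "dog": "animal",
    "cat": "animal",
    "husky": "dog",
    "poodle": "dog",
}

def ancestors(cls):
    """Return the path from a class up to the root, inclusive."""
    path = []
    while cls is not None:
        path.append(cls)
        cls = PARENT[cls]
    return path

def hierarchical_score(leaf, node_conf):
    """Aggregate per-node confidences along the leaf's ancestor path.

    Multiplying confidences (an assumed rule) rewards predictions that
    are consistent with every level of the hierarchy.
    """
    score = 1.0
    for node in ancestors(leaf):
        score *= node_conf.get(node, 1.0)
    return score

def classify(node_conf, leaves):
    """Pick the leaf class whose full ancestor path is most confident."""
    return max(leaves, key=lambda c: hierarchical_score(c, node_conf))

# Example: the model is confident the image shows a dog, and more
# confident in "husky" than "poodle" among the dog breeds.
conf = {"animal": 0.95, "dog": 0.9, "cat": 0.1, "husky": 0.7, "poodle": 0.4}
print(classify(conf, ["husky", "poodle", "cat"]))  # husky
```

Because scores multiply along the ancestor path, a fine-grained class can only win if its coarse-grained ancestors ("dog", "animal") are also plausible, which is one way hierarchical structure can disambiguate visually similar leaves.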
