Paper Title

Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation

Paper Authors

Huiyu Wang, Yukun Zhu, Bradley Green, Hartwig Adam, Alan Yuille, Liang-Chieh Chen

Paper Abstract

Convolution exploits locality for efficiency at a cost of missing long range context. Self-attention has been adopted to augment CNNs with non-local interactions. Recent works prove it possible to stack self-attention layers to obtain a fully attentional network by restricting the attention to a local region. In this paper, we attempt to remove this constraint by factorizing 2D self-attention into two 1D self-attentions. This reduces computation complexity and allows performing attention within a larger or even global region. In companion, we also propose a position-sensitive self-attention design. Combining both yields our position-sensitive axial-attention layer, a novel building block that one could stack to form axial-attention models for image classification and dense prediction. We demonstrate the effectiveness of our model on four large-scale datasets. In particular, our model outperforms all existing stand-alone self-attention models on ImageNet. Our Axial-DeepLab improves 2.8% PQ over bottom-up state-of-the-art on COCO test-dev. This previous state-of-the-art is attained by our small variant that is 3.8x parameter-efficient and 27x computation-efficient. Axial-DeepLab also achieves state-of-the-art results on Mapillary Vistas and Cityscapes.
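To make the factorization concrete, here is a minimal sketch of an axial-attention block in PyTorch. It is an illustration under simplifying assumptions, not the authors' implementation: the class names (`AxialAttention1D`, `AxialBlock`) are hypothetical, and the paper's position-sensitive design, which adds learned relative-position terms to queries, keys, and values, is omitted here, as are residual connections and normalization.

```python
import torch
import torch.nn as nn

class AxialAttention1D(nn.Module):
    """Plain multi-head self-attention along a single spatial axis.
    Simplified sketch: the paper's position-sensitive terms are omitted."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)

    def forward(self, x):
        # x: (batch, length, dim) -- one row or one column of the feature map
        b, n, d = x.shape
        qkv = self.to_qkv(x).reshape(b, n, 3, self.heads, d // self.heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)       # each: (b, heads, n, d/heads)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return out

class AxialBlock(nn.Module):
    """Factorized 2D attention: a height-axis pass followed by a width-axis
    pass, reducing cost from O((HW)^2) to O(HW(H+W)) per image."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.h_attn = AxialAttention1D(dim, heads)
        self.w_attn = AxialAttention1D(dim, heads)

    def forward(self, x):
        # x: (batch, height, width, dim)
        b, h, w, d = x.shape
        # Attend along the height axis: treat each column as a 1D sequence.
        x = self.h_attn(x.permute(0, 2, 1, 3).reshape(b * w, h, d))
        x = x.reshape(b, w, h, d).permute(0, 2, 1, 3)
        # Attend along the width axis: treat each row as a 1D sequence.
        x = self.w_attn(x.reshape(b * h, w, d)).reshape(b, h, w, d)
        return x

# Example: a toy forward pass on a 32x32 feature map.
x = torch.randn(2, 32, 32, 128)
block = AxialBlock(dim=128, heads=8)
print(block(x).shape)  # torch.Size([2, 32, 32, 128])
```

Attending along the height axis and then the width axis lets every output position aggregate information from its full row and column, so stacking such blocks propagates global context while each attention matrix stays one-dimensional.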
