Paper Title

Learning Self-Supervised Low-Rank Network for Single-Stage Weakly and Semi-Supervised Semantic Segmentation

Paper Authors

Junwen Pan, Pengfei Zhu, Kaihua Zhang, Bing Cao, Yu Wang, Dingwen Zhang, Junwei Han, Qinghua Hu

Paper Abstract

Semantic segmentation with limited annotations, such as weakly supervised semantic segmentation (WSSS) and semi-supervised semantic segmentation (SSSS), is a challenging task that has attracted much attention recently. Most leading WSSS methods employ a sophisticated multi-stage training strategy to estimate pseudo-labels as precise as possible, but they suffer from high model complexity. In contrast, there exists another research line that trains a single network with image-level labels in one training cycle. However, such a single-stage strategy often performs poorly because of the compounding effect caused by inaccurate pseudo-label estimation. To address this issue, this paper presents a Self-supervised Low-Rank Network (SLRNet) for single-stage WSSS and SSSS. The SLRNet uses cross-view self-supervision, that is, it simultaneously predicts several complementary attentive LR representations from different views of an image to learn precise pseudo-labels. Specifically, we reformulate the LR representation learning as a collective matrix factorization problem and optimize it jointly with the network learning in an end-to-end manner. The resulting LR representation deprecates noisy information while capturing stable semantics across different views, making it robust to the input variations, thereby reducing overfitting to self-supervision errors. The SLRNet can provide a unified single-stage framework for various label-efficient semantic segmentation settings: 1) WSSS with image-level labeled data, 2) SSSS with a few pixel-level labeled data, and 3) SSSS with a few pixel-level labeled data and many image-level labeled data. Extensive experiments on the Pascal VOC 2012, COCO, and L2ID datasets demonstrate that our SLRNet outperforms both state-of-the-art WSSS and SSSS methods with a variety of different settings, proving its good generalizability and efficacy.
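The abstract's key mechanism is recasting low-rank representation learning as collective matrix factorization across views: feature matrices from different views of an image share one low-rank dictionary, and only the per-view codes differ. The toy sketch below illustrates that idea with plain alternating least squares on fixed matrices; it is not the authors' implementation (SLRNet optimizes the factorization jointly with the network end-to-end), and all names here are illustrative.

```python
import numpy as np

def collective_low_rank(views, rank=4, iters=50, seed=0):
    """Toy collective matrix factorization: approximate each view
    X_v (d x n_v) as D @ C_v, where the dictionary D (d x rank) is
    shared across all views and C_v are per-view codes.
    Solved by alternating least squares; illustrative only."""
    rng = np.random.default_rng(seed)
    d = views[0].shape[0]
    D = rng.standard_normal((d, rank))  # random initial dictionary
    codes = []
    for _ in range(iters):
        # With D fixed, solve each view's codes by least squares.
        codes = [np.linalg.lstsq(D, X, rcond=None)[0] for X in views]
        # With codes fixed, refit the shared dictionary on all views.
        C = np.concatenate(codes, axis=1)   # rank x (n_1 + n_2 + ...)
        X = np.concatenate(views, axis=1)   # d    x (n_1 + n_2 + ...)
        D = np.linalg.lstsq(C.T, X.T, rcond=None)[0].T
    return D, codes
```

Because D is common to every view, reconstructions `D @ C_v` can only retain structure that is stable across views, which is the abstract's argument for why the low-rank representation suppresses view-specific noise and resists overfitting to pseudo-label errors.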
