Paper Title

Learning Self-Supervised Low-Rank Network for Single-Stage Weakly and Semi-Supervised Semantic Segmentation

Paper Authors

Junwen Pan, Pengfei Zhu, Kaihua Zhang, Bing Cao, Yu Wang, Dingwen Zhang, Junwei Han, Qinghua Hu

Paper Abstract

Semantic segmentation with limited annotations, such as weakly supervised semantic segmentation (WSSS) and semi-supervised semantic segmentation (SSSS), is a challenging task that has attracted much attention recently. Most leading WSSS methods employ a sophisticated multi-stage training strategy to estimate pseudo-labels as precise as possible, but they suffer from high model complexity. In contrast, there exists another research line that trains a single network with image-level labels in one training cycle. However, such a single-stage strategy often performs poorly because of the compounding effect caused by inaccurate pseudo-label estimation. To address this issue, this paper presents a Self-supervised Low-Rank Network (SLRNet) for single-stage WSSS and SSSS. The SLRNet uses cross-view self-supervision, that is, it simultaneously predicts several complementary attentive LR representations from different views of an image to learn precise pseudo-labels. Specifically, we reformulate the LR representation learning as a collective matrix factorization problem and optimize it jointly with the network learning in an end-to-end manner. The resulting LR representation deprecates noisy information while capturing stable semantics across different views, making it robust to the input variations, thereby reducing overfitting to self-supervision errors. The SLRNet can provide a unified single-stage framework for various label-efficient semantic segmentation settings: 1) WSSS with image-level labeled data, 2) SSSS with a few pixel-level labeled data, and 3) SSSS with a few pixel-level labeled data and many image-level labeled data. Extensive experiments on the Pascal VOC 2012, COCO, and L2ID datasets demonstrate that our SLRNet outperforms both state-of-the-art WSSS and SSSS methods with a variety of different settings, proving its good generalizability and efficacy.
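The abstract's key mechanism is recasting low-rank representation learning as collective matrix factorization across views: feature matrices from different views of an image share one low-rank dictionary, and only the per-view codes differ. The toy sketch below illustrates that idea with plain alternating least squares on fixed matrices; it is not the authors' implementation (SLRNet optimizes the factorization jointly with the network end-to-end), and all names here are illustrative.

```python
import numpy as np

def collective_low_rank(views, rank=4, iters=50, seed=0):
    """Toy collective matrix factorization: approximate each view
    X_v (d x n_v) as D @ C_v, where the dictionary D (d x rank) is
    shared across all views and C_v are per-view codes.
    Solved by alternating least squares; illustrative only."""
    rng = np.random.default_rng(seed)
    d = views[0].shape[0]
    D = rng.standard_normal((d, rank))  # random initial dictionary
    codes = []
    for _ in range(iters):
        # With D fixed, solve each view's codes by least squares.
        codes = [np.linalg.lstsq(D, X, rcond=None)[0] for X in views]
        # With codes fixed, refit the shared dictionary on all views.
        C = np.concatenate(codes, axis=1)   # rank x (n_1 + n_2 + ...)
        X = np.concatenate(views, axis=1)   # d    x (n_1 + n_2 + ...)
        D = np.linalg.lstsq(C.T, X.T, rcond=None)[0].T
    return D, codes
```

Because D is common to every view, reconstructions `D @ C_v` can only retain structure that is stable across views, which is the abstract's argument for why the low-rank representation suppresses view-specific noise and resists overfitting to pseudo-label errors.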
