Paper Title


Co-training $2^L$ Submodels for Visual Recognition

Authors

Hugo Touvron, Matthieu Cord, Maxime Oquab, Piotr Bojanowski, Jakob Verbeek, Hervé Jégou

Abstract


We introduce submodel co-training, a regularization method related to co-training, self-distillation and stochastic depth. Given a neural network to be trained, for each sample we implicitly instantiate two altered networks, ``submodels'', with stochastic depth: we activate only a subset of the layers. Each network serves as a soft teacher to the other, by providing a loss that complements the regular loss provided by the one-hot label. Our approach, dubbed cosub, uses a single set of weights, and does not involve a pre-trained external model or temporal averaging. Experimentally, we show that submodel co-training is effective to train backbones for recognition tasks such as image classification and semantic segmentation. Our approach is compatible with multiple architectures, including RegNet, ViT, PiT, XCiT, Swin and ConvNext. Our training strategy improves their results in comparable settings. For instance, a ViT-B pretrained with cosub on ImageNet-21k obtains 87.4% top-1 acc. @448 on ImageNet-val.
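The core idea of the abstract can be sketched in a few lines: draw two independent stochastic-depth "submodels" from the same weights, and have each one's prediction serve as a soft target for the other, alongside the usual one-hot loss. The following is a minimal numpy illustration, not the authors' implementation; the toy residual blocks, the 0.5 drop rate, and the equal weighting between the one-hot and distillation terms are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward(x, layers, drop_rate, rng):
    """Stochastic-depth forward pass: each residual layer is
    independently skipped with probability drop_rate, so one draw
    corresponds to one of the 2^L possible submodels."""
    h = x
    for W in layers:
        if rng.random() >= drop_rate:   # layer kept for this draw
            h = h + np.tanh(h @ W)      # toy residual block (assumption)
    return h

# Toy setup: 4 residual layers, 8-dim features, 3 classes.
layers = [rng.normal(scale=0.1, size=(8, 8)) for _ in range(4)]
head = rng.normal(scale=0.1, size=(8, 3))
x = rng.normal(size=(2, 8))
y = np.array([0, 2])  # ground-truth class indices

# Two submodels = two independent stochastic-depth draws of the SAME weights.
p1 = softmax(forward(x, layers, 0.5, rng) @ head)
p2 = softmax(forward(x, layers, 0.5, rng) @ head)

def cross_entropy(p, y):
    return -np.mean(np.log(p[np.arange(len(y)), y]))

def kl(p, q):
    # KL(p || q): p is treated as a fixed soft teacher (no gradient).
    return np.mean((p * (np.log(p) - np.log(q))).sum(axis=-1))

# Each submodel mixes the regular one-hot loss with a distillation
# loss from the other submodel's prediction (equal weights assumed).
loss1 = 0.5 * cross_entropy(p1, y) + 0.5 * kl(p2, p1)
loss2 = 0.5 * cross_entropy(p2, y) + 0.5 * kl(p1, p2)
total = loss1 + loss2
```

In a real training loop both losses would be backpropagated into the single shared set of weights, with the teacher term detached, which is what distinguishes cosub from methods requiring a separate pre-trained teacher or an EMA copy.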
