Paper Title

Evolving Normalization-Activation Layers

Authors

Hanxiao Liu, Andrew Brock, Karen Simonyan, Quoc V. Le

Abstract

Normalization layers and activation functions are fundamental components in deep networks and typically co-locate with each other. Here we propose to design them using an automated approach. Instead of designing them separately, we unify them into a single tensor-to-tensor computation graph, and evolve its structure starting from basic mathematical functions. Examples of such mathematical functions are addition, multiplication and statistical moments. The use of low-level mathematical functions, in contrast to the use of high-level modules in mainstream NAS, leads to a highly sparse and large search space which can be challenging for search methods. To address the challenge, we develop efficient rejection protocols to quickly filter out candidate layers that do not work well. We also use multi-objective evolution to optimize each layer's performance across many architectures to prevent overfitting. Our method leads to the discovery of EvoNorms, a set of new normalization-activation layers with novel, and sometimes surprising, structures that go beyond existing design patterns. For example, some EvoNorms do not assume that normalization and activation functions must be applied sequentially, nor need to center the feature maps, nor require explicit activation functions. Our experiments show that EvoNorms not only work well on image classification models including ResNets, MobileNets and EfficientNets, but also transfer well to Mask R-CNN with FPN/SpineNet for instance segmentation and to BigGAN for image synthesis, outperforming BatchNorm and GroupNorm based layers in many cases.
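To make the "single tensor-to-tensor computation graph" idea concrete, here is a minimal NumPy sketch in the spirit of one discovered layer (EvoNorm-S0 from the full paper): a Swish-like gate fused with a group-wise standard deviation, with no separate activation applied afterwards. The group count and parameter shapes below are illustrative assumptions, not a definitive implementation.

```python
import numpy as np

def evonorm_s0(x, v, gamma, beta, groups=4, eps=1e-5):
    """Sketch of an EvoNorm-S0-style fused normalization-activation layer.

    x: input of shape (N, H, W, C); v, gamma, beta: per-channel (C,) params.
    Computes x * sigmoid(v * x) / group_std(x) * gamma + beta, i.e. the
    gating and the normalization happen in a single expression rather than
    as a normalization layer followed by an activation function.
    """
    n, h, w, c = x.shape
    assert c % groups == 0, "channels must divide evenly into groups"
    # Per-group standard deviation over spatial dims and channels-in-group.
    xg = x.reshape(n, h, w, groups, c // groups)
    var = xg.var(axis=(1, 2, 4), keepdims=True)
    std = np.broadcast_to(np.sqrt(var + eps), xg.shape).reshape(n, h, w, c)
    gate = 1.0 / (1.0 + np.exp(-v * x))  # sigmoid(v * x)
    return (x * gate) / std * gamma + beta
```

Note that nothing in this expression separates "normalize" from "activate"; that entanglement is exactly the kind of structure the evolutionary search is free to discover.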
