Paper Title

From CNNs to Shift-Invariant Twin Models Based on Complex Wavelets

Paper Authors

Leterme, Hubert, Polisano, Kévin, Perrier, Valérie, Alahari, Karteek

Paper Abstract

We propose a novel method to increase shift invariance and prediction accuracy in convolutional neural networks. Specifically, we replace the first-layer combination "real-valued convolutions + max pooling" (RMax) by "complex-valued convolutions + modulus" (CMod), which is stable to translations, or shifts. To justify our approach, we claim that CMod and RMax produce comparable outputs when the convolution kernel is band-pass and oriented (Gabor-like filter). In this context, CMod can therefore be considered as a stable alternative to RMax. To enforce this property, we constrain the convolution kernels to adopt such a Gabor-like structure. The corresponding architecture is called mathematical twin, because it employs a well-defined mathematical operator to mimic the behavior of the original, freely-trained model. Our approach achieves superior accuracy on ImageNet and CIFAR-10 classification tasks, compared to prior methods based on low-pass filtering. Arguably, our approach's emphasis on retaining high-frequency details contributes to a better balance between shift invariance and information preservation, resulting in improved performance. Furthermore, it has a lower computational cost and memory footprint than concurrent work, making it a promising solution for practical implementation.
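To make the substitution concrete, below is a minimal sketch (not the authors' implementation), assuming a PyTorch setting: RMax applies a real-valued convolution followed by max pooling, while CMod convolves with the real and imaginary parts of a complex-valued kernel and takes the pointwise modulus. The kernel sizes, strides, and random weights are illustrative assumptions; in the paper the kernels are constrained to be Gabor-like (band-pass and oriented), and CMod's stride is chosen to match RMax's overall subsampling factor.

```python
# Illustrative sketch of the two first-layer operators (assumed PyTorch
# setting; shapes, strides, and random weights are placeholder choices).
import torch
import torch.nn.functional as F

def rmax(x, w, stride=2, pool=2):
    # RMax: real-valued convolution followed by max pooling.
    return F.max_pool2d(F.conv2d(x, w, stride=stride), kernel_size=pool)

def cmod(x, w_real, w_imag, stride=4):
    # CMod: complex-valued convolution followed by the pointwise modulus.
    # Stride 4 mirrors RMax's overall subsampling (conv stride 2 x pool 2).
    yr = F.conv2d(x, w_real, stride=stride)  # real part of the response
    yi = F.conv2d(x, w_imag, stride=stride)  # imaginary part
    return torch.sqrt(yr ** 2 + yi ** 2)

x = torch.randn(1, 3, 64, 64)         # random input batch
w_real = torch.randn(8, 3, 7, 7)      # real part of the kernels
w_imag = torch.randn(8, 3, 7, 7)      # imaginary part (in the paper, each
                                      # pair forms a Gabor-like filter)
print(rmax(x, w_real).shape)          # torch.Size([1, 8, 14, 14])
print(cmod(x, w_real, w_imag).shape)  # torch.Size([1, 8, 15, 15]),
                                      # same scale up to boundary effects
```

The intuition behind the swap: for a band-pass, oriented kernel, the modulus of the complex response is a smooth envelope that varies slowly under input translations, whereas max pooling over subsampled real responses can alias, which is the source of RMax's shift instability.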
