Paper Title

From CNNs to Shift-Invariant Twin Models Based on Complex Wavelets

Paper Authors

Leterme, Hubert, Polisano, Kévin, Perrier, Valérie, Alahari, Karteek

Paper Abstract

We propose a novel method to increase shift invariance and prediction accuracy in convolutional neural networks. Specifically, we replace the first-layer combination "real-valued convolutions + max pooling" (RMax) by "complex-valued convolutions + modulus" (CMod), which is stable to translations, or shifts. To justify our approach, we claim that CMod and RMax produce comparable outputs when the convolution kernel is band-pass and oriented (Gabor-like filter). In this context, CMod can therefore be considered as a stable alternative to RMax. To enforce this property, we constrain the convolution kernels to adopt such a Gabor-like structure. The corresponding architecture is called mathematical twin, because it employs a well-defined mathematical operator to mimic the behavior of the original, freely-trained model. Our approach achieves superior accuracy on ImageNet and CIFAR-10 classification tasks, compared to prior methods based on low-pass filtering. Arguably, our approach's emphasis on retaining high-frequency details contributes to a better balance between shift invariance and information preservation, resulting in improved performance. Furthermore, it has a lower computational cost and memory footprint than concurrent work, making it a promising solution for practical implementation.
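To make the substitution concrete, below is a minimal sketch (not the authors' implementation), assuming a PyTorch setting: RMax applies a real-valued convolution followed by max pooling, while CMod convolves with the real and imaginary parts of a complex-valued kernel and takes the pointwise modulus. The kernel sizes, strides, and random weights are illustrative assumptions; in the paper the kernels are constrained to be Gabor-like (band-pass and oriented), and CMod's stride is chosen to match RMax's overall subsampling factor.

```python
# Illustrative sketch of the two first-layer operators (assumed PyTorch
# setting; shapes, strides, and random weights are placeholder choices).
import torch
import torch.nn.functional as F

def rmax(x, w, stride=2, pool=2):
    # RMax: real-valued convolution followed by max pooling.
    return F.max_pool2d(F.conv2d(x, w, stride=stride), kernel_size=pool)

def cmod(x, w_real, w_imag, stride=4):
    # CMod: complex-valued convolution followed by the pointwise modulus.
    # Stride 4 mirrors RMax's overall subsampling (conv stride 2 x pool 2).
    yr = F.conv2d(x, w_real, stride=stride)  # real part of the response
    yi = F.conv2d(x, w_imag, stride=stride)  # imaginary part
    return torch.sqrt(yr ** 2 + yi ** 2)

x = torch.randn(1, 3, 64, 64)         # random input batch
w_real = torch.randn(8, 3, 7, 7)      # real part of the kernels
w_imag = torch.randn(8, 3, 7, 7)      # imaginary part (in the paper, each
                                      # pair forms a Gabor-like filter)
print(rmax(x, w_real).shape)          # torch.Size([1, 8, 14, 14])
print(cmod(x, w_real, w_imag).shape)  # torch.Size([1, 8, 15, 15]),
                                      # same scale up to boundary effects
```

The intuition behind the swap: for a band-pass, oriented kernel, the modulus of the complex response is a smooth envelope that varies slowly under input translations, whereas max pooling over subsampled real responses can alias, which is the source of RMax's shift instability.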
