Paper Title
Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers
Paper Authors
Paper Abstract
Training very deep neural networks is still an extremely challenging task. The common solution is to use shortcut connections and normalization layers, which are both crucial ingredients in the popular ResNet architecture. However, there is strong evidence to suggest that ResNets behave more like ensembles of shallower networks than truly deep ones. Recently, it was shown that deep vanilla networks (i.e. networks without normalization layers or shortcut connections) can be trained as fast as ResNets by applying certain transformations to their activation functions. However, this method (called Deep Kernel Shaping) isn't fully compatible with ReLUs, and produces networks that overfit significantly more than ResNets on ImageNet. In this work, we rectify this situation by developing a new type of transformation that is fully compatible with a variant of ReLUs -- Leaky ReLUs. We show in experiments that our method, which introduces negligible extra computational cost, achieves validation accuracies with deep vanilla networks that are competitive with ResNets (of the same width/depth), and significantly higher than those obtained with the Edge of Chaos (EOC) method. And unlike with EOC, the validation accuracies we obtain do not get worse with depth.
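To make the activation-transformation idea concrete, below is a minimal sketch of a variance-preserving Leaky ReLU: rescaling the activation by sqrt(2 / (1 + alpha^2)) keeps the second moment of its output at 1 for standard Gaussian inputs, which is the kind of statistical property such transformations are designed to control. This is an illustration, not the paper's actual construction; the function name `tailored_leaky_relu` and the slope value 0.2 are assumptions, and how the paper actually selects the negative slope is not specified in the abstract.

```python
import numpy as np

def tailored_leaky_relu(x, alpha=0.2):
    """Leaky ReLU rescaled so that E[phi(x)^2] = 1 when x ~ N(0, 1).

    For standard Gaussian x, E[LReLU_alpha(x)^2] = (1 + alpha^2) / 2,
    so multiplying by sqrt(2 / (1 + alpha^2)) preserves the second
    moment of the activations from layer to layer.
    """
    scale = np.sqrt(2.0 / (1.0 + alpha**2))
    return scale * np.where(x >= 0, x, alpha * x)

# Sanity check: the output second moment stays close to 1.
x = np.random.randn(1_000_000)
print(np.mean(tailored_leaky_relu(x) ** 2))  # ~1.0
```

A plain ReLU (alpha = 0) would need the familiar sqrt(2) scaling as a special case of the same formula; allowing alpha > 0 is what gives the extra degree of freedom that a Leaky-ReLU-based transformation can exploit.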