Paper Title

Rotate the ReLU to implicitly sparsify deep networks

Authors

Nancy Nayak, Sheetal Kalyani

Abstract

In the era of Deep Neural Network based solutions for a variety of real-life tasks, having a compact and energy-efficient deployable model has become fairly important. Most of the existing deep architectures use Rectified Linear Unit (ReLU) activation. In this paper, we propose a novel idea of rotating the ReLU activation to give one more degree of freedom to the architecture. We show that this activation, wherein the rotation is learned via training, results in the elimination of those parameters/filters in the network which are not important for the task. In other words, the rotated ReLU seems to be doing implicit sparsification. The slopes of the rotated ReLU activations act as coarse feature extractors, and unnecessary features can be eliminated before retraining. Our studies indicate that features always choose to pass through a smaller number of filters in architectures such as ResNet and its variants. Hence, by rotating the ReLU, the weights or filters that are not necessary are automatically identified and can be dropped, giving rise to significant savings in memory and computation. Furthermore, in some cases, we also notice that, along with the savings in memory and computation, we obtain an improvement over the reported performance of the corresponding baseline work on popular datasets such as MNIST, CIFAR-10, CIFAR-100, and SVHN.
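The abstract does not spell out the exact parameterization of the rotation, but the idea can be illustrated with a minimal PyTorch-style sketch, assuming the rotated ReLU is realized as a standard ReLU scaled by a learnable per-channel slope; the class name `RotatedReLU`, the pruning threshold, and the `prunable_channels` helper below are illustrative assumptions, not the paper's actual formulation or API.

```python
import torch
import torch.nn as nn


class RotatedReLU(nn.Module):
    """Sketch of a rotated ReLU with a learnable per-channel slope.

    Assumption: the rotation is modeled here as scaling the ReLU output by
    a trainable slope; the paper's exact parameterization (e.g. an explicit
    rotation angle) may differ.
    """

    def __init__(self, num_channels: int, init_slope: float = 1.0):
        super().__init__()
        # One learnable slope per channel.
        self.slope = nn.Parameter(torch.full((num_channels,), init_slope))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x is assumed to be shaped (N, C, H, W); broadcast the slope over
        # the batch and spatial dimensions.
        return self.slope.view(1, -1, 1, 1) * torch.relu(x)

    def prunable_channels(self, threshold: float = 1e-2) -> torch.Tensor:
        # Channels whose |slope| has shrunk below the threshold contribute
        # little to the output and are candidates for pruning.
        return (self.slope.abs() < threshold).nonzero(as_tuple=True)[0]


# Illustrative usage: drop-in replacement for nn.ReLU in a conv block.
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    RotatedReLU(num_channels=16),
)
out = block(torch.randn(4, 3, 32, 32))
```

Under this assumed parameterization, filters whose learned slope collapses toward zero after training can be removed and the pruned network retrained, mirroring the implicit sparsification the abstract describes.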
