Paper Title
On the infinite width limit of neural networks with a standard parameterization
Paper Authors
Paper Abstract
There are currently two parameterizations used to derive fixed kernels corresponding to infinite width neural networks, the NTK (Neural Tangent Kernel) parameterization and the naive standard parameterization. However, the extrapolation of both of these parameterizations to infinite width is problematic. The standard parameterization leads to a divergent neural tangent kernel while the NTK parameterization fails to capture crucial aspects of finite width networks such as: the dependence of training dynamics on relative layer widths, the relative training dynamics of weights and biases, and overall learning rate scale. Here we propose an improved extrapolation of the standard parameterization that preserves all of these properties as width is taken to infinity and yields a well-defined neural tangent kernel. We show experimentally that the resulting kernels typically achieve similar accuracy to those resulting from an NTK parameterization, but with better correspondence to the parameterization of typical finite width networks. Additionally, with careful tuning of width parameters, the improved standard parameterization kernels can outperform those stemming from an NTK parameterization. We release code implementing this improved standard parameterization as part of the Neural Tangents library at https://github.com/google/neural-tangents.
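
The abstract notes that the improved standard parameterization is released as part of the Neural Tangents library. Below is a minimal usage sketch, assuming the `parameterization='standard'` keyword of `neural_tangents.stax.Dense` as documented for that library; the layer width (512), the `W_std`/`b_std` values, and the toy inputs are illustrative assumptions, not values taken from the paper.

    import jax
    from neural_tangents import stax

    # A two-layer fully connected network built with the standard
    # parameterization. Under this parameterization the finite layer
    # width (512 here) enters the resulting infinite-width kernel.
    init_fn, apply_fn, kernel_fn = stax.serial(
        stax.Dense(512, W_std=1.5, b_std=0.05, parameterization='standard'),
        stax.Relu(),
        stax.Dense(1, W_std=1.5, b_std=0.05, parameterization='standard'),
    )

    # Toy input batches: 3 and 4 examples of dimension 8.
    key1, key2 = jax.random.split(jax.random.PRNGKey(0))
    x1 = jax.random.normal(key1, (3, 8))
    x2 = jax.random.normal(key2, (4, 8))

    # Analytic neural tangent kernel between the two batches.
    ntk = kernel_fn(x1, x2, 'ntk')
    print(ntk.shape)  # (3, 4)

Swapping `parameterization='standard'` for `'ntk'` in the same sketch recovers the NTK-parameterized kernel, which the abstract uses as the baseline for comparison.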