Paper Title
Adaptive Hierarchical Hyper-gradient Descent
Paper Authors
Abstract
In this study, we investigate learning rate adaptation at different levels based on the hyper-gradient descent framework and propose a method that adaptively learns the optimizer parameters by combining multiple levels of learning rates with hierarchical structures. We also show the relationship between regularizing over-parameterized learning rates and building combinations of adaptive learning rates at different levels. Experiments on several network architectures, including feed-forward networks, LeNet-5, and ResNet-18/34, show that the proposed multi-level adaptive approach can outperform baseline adaptive methods in a variety of circumstances.
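To make the underlying mechanism concrete, below is a minimal sketch of the hyper-gradient descent rule that the abstract builds on: the learning rate itself is updated by gradient descent, using the fact that the gradient of the loss with respect to the step size is the negative dot product of the current and previous gradients. This illustrates only the single-level (global learning rate) case, not the hierarchical multi-level combination the paper proposes; the function name `sgd_hd` and the toy quadratic objective are illustrative choices, not from the paper.

```python
import numpy as np

def sgd_hd(grad_fn, theta, alpha=0.01, beta=1e-4, steps=100):
    """SGD with a global hyper-gradient-descent learning rate (sketch).

    grad_fn: returns the gradient of the loss at theta.
    alpha:   initial learning rate, itself adapted online.
    beta:    hyper-learning-rate for updating alpha.
    """
    g_prev = np.zeros_like(theta)
    for _ in range(steps):
        g = grad_fn(theta)
        # d(loss)/d(alpha) = -g . g_prev, so descending on alpha
        # means adding beta * (g . g_prev).
        alpha = alpha + beta * np.dot(g, g_prev)
        theta = theta - alpha * g
        g_prev = g
    return theta, alpha

# Toy usage: minimize f(x) = ||x||^2 / 2, whose gradient is x.
theta, alpha = sgd_hd(lambda x: x, np.array([1.0, -2.0]),
                      alpha=0.05, beta=1e-3, steps=200)
```

When successive gradients point in similar directions, `alpha` grows; when they oppose each other (overshooting), it shrinks, which is the intuition the multi-level variants extend to per-layer and per-parameter learning rates.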