更快的随机准Newton方法

论文标题

更快的随机准Newton方法

Faster Stochastic Quasi-Newton Methods

论文作者

Zhang, Qingsong, Huang, Feihu, Deng, Cheng, Huang, Heng

论文摘要

随机优化方法已成为机器学习中的一系列流行优化工具。尤其是，由于每卷计算复杂性低，随机梯度下降（SGD）已被广泛用于机器学习问题，例如训练神经网络。实际上，利用二阶信息的牛顿或准Newton方法比一阶方法能够实现更好的解决方案。因此，已经开发了随机的准牛顿（SQN）方法，以通过利用近似二阶信息来有效地实现更好的解决方案。但是，现有的SQN方法仍未达到最著名的随机一阶甲骨文（SFO）复杂性。为了填补这一空白，我们提出了一种基于饮食降低技术的新型随机准牛顿法（蜘蛛网）。 We prove that our SpiderSQN method reaches the best known SFO complexity of $\mathcal{O}(n+n^{1/2}ε^{-2})$ in the finite-sum setting to obtain an $ε$-first-order stationary point.为了进一步提高其实际性能，我们将蜘蛛网与不同的动量方案结合在一起。此外，提出的算法被推广到在线设置，并且开发了$ \ Mathcal {O}（ε^{ - 3}）$的相应SFO复杂性，这也与现有的最佳结果相匹配。基准数据集的广泛实验表明，我们的新算法优于非凸优化的最先进方法。

Stochastic optimization methods have become a class of popular optimization tools in machine learning. Especially, stochastic gradient descent (SGD) has been widely used for machine learning problems such as training neural networks due to low per-iteration computational complexity. In fact, the Newton or quasi-newton methods leveraging second-order information are able to achieve a better solution than the first-order methods. Thus, stochastic quasi-Newton (SQN) methods have been developed to achieve the better solution efficiently than the stochastic first-order methods by utilizing approximate second-order information. However, the existing SQN methods still do not reach the best known stochastic first-order oracle (SFO) complexity. To fill this gap, we propose a novel faster stochastic quasi-Newton method (SpiderSQN) based on the variance reduced technique of SIPDER. We prove that our SpiderSQN method reaches the best known SFO complexity of $\mathcal{O}(n+n^{1/2}ε^{-2})$ in the finite-sum setting to obtain an $ε$-first-order stationary point. To further improve its practical performance, we incorporate SpiderSQN with different momentum schemes. Moreover, the proposed algorithms are generalized to the online setting, and the corresponding SFO complexity of $\mathcal{O}(ε^{-3})$ is developed, which also matches the existing best result. Extensive experiments on benchmark datasets demonstrate that our new algorithms outperform state-of-the-art approaches for nonconvex optimization.

下载PDF全文

下载文献需遵守相关版权规定

论文标题