Paper Title

Stochastic Zeroth order Descent with Structured Directions

Paper Authors

Marco Rando, Cesare Molinari, Silvia Villa, Lorenzo Rosasco

Abstract

We introduce and analyze Structured Stochastic Zeroth order Descent (S-SZD), a finite difference approach that approximates a stochastic gradient on a set of $l\leq d$ orthogonal directions, where $d$ is the dimension of the ambient space. These directions are randomly chosen and may change at each step. For smooth convex functions we prove almost sure convergence of the iterates and a convergence rate on the function values of the form $O( (d/l) k^{-c})$ for every $c<1/2$, which is arbitrarily close to the one of Stochastic Gradient Descent (SGD) in terms of number of iterations. Our bound shows the benefits of using $l$ multiple directions instead of one. For non-convex functions satisfying the Polyak-Łojasiewicz condition, we establish the first convergence rates for stochastic structured zeroth order algorithms under such an assumption. We corroborate our theoretical findings with numerical simulations where the assumptions are satisfied and on the real-world problem of hyper-parameter optimization in machine learning, achieving competitive practical performance.
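The core idea, as the abstract describes it, is to approximate a gradient by finite differences along $l \leq d$ randomly chosen orthogonal directions and take a descent step along that surrogate. The following is a minimal illustrative sketch of one such iteration, not the paper's exact algorithm: the function name, the QR-based sampling of orthonormal directions, the $(d/l)$ rescaling of the surrogate gradient, and all step-size choices are assumptions made here for concreteness.

```python
import numpy as np

def s_szd_step(f, x, l, h=1e-5, step=0.1, rng=None):
    """One structured zeroth-order descent step (illustrative sketch).

    Approximates the gradient of f at x by forward finite differences
    along l random orthonormal directions, then takes a gradient step.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.size
    # Sample l orthonormal directions as the Q factor of a Gaussian matrix.
    Q, _ = np.linalg.qr(rng.standard_normal((d, l)))
    fx = f(x)
    g = np.zeros(d)
    for i in range(l):
        p = Q[:, i]
        # Forward finite difference approximates the directional derivative.
        g += (f(x + h * p) - fx) / h * p
    # (d/l) rescaling makes the projected estimate unbiased in expectation
    # for a linear f; this scaling is an assumption of this sketch.
    return x - step * (d / l) * g
```

On a smooth convex test function such as $f(x) = \|x\|^2$, repeatedly applying this step drives the function value toward its minimum, mirroring the convergence behavior the abstract analyzes.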
