Paper Title

Asynchronous Distributed Optimization with Stochastic Delays

Paper Authors

Margalit Glasgow, Mary Wootters

Paper Abstract

We study asynchronous finite-sum minimization in a distributed-data setting with a central parameter server. While asynchrony is well understood in parallel settings where the data is accessible by all machines -- e.g., modifications of variance-reduced gradient algorithms like SAGA work well -- little is known for the distributed-data setting. We develop ADSAGA, a SAGA-based algorithm for the distributed-data setting, in which the data is partitioned between many machines. We show that with $m$ machines, under a natural stochastic delay model with a mean delay of $m$, ADSAGA converges in $\tilde{O}\left(\left(n + \sqrt{m}\kappa\right)\log(1/\epsilon)\right)$ iterations, where $n$ is the number of component functions and $\kappa$ is a condition number. This complexity sits squarely between the complexity $\tilde{O}\left(\left(n + \kappa\right)\log(1/\epsilon)\right)$ of SAGA \textit{without delays} and the complexity $\tilde{O}\left(\left(n + m\kappa\right)\log(1/\epsilon)\right)$ of parallel asynchronous algorithms where the delays are \textit{arbitrary} (but bounded by $O(m)$) and the data is accessible by all machines. Existing asynchronous algorithms for the distributed-data setting with arbitrary delays have only been shown to converge in $\tilde{O}(n^2\kappa\log(1/\epsilon))$ iterations. On least-squares problems, we empirically compare the iteration complexity and wallclock performance of ADSAGA to those of existing parallel and distributed algorithms, including synchronous minibatch algorithms. Our results demonstrate the wallclock advantage of variance-reduced asynchronous approaches over SGD or synchronous approaches.
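For context, below is a minimal sketch of the centralized SAGA update that ADSAGA builds on (it is not the paper's distributed ADSAGA protocol, which additionally handles partitioned data and stochastic delays). The least-squares objective, problem sizes, and step size are illustrative assumptions; only the variance-reduced gradient estimate $\nabla f_j(x) - g_j + \frac{1}{n}\sum_i g_i$ follows the standard SAGA scheme.

```python
# Sketch of the centralized SAGA update on a toy least-squares problem,
# f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2.
# Problem sizes and the step size below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10                              # number of components, dimension
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
eta = 0.1 / np.linalg.norm(A, 2) ** 2       # heuristic step size (assumption)

def grad_i(x, i):
    """Gradient of the i-th component f_i(x) = 0.5 * (a_i^T x - b_i)^2."""
    return (A[i] @ x - b[i]) * A[i]

x = np.zeros(d)
table = np.array([grad_i(x, i) for i in range(n)])  # stored gradient per component
table_avg = table.mean(axis=0)

for _ in range(20 * n):
    j = rng.integers(n)
    g_new = grad_i(x, j)
    # SAGA variance-reduced estimate: fresh gradient of f_j, minus its stale
    # stored gradient, plus the average of all stored gradients.
    v = g_new - table[j] + table_avg
    x -= eta * v
    # Refresh the stored gradient for component j and its running average.
    table_avg += (g_new - table[j]) / n
    table[j] = g_new

print("final objective:", 0.5 * np.mean((A @ x - b) ** 2))
```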
