Paper Title
Intermittent Pulling with Local Compensation for Communication-Efficient Federated Learning
Paper Authors
Paper Abstract
Federated Learning is a powerful machine learning paradigm for cooperatively training a global model on highly distributed data. A major bottleneck in the performance of distributed Stochastic Gradient Descent (SGD) algorithms for large-scale Federated Learning is the communication overhead of pushing local gradients and pulling the global model. In this paper, to reduce the communication complexity of Federated Learning, we propose a novel approach named Pulling Reduction with Local Compensation (PRLC). Specifically, each training node pulls the global model from the server only intermittently across SGD iterations, so it is sometimes out of sync with the server. In that case, it uses its local update to compensate for the gap between the local model and the global model. Our rigorous theoretical analysis of PRLC yields two important findings. First, we prove that the convergence rate of PRLC preserves the same order as classical synchronous SGD in both the strongly convex and non-convex cases, with good scalability due to a linear speedup with respect to the number of training nodes. Second, we show that PRLC admits a lower pulling frequency than the existing pulling reduction method without local compensation. We also conduct extensive experiments on various machine learning models to validate our theoretical results. The experimental results show that our approach achieves a significant pulling reduction over state-of-the-art methods; e.g., PRLC requires only half of the pulling operations of LAG.
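To make the pull-skipping-with-compensation idea concrete, below is a minimal single-process simulation of the scheme described in the abstract. It is a sketch under stated assumptions, not the paper's implementation: the toy least-squares objective, the Bernoulli pull schedule `pull_prob`, and all variable names (`w_global`, `w_local`, `local_grad`) are illustrative choices of ours, and the paper may schedule pulls and aggregate gradients differently.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (assumed, not from the paper): each worker k holds a local
# least-squares objective 0.5 * ||A_k w - b_k||^2 over its own data shard.
n_workers, dim, lr, steps, pull_prob = 4, 10, 0.05, 200, 0.5
A = [rng.normal(size=(32, dim)) for _ in range(n_workers)]
b = [rng.normal(size=32) for _ in range(n_workers)]

def local_grad(k, w):
    """Stochastic gradient of worker k's local objective at w (mini-batch of 8)."""
    idx = rng.choice(32, size=8, replace=False)
    Ak, bk = A[k][idx], b[k][idx]
    return Ak.T @ (Ak @ w - bk) / len(idx)

w_global = np.zeros(dim)                                   # model held by the server
w_local = [w_global.copy() for _ in range(n_workers)]      # possibly stale local copies

for t in range(steps):
    grads = []
    for k in range(n_workers):
        g = local_grad(k, w_local[k])
        grads.append(g)                    # every worker still pushes each iteration
        if rng.random() < pull_prob:
            # Synchronized step: pull the current global model from the server.
            w_local[k] = w_global.copy()
        else:
            # Skipped pull: compensate the gap between the (stale) local model
            # and the global model with the worker's own local update.
            w_local[k] = w_local[k] - lr * g
    # Server aggregates all pushed gradients and updates the global model.
    w_global = w_global - lr * np.mean(grads, axis=0)

avg_loss = sum(0.5 * np.mean((A[k] @ w_global - b[k]) ** 2)
               for k in range(n_workers)) / n_workers
print("final average loss:", avg_loss)
```

Lowering `pull_prob` in this sketch trades pull operations for staleness; the abstract's claim is that the local-compensation step in the `else` branch is what lets that frequency drop without losing the convergence order of synchronous SGD.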