Paper Title
GACT: Activation Compressed Training for Generic Network Architectures
Paper Authors
Paper Abstract
Training large neural network (NN) models requires extensive memory resources, and Activation Compressed Training (ACT) is a promising approach to reduce the training memory footprint. This paper presents GACT, an ACT framework designed to support a broad range of machine learning tasks for generic NN architectures with limited domain knowledge. By analyzing a linearized version of ACT's approximate gradient, we prove the convergence of GACT without prior knowledge of operator type or model architecture. To make training stable, we propose an algorithm that decides the compression ratio for each tensor by estimating its impact on the gradient at run time. We implement GACT as a PyTorch library that readily applies to any NN architecture. GACT reduces the activation memory of convolutional NNs, transformers, and graph NNs by up to 8.1x, enabling training with a 4.2x to 24.7x larger batch size with negligible accuracy loss. The code is available at https://github.com/LiuXiaoxuanPKU/GACT-ICML.
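The core ACT mechanism the abstract refers to — compressing activations saved for the backward pass and decompressing them when gradients are computed — can be illustrated with a minimal PyTorch sketch. This is not the GACT API: the names `CompressedLinear`, `quantize`, and `num_bits` are assumptions for illustration, and simple uniform quantization stands in for GACT's actual compression and adaptive per-tensor ratio selection.

```python
# Minimal, illustrative sketch of activation-compressed training (ACT),
# NOT the GACT library API. Activations saved for backward are quantized
# in the forward pass and dequantized before gradients are computed.
import torch


def quantize(x: torch.Tensor, num_bits: int = 4):
    """Uniform per-tensor quantization; returns codes plus (scale, offset)."""
    lo, hi = x.min(), x.max()
    scale = (hi - lo).clamp(min=1e-8) / (2 ** num_bits - 1)
    codes = ((x - lo) / scale).round().to(torch.uint8)
    return codes, scale, lo


def dequantize(codes, scale, lo):
    return codes.float() * scale + lo


class CompressedLinear(torch.autograd.Function):
    """Linear op that stores a compressed copy of its input activation."""

    @staticmethod
    def forward(ctx, x, weight):
        codes, scale, lo = quantize(x)           # compress the activation
        ctx.save_for_backward(codes, scale, lo, weight)
        return x @ weight.t()

    @staticmethod
    def backward(ctx, grad_out):
        codes, scale, lo, weight = ctx.saved_tensors
        x_hat = dequantize(codes, scale, lo)     # approximate activation
        grad_x = grad_out @ weight               # gradient w.r.t. the input
        grad_w = grad_out.t() @ x_hat            # approximate weight gradient
        return grad_x, grad_w


# Usage: the weight gradient is computed from the dequantized activation,
# so only the low-bit codes need to be kept in memory between passes.
x = torch.randn(32, 128, requires_grad=True)
w = torch.randn(64, 128, requires_grad=True)
y = CompressedLinear.apply(x, w)
y.sum().backward()
```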