Paper Title
Understanding Scaling Laws for Recommendation Models
Paper Authors
Paper Abstract
Scale has been a major driving force in improving machine learning performance, and understanding scaling laws is essential for strategic planning toward sustainable model quality growth, long-term resource planning, and the development of efficient system infrastructure to support large-scale models. In this paper, we study empirical scaling laws for DLRM-style recommendation models, in particular Click-Through Rate (CTR) models. We observe that model quality scales as a power law plus a constant in model size, data size, and the amount of compute used for training. We characterize scaling efficiency along three different resource dimensions, namely data, parameters, and compute, by comparing the different scaling schemes along these axes. We show that parameter scaling is out of steam for the model architecture under study, and until a higher-performing model architecture emerges, data scaling is the path forward. The key research questions addressed by this study include: Does a recommendation model scale sustainably as predicted by the scaling laws? Or are we far off from the scaling-law predictions? What are the limits of scaling? What are the implications of the scaling laws for long-term hardware/system development?
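The abstract does not spell out the "power law plus constant" form. A minimal sketch, assuming the common parameterization quality(x) ≈ a·x^(−b) + c, where x is model size, data size, or training compute, is shown below with synthetic, purely illustrative numbers (not from the paper):

import numpy as np
from scipy.optimize import curve_fit

def power_law_plus_constant(x, a, b, c):
    # Loss decays as a power law in the scaled resource x (parameters, data,
    # or compute) and saturates at an irreducible constant c.
    return a * np.power(x, -b) + c

# Illustrative, synthetic measurements generated from the assumed form;
# these are hypothetical values, not results reported in the paper.
x = np.array([1e6, 1e7, 1e8, 1e9, 1e10])
loss = 0.5 * x ** -0.15 + 0.44

params, _ = curve_fit(power_law_plus_constant, x, loss, p0=[1.0, 0.1, 0.4], maxfev=10000)
a, b, c = params
print(f"fitted: loss(x) ~= {a:.3g} * x**(-{b:.3g}) + {c:.3g}")

Under this reading, fitting such a curve separately along the data, parameter, and compute axes is what allows the scaling schemes to be compared: the exponent b reflects scaling efficiency along that axis, and the constant c bounds the quality reachable by scaling that resource alone.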