Paper Title
Toward Training at ImageNet Scale with Differential Privacy
Paper Authors
Paper Abstract
Differential privacy (DP) is the de facto standard for training machine learning (ML) models, including neural networks, while ensuring the privacy of individual examples in the training set. Despite a rich literature on how to train ML models with differential privacy, it remains extremely challenging to train real-life, large neural networks with both reasonable accuracy and privacy. We set out to investigate how to do this, using ImageNet image classification as a poster example of an ML task that is very challenging to resolve accurately with DP right now. This paper shares initial lessons from our effort, in the hope that it will inspire and inform other researchers to explore DP training at scale. We show approaches that help make DP training faster, as well as model types and settings of the training process that tend to work better in the DP setting. Combined, the methods we discuss let us train a ResNet-18 with DP to $47.9\%$ accuracy and privacy parameters $\varepsilon = 10, \delta = 10^{-6}$. This is a significant improvement over "naive" DP training of ImageNet models, but a far cry from the $75\%$ accuracy that can be obtained by the same network without privacy. The model we use was pretrained on the Places365 data set as a starting point. We share our code at https://github.com/google-research/dp-imagenet, calling for others to build upon this new baseline to further improve DP at scale.
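The DP training the abstract refers to is typically DP-SGD: each example's gradient is clipped to a fixed norm, the clipped gradients are averaged, and calibrated Gaussian noise is added before the parameter update. The following is a minimal NumPy sketch of one such step, not the authors' actual implementation; the function name and parameters (`clip_norm`, `noise_multiplier`) are illustrative assumptions.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, lr, params, rng):
    """One illustrative DP-SGD step (sketch, not the paper's code).

    Clips each per-example gradient to `clip_norm`, averages the clipped
    gradients, adds Gaussian noise scaled by `noise_multiplier`, and takes
    a gradient-descent step on `params`.
    """
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale the gradient down only if its L2 norm exceeds clip_norm.
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    # Noise std follows the standard DP-SGD recipe:
    # noise_multiplier * clip_norm, divided by the batch size.
    noise = rng.normal(
        0.0,
        noise_multiplier * clip_norm / len(per_example_grads),
        size=mean_grad.shape,
    )
    return params - lr * (mean_grad + noise)

# Example: with noise_multiplier = 0 the step reduces to clipped SGD.
rng = np.random.default_rng(0)
grads = [np.array([3.0, 4.0]), np.array([0.1, 0.2])]
new_params = dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=0.0,
                         lr=1.0, params=np.zeros(2), rng=rng)
```

The privacy cost of many such noisy steps is then accounted for to reach a target budget such as the $(ε= 10, δ= 10^{-6})$ reported above; the per-example clipping is also what makes naive DP training slow, motivating the speed-ups the paper discusses.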