Paper Title

TrustGAN: Training safe and trustworthy deep learning models through generative adversarial networks

Paper Author

Bourboux, Hélion du Mas des

Abstract


Deep learning models have been developed for a variety of tasks and are deployed every day to work in real conditions. Some of these tasks are critical and the models need to be trusted and safe, e.g. military communications or cancer diagnosis. These models are given real data, simulated data, or a combination of both, and are trained to be highly predictive on them. However, gathering enough real data, or simulating data that are representative of all real conditions, is costly, sometimes impossible due to confidentiality, and most of the time simply impossible. Indeed, real conditions are constantly changing and sometimes intractable. A solution is to deploy machine learning models that give predictions only when they are confident enough, and otherwise raise a flag or abstain. One issue is that standard models easily fail at detecting out-of-distribution samples, where their predictions are unreliable. We present here TrustGAN, a generative adversarial network pipeline targeting trustworthiness. It is a deep learning pipeline which improves a target model's estimation of its confidence without impacting its predictive power. The pipeline can accept any given deep learning model which outputs a prediction and a confidence score for this prediction. Moreover, the pipeline does not need to modify this target model. It can thus be easily deployed in an MLOps (Machine Learning Operations) setting. The pipeline is applied here to a target classification model trained on MNIST data to recognise numbers based on images. We compare such a model when trained in the standard way and with TrustGAN. We show that on out-of-distribution samples, here FashionMNIST and CIFAR10, the estimated confidence is largely reduced. We reach similar conclusions for a classification model trained on 1D radio signals from AugMod and tested on RML2016.04C. We also publicly release the code.
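The abstract describes a target model that outputs both a prediction and a confidence score, and a deployment policy that abstains when the confidence is too low. A minimal sketch of that interface, using plain NumPy and the maximum softmax probability as the confidence score (one common choice; the function names and the threshold value are illustrative assumptions, not from the paper):

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predict_with_confidence(logits):
    """Return (predicted class, confidence) from raw model logits.

    Confidence here is the maximum softmax probability. TrustGAN's goal
    is that a score like this drops on out-of-distribution inputs.
    """
    probs = softmax(logits)
    return int(probs.argmax()), float(probs.max())

# One sharply peaked logit vector (in-distribution-like) versus a nearly
# flat one (out-of-distribution-like):
pred, conf = predict_with_confidence(np.array([8.0, 0.5, 0.2]))
flat_pred, flat_conf = predict_with_confidence(np.array([0.30, 0.20, 0.25]))

# A deployed model would raise a flag or abstain below some threshold.
THRESHOLD = 0.5
decision = pred if conf >= THRESHOLD else "abstain"
flat_decision = flat_pred if flat_conf >= THRESHOLD else "abstain"
```

The point of the pipeline is precisely that, for a standard model, `flat_conf` is often *not* low on out-of-distribution inputs; TrustGAN trains the target model adversarially so that such inputs receive low confidence without changing the model's interface.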
