Adversarial Examples Detection and Analysis with Layer-wise Autoencoders

Paper title

Adversarial Examples Detection and Analysis with Layer-wise Autoencoders

Authors

Bartosz Wójcik, Paweł Morawiecki, Marek Śmieja, Tomasz Krzyżek, Przemysław Spurek, Jacek Tabor

Abstract

We present a mechanism for detecting adversarial examples based on data representations taken from the hidden layers of the target network. For this purpose, we train individual autoencoders at intermediate layers of the target network. This allows us to describe the manifold of true data and, in consequence, decide whether a given example has the same characteristics as true data. It also gives us insight into the behavior of adversarial examples and their flow through the layers of a deep neural network. Experimental results show that our method outperforms the state of the art in supervised and unsupervised settings.
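
The idea of fitting one autoencoder per hidden layer and checking whether an input's activations lie on the manifold of true data can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not state the scoring rule, so this sketch thresholds the reconstruction error per layer; a PCA-based linear autoencoder stands in for a trained neural autoencoder; and the names `LinearAutoencoder`, `fit_layer_detectors`, and `is_suspicious` are hypothetical, not taken from the paper.

```python
import numpy as np


class LinearAutoencoder:
    """PCA-based linear autoencoder (assumed stand-in for a neural one)."""

    def __init__(self, latent_dim):
        self.latent_dim = latent_dim

    def fit(self, acts):
        # acts: (n_samples, n_features) activations of one hidden layer
        self.mean = acts.mean(axis=0)
        _, _, vt = np.linalg.svd(acts - self.mean, full_matrices=False)
        self.components = vt[: self.latent_dim]  # rows span the latent subspace
        return self

    def reconstruction_error(self, acts):
        centered = acts - self.mean
        # Encode (project onto the subspace), then decode (map back).
        recon = centered @ self.components.T @ self.components
        return np.linalg.norm(centered - recon, axis=1)


def fit_layer_detectors(layer_acts, latent_dim, quantile=0.99):
    """Fit one autoencoder per layer on clean activations.

    The threshold is a quantile of the clean reconstruction errors
    (an assumed decision rule, not the paper's).
    """
    detectors = []
    for acts in layer_acts:
        ae = LinearAutoencoder(latent_dim).fit(acts)
        threshold = np.quantile(ae.reconstruction_error(acts), quantile)
        detectors.append((ae, threshold))
    return detectors


def is_suspicious(detectors, sample_acts):
    """Flag a sample if any layer reconstructs it worse than its threshold.

    sample_acts: list of 1-D activation vectors, one per layer.
    """
    return any(
        ae.reconstruction_error(a[None, :])[0] > threshold
        for (ae, threshold), a in zip(detectors, sample_acts)
    )
```

In practice `layer_acts` would hold activations collected from the target network's intermediate layers on clean training data, and `sample_acts` the activations of the input under test. Comparing the per-layer errors, rather than only the final decision, is one way to look at how an adversarial example drifts from the data manifold as it moves through the network.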
