Title


RAID: Randomized Adversarial-Input Detection for Neural Networks

Authors

Hasan Ferit Eniser, Maria Christakis, Valentin Wüstholz

Abstract


In recent years, neural networks have become the default choice for image classification and many other learning tasks, even though they are vulnerable to so-called adversarial attacks. To increase their robustness against these attacks, there have emerged numerous detection mechanisms that aim to automatically determine if an input is adversarial. However, state-of-the-art detection mechanisms either rely on being tuned for each type of attack, or they do not generalize across different attack types. To alleviate these issues, we propose a novel technique for adversarial-image detection, RAID, that trains a secondary classifier to identify differences in neuron activation values between benign and adversarial inputs. Our technique is both more reliable and more effective than the state of the art when evaluated against six popular attacks. Moreover, a straightforward extension of RAID increases its robustness against detection-aware adversaries without affecting its effectiveness.
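To make the core idea concrete, here is a minimal, hypothetical sketch: record neuron activation values for benign and adversarial inputs, then fit a simple secondary detector on those activations. The synthetic "activations" and the nearest-centroid rule below are illustrative stand-ins, not the paper's actual network or classifier.

```python
# Hypothetical sketch of a RAID-style secondary detector. The random
# vectors stand in for hidden-layer activations captured from a protected
# network; adversarial inputs are assumed to shift their distribution.
import numpy as np

rng = np.random.default_rng(0)

benign_acts = rng.normal(0.0, 1.0, size=(500, 64))  # stand-in benign activations
adv_acts = rng.normal(0.8, 1.0, size=(500, 64))     # stand-in adversarial activations

# "Train" a trivial secondary classifier: one centroid per class.
benign_centroid = benign_acts.mean(axis=0)
adv_centroid = adv_acts.mean(axis=0)

def is_adversarial(activation):
    """Flag an input whose activations lie closer to the adversarial centroid."""
    return bool(np.linalg.norm(activation - adv_centroid)
                < np.linalg.norm(activation - benign_centroid))

# Held-out samples to check that the detector separates the two distributions.
test_benign = rng.normal(0.0, 1.0, size=(200, 64))
test_adv = rng.normal(0.8, 1.0, size=(200, 64))
acc = (sum(not is_adversarial(a) for a in test_benign)
       + sum(is_adversarial(a) for a in test_adv)) / 400
```

In the paper's setting, the detector is a trained classifier over real activation traces rather than a centroid rule, but the pipeline shape is the same: collect activations for both input classes, then learn a benign-vs-adversarial boundary over them.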
