Paper Title


Stronger and Faster Wasserstein Adversarial Attacks

Paper Authors

Kaiwen Wu, Allen Houze Wang, Yaoliang Yu

Paper Abstract


Deep models, while being extremely flexible and accurate, are surprisingly vulnerable to "small, imperceptible" perturbations known as adversarial attacks. While the majority of existing attacks focus on measuring perturbations under the $\ell_p$ metric, Wasserstein distance, which takes geometry in pixel space into account, has long been known to be a suitable metric for measuring image quality and has recently risen as a compelling alternative to the $\ell_p$ metric in adversarial attacks. However, constructing an effective attack under the Wasserstein metric is computationally much more challenging and calls for better optimization algorithms. We address this gap in two ways: (a) we develop an exact yet efficient projection operator to enable a stronger projected gradient attack; (b) we show that the Frank-Wolfe method equipped with a suitable linear minimization oracle works extremely fast under Wasserstein constraints. Our algorithms not only converge faster but also generate much stronger attacks. For instance, we decrease the accuracy of a residual network on CIFAR-10 to $3.4\%$ within a Wasserstein perturbation ball of radius $0.005$, in contrast to $65.6\%$ using the previous Wasserstein attack based on an \emph{approximate} projection operator. Furthermore, employing our stronger attacks in adversarial training significantly improves the robustness of adversarially trained models.
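The abstract's point (b) rests on the classical Frank-Wolfe template: instead of projecting onto the constraint set, each step calls a linear minimization oracle (LMO) over it. A minimal sketch of that template is below; the paper's actual contribution, an efficient LMO for the Wasserstein ball, is not reproduced here, so a simple l1-ball LMO stands in as a hypothetical placeholder. The names `frank_wolfe` and `l1_lmo` are illustrative, not from the paper.

```python
import numpy as np

def frank_wolfe(grad_fn, lmo, x0, steps=200):
    """Generic Frank-Wolfe loop: repeatedly move toward the feasible
    point returned by the LMO for the current gradient."""
    x = x0.astype(float)
    for t in range(steps):
        g = grad_fn(x)
        s = lmo(g)                 # argmin over the ball of <g, s>
        gamma = 2.0 / (t + 2.0)    # standard diminishing step size
        x = (1 - gamma) * x + gamma * s
    return x

def l1_lmo(g, r=1.0):
    """Stand-in LMO for an l1 ball of radius r (the paper instead
    designs an oracle for the Wasserstein ball): put all mass on the
    coordinate with the largest |gradient|, with opposite sign."""
    s = np.zeros_like(g)
    i = np.argmax(np.abs(g))
    s[i] = -r * np.sign(g[i])
    return s

# Toy run: minimize f(x) = 0.5 * ||x - c||^2 over the unit l1 ball.
c = np.array([0.9, -0.2])
x_star = frank_wolfe(lambda x: x - c, l1_lmo, np.zeros(2))
```

Because every iterate is a convex combination of feasible points, `x_star` stays inside the ball without any projection step, which is what makes the method attractive when the exact projection (as in the Wasserstein case) is expensive.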
