Paper Title
Boundary Defense Against Black-box Adversarial Attacks
Paper Authors
Paper Abstract
Black-box adversarial attacks generate adversarial samples via iterative optimization using repeated queries. Defending deep neural networks against such attacks has been challenging. In this paper, we propose an efficient Boundary Defense (BD) method which mitigates black-box attacks by exploiting the fact that adversarial optimization often requires samples on the classification boundary. Our method detects boundary samples as those with low classification confidence and adds white Gaussian noise to their logits. The method's impact on the deep network's classification accuracy is analyzed theoretically. Extensive experiments are conducted, and the results show that the BD method can reliably defend against both soft-label and hard-label black-box attacks. It outperforms a list of existing defense methods. For ImageNet models, by adding zero-mean white Gaussian noise with standard deviation 0.1 to the logits when the classification confidence is less than 0.3, the defense reduces the attack success rate to almost 0 while limiting the classification accuracy degradation to around 1%.
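To make the mechanism described in the abstract concrete, the following is a minimal sketch of how the BD perturbation could be applied to a batch of logits, assuming the top-1 softmax probability is used as the classification confidence. The function name `boundary_defense_logits` and its parameter names are illustrative assumptions, not identifiers from the paper; the threshold 0.3 and noise standard deviation 0.1 are the values quoted in the abstract for ImageNet models.

```python
import numpy as np

def boundary_defense_logits(logits, conf_threshold=0.3, noise_std=0.1, rng=None):
    """Sketch of the Boundary Defense (BD) perturbation on a batch of logits.

    Samples whose top-1 softmax confidence falls below `conf_threshold` are
    treated as boundary samples, and zero-mean white Gaussian noise with
    standard deviation `noise_std` is added to their logits before the
    logits are returned to the querier.
    """
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(logits, dtype=np.float64)   # shape: (batch, num_classes)

    # Softmax to obtain the classification confidence (top-1 probability).
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)
    confidence = probs.max(axis=-1)

    # Boundary samples: low-confidence queries near the decision boundary.
    is_boundary = confidence < conf_threshold

    # Add white Gaussian noise only to the logits of boundary samples.
    noisy = logits.copy()
    noise = rng.normal(0.0, noise_std, size=logits.shape)
    noisy[is_boundary] += noise[is_boundary]
    return noisy
```

Under these assumptions, clean inputs (which are typically classified with high confidence) pass through unmodified, while the low-confidence boundary queries that iterative black-box attacks rely on receive randomized logits, which is the intuition behind the small accuracy cost reported in the abstract.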