Paper Title


Assessing the Reliability of Visual Explanations of Deep Models with Adversarial Perturbations

Paper Authors

Dan Valle, Tiago Pimentel, Adriano Veloso

Paper Abstract


The interest in complex deep neural networks for computer vision applications is increasing. This leads to the need for improving the interpretable capabilities of these models. Recent explanation methods present visualizations of the relevance of pixels from input images, thus enabling the direct interpretation of properties of the input that lead to a specific output. These methods produce maps of pixel importance, which are commonly evaluated by visual inspection. This means that the effectiveness of an explanation method is assessed based on human expectation instead of actual feature importance. Thus, in this work we propose an objective measure to evaluate the reliability of explanations of deep models. Specifically, our approach is based on changes in the network's outcome resulting from the perturbation of input images in an adversarial way. We present a comparison between widely-known explanation methods using our proposed approach. Finally, we also propose a straightforward application of our approach to clean relevance maps, creating more interpretable maps without any loss in essential explanation (as per our proposed measure).
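The abstract's core idea, scoring a relevance map by how much the model's output changes when the pixels the map deems important are perturbed, can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's exact method: `perturbation_score` is a hypothetical helper, simple occlusion stands in for the paper's adversarial perturbation, and the toy `model` is an assumption for demonstration only.

```python
import numpy as np

def perturbation_score(model, image, relevance, k, perturb_value=0.0):
    """Score a relevance map by occluding its top-k pixels and
    measuring the drop in the model's output.

    Hypothetical helper: the paper perturbs inputs adversarially,
    while this sketch uses plain occlusion as a stand-in.
    """
    baseline = model(image)
    # Indices of the k most relevant pixels, per the explanation map.
    top = np.argsort(relevance.flatten())[::-1][:k]
    perturbed = image.flatten().copy()
    perturbed[top] = perturb_value
    # A larger drop means the map pointed at pixels the model relies on.
    return baseline - model(perturbed.reshape(image.shape))

# Toy setup: a "model" that only reads the top-left 4x4 patch.
rng = np.random.default_rng(0)
image = rng.random((8, 8))
model = lambda x: float(x[:4, :4].sum())

# A faithful map marks the patch the model uses; a poor one does not.
faithful = np.zeros((8, 8)); faithful[:4, :4] = 1.0
unfaithful = np.zeros((8, 8)); unfaithful[4:, 4:] = 1.0

good_score = perturbation_score(model, image, faithful, k=16)
bad_score = perturbation_score(model, image, unfaithful, k=16)
```

Under this measure, occluding the pixels the faithful map highlights wipes out the model's evidence (large score), while occluding the unfaithful map's pixels leaves the output unchanged (score of zero), matching the paper's premise that explanation quality should be judged by actual feature importance rather than visual inspection.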
