Paper Title
Learning Global Transparent Models Consistent with Local Contrastive Explanations
Paper Authors
Paper Abstract
There is a rich and growing literature on producing local contrastive/counterfactual explanations for black-box models (e.g., neural networks). In these methods, for a given input, the explanation takes the form of a contrast point that differs from the original input in very few features and lies in a different class. Other works try to build globally interpretable models, such as decision trees and rule lists, from the data using either the actual labels or the black-box model's predictions. Although these interpretable global models can be useful, they may not be consistent with the local explanations of a specific black box of choice. In this work, we explore the question: Can we produce a transparent global model that is simultaneously accurate and consistent with the local (contrastive) explanations of the black-box model? We introduce a natural local consistency metric that quantifies whether the local explanations and predictions of the black-box model are also consistent with the proxy global transparent model. Based on a key insight, we propose a novel method in which we create custom Boolean features from sparse local contrastive explanations of the black-box model and then train a globally transparent model on just these features. We show empirically that such models have higher local consistency than other known strategies, while remaining close in performance to models trained with access to the original data.
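To make the data flow described in the abstract concrete, below is a minimal, self-contained sketch; it is not the authors' implementation. It (1) queries a black box for sparse single-feature contrastive explanations, (2) turns each explanation into a Boolean threshold feature, (3) trains a transparent model (a shallow decision tree here) on just those features using the black box's predictions as labels, and (4) checks how often the surrogate agrees with the black box on an input and on its contrast point, in the spirit of the local consistency metric. The toy explainer, the midpoint-threshold construction, and all names are illustrative assumptions, not the paper's actual procedure.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier  # stands in for the black box
from sklearn.tree import DecisionTreeClassifier   # the transparent global model

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)     # toy labels

black_box = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                          random_state=0).fit(X, y)

def contrastive_explanation(model, x, deltas=(0.25, 0.5, 1.0, 2.0)):
    """Toy stand-in for a sparse contrastive explainer: look for a single-feature
    change that flips the black-box prediction; return (feature index, new value)
    or None if no flip is found."""
    base = model.predict(x.reshape(1, -1))[0]
    for d in deltas:
        for j in range(x.shape[0]):
            for sign in (1.0, -1.0):
                cp = x.copy()
                cp[j] = x[j] + sign * d
                if model.predict(cp.reshape(1, -1))[0] != base:
                    return j, cp[j]
    return None

# One Boolean feature per explanation, of the form [x_j >= threshold], with the
# threshold taken halfway between the input and its contrast point (an assumption).
thresholds = []
for i in range(200):                               # subsample to keep this quick
    expl = contrastive_explanation(black_box, X[i])
    if expl is not None:
        j, v = expl
        thresholds.append((j, 0.5 * (X[i, j] + v)))

def booleanize(X, thresholds):
    """Map raw inputs to the custom Boolean features."""
    return np.column_stack([(X[:, j] >= t).astype(int) for j, t in thresholds])

# Fit the transparent model on the Boolean features only, using the black box's
# predictions (not the true labels) as training targets.
B = booleanize(X, thresholds)
surrogate = DecisionTreeClassifier(max_depth=4, random_state=0)
surrogate.fit(B, black_box.predict(X))

# Rough empirical check: the surrogate should match the black box both on the
# input and on its contrast point, so the local explanation stays valid.
agree = []
for i in range(100):
    expl = contrastive_explanation(black_box, X[i])
    if expl is None:
        continue
    j, v = expl
    cp = X[i].copy()
    cp[j] = v
    pair = np.vstack([X[i], cp])
    agree.append(np.array_equal(surrogate.predict(booleanize(pair, thresholds)),
                                black_box.predict(pair)))
print("toy local consistency:", float(np.mean(agree)))
```

In the paper, the local explanations would come from a dedicated contrastive/counterfactual explainer and the transparent model could be a decision tree or rule list; the sketch above only conveys the overall pipeline of explanation-derived Boolean features feeding a surrogate trained on black-box predictions.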