Paper Title

Explainable AI for Pre-Trained Code Models: What Do They Learn? When They Do Not Work?

Paper Authors

Ahmad Haji Mohammadkhani, Chakkrit Tantithamthavorn, Hadi Hemmati

Paper Abstract

In recent years, there has been wide interest in designing deep neural network-based models that automate downstream software engineering tasks on source code, such as code document generation, code search, and program repair. Although the main objective of these studies is to improve the effectiveness of the downstream task, many studies merely adopt the next best neural network model, without a proper in-depth analysis of why a particular solution works or does not on particular tasks or scenarios. In this paper, using the attention mechanism as an example eXplainable AI (XAI) method, we study two recent large language models (LLMs) for code (CodeBERT and GraphCodeBERT) on a set of software engineering downstream tasks: code document generation (CDG), code refinement (CR), and code translation (CT). Through quantitative and qualitative studies, we identify what CodeBERT and GraphCodeBERT learn (i.e., which source code token types they put the highest attention on) for these tasks. We also show some common patterns when the model does not work as expected (i.e., performs poorly even on easy problems) and suggest recommendations that may alleviate the observed challenges.
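To make the attention-based analysis concrete, below is a minimal sketch (my illustration, not the authors' pipeline) of how one might inspect which source code tokens CodeBERT attends to. It assumes the public microsoft/codebert-base checkpoint from HuggingFace transformers; the toy input snippet and the aggregation scheme (averaging attention over layers and heads, then summing the attention each token receives) are my own choices and may differ from the paper's method.

# Minimal sketch of attention-based XAI for CodeBERT (assumption: the
# HuggingFace checkpoint "microsoft/codebert-base"; not the paper's code).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base",
                                  output_attentions=True)
model.eval()

code = "def add(a, b): return a + b"  # toy input, not from the paper
inputs = tokenizer(code, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple of (batch, heads, seq, seq) tensors, one per
# layer. Average over layers and heads, then sum over the "from" axis so each
# token gets the total attention it receives across the whole sequence.
att = torch.stack(outputs.attentions).mean(dim=(0, 2))  # (batch, seq, seq)
received = att.sum(dim=1).squeeze(0)                    # (seq,)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
for tok, score in sorted(zip(tokens, received.tolist()),
                         key=lambda p: -p[1])[:5]:
    print(f"{tok:>10s}  {score:.3f}")

Grouping the top-ranked tokens by their type (identifiers, keywords, operators, etc.) is then a natural way to ask, as the paper does, what token types the model puts the highest attention on.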
