Paper Title
Locating and Editing Factual Associations in GPT
Paper Authors
Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov
Paper Abstract
We analyze the storage and recall of factual associations in autoregressive transformer language models, finding evidence that these associations correspond to localized, directly-editable computations. We first develop a causal intervention for identifying neuron activations that are decisive in a model's factual predictions. This reveals a distinct set of steps in middle-layer feed-forward modules that mediate factual predictions while processing subject tokens. To test our hypothesis that these computations correspond to factual association recall, we modify feed-forward weights to update specific factual associations using Rank-One Model Editing (ROME). We find that ROME is effective on a standard zero-shot relation extraction (zsRE) model-editing task, comparable to existing methods. To perform a more sensitive evaluation, we also evaluate ROME on a new dataset of counterfactual assertions, on which it simultaneously maintains both specificity and generalization, whereas other methods sacrifice one or another. Our results confirm an important role for mid-layer feed-forward modules in storing factual associations and suggest that direct manipulation of computational mechanisms may be a feasible approach for model editing. The code, dataset, visualizations, and an interactive demo notebook are available at https://rome.baulab.info/.
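To make the rank-one weight update concrete, below is a minimal numerical sketch of editing a feed-forward projection matrix so that a chosen key vector maps exactly to a new value vector, while a covariance term keeps the change small for other inputs. This illustrates the general closed-form rank-one update the abstract alludes to, not the paper's actual pipeline: the names rank_one_edit, k_star, v_star, and the covariance estimate C are illustrative assumptions, and ROME additionally derives the key from the subject tokens and solves for the value vector by optimization.

```python
import numpy as np

def rank_one_edit(W, k_star, v_star, C):
    """Rank-one update forcing the edited matrix to map k_star to v_star.

    W:      (d_out, d_in) mid-layer feed-forward projection
    k_star: (d_in,)  key vector standing in for the subject representation
    v_star: (d_out,) value vector encoding the new association
    C:      (d_in, d_in) uncentered covariance of typical keys, E[k k^T],
            which keeps the edit small for inputs other than k_star
    """
    u = np.linalg.solve(C, k_star)      # C^{-1} k*
    residual = v_star - W @ k_star      # gap between current and desired output
    return W + np.outer(residual, u) / (u @ k_star)

# Toy check: after the edit, k_star maps exactly to v_star.
rng = np.random.default_rng(0)
d_in, d_out = 64, 48
W = rng.normal(size=(d_out, d_in))
K = rng.normal(size=(1000, d_in))       # sampled keys used to estimate C
C = K.T @ K / len(K)
k_star, v_star = rng.normal(size=d_in), rng.normal(size=d_out)
W_edited = rank_one_edit(W, k_star, v_star, C)
assert np.allclose(W_edited @ k_star, v_star)
```

Because the update is an outer product scaled by the C-weighted inverse of the key, it inserts the new key-value association exactly while perturbing the matrix's behavior on typical keys as little as possible, which is what lets this style of edit maintain specificity alongside generalization.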