Paper Title
"You are grounded!": Latent Name Artifacts in Pre-trained Language Models
Paper Authors
Paper Abstract
Pre-trained language models (LMs) may perpetuate biases originating in their training corpus to downstream models. We focus on artifacts associated with the representation of given names (e.g., Donald), which, depending on the corpus, may be associated with specific entities, as indicated by next-token prediction (e.g., Trump). While helpful in some contexts, grounding also occurs in under-specified or inappropriate contexts. For example, endings generated for "Donald is a" differ substantially from those generated for other names, and often carry more negative sentiment than average. We demonstrate the potential effect on downstream tasks with reading comprehension probes in which name perturbation changes the model's answers. As a silver lining, our experiments suggest that additional pre-training on different corpora may mitigate this bias.
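To make the probing setup concrete, below is a minimal sketch of a next-token-prediction probe for name artifacts, assuming a GPT-2 causal LM loaded through the HuggingFace transformers library; the model choice, prompts, and names are illustrative assumptions rather than the authors' exact experimental setup.

```python
# A minimal sketch of a next-token-prediction probe for name artifacts, using a
# GPT-2 model via the HuggingFace `transformers` library. The model, prompts,
# and names here are illustrative assumptions, not the paper's exact setup.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def top_next_tokens(prefix: str, k: int = 5):
    """Return the k most probable next tokens for `prefix` under the LM."""
    input_ids = tokenizer.encode(prefix, return_tensors="pt")
    with torch.no_grad():
        logits = model(input_ids).logits[0, -1]   # scores for the token after `prefix`
    probs = torch.softmax(logits, dim=-1)
    top = torch.topk(probs, k)
    return [(tokenizer.decode([int(idx)]), round(float(p), 4))
            for idx, p in zip(top.indices, top.values)]

# Probe whether a bare given name grounds to a specific entity (e.g., a surname),
# and how continuations of an under-specified context differ across names.
for name in ["Donald", "Michael", "Sarah"]:
    print(name, "->", top_next_tokens(name))
    print(f"{name} is a ->", top_next_tokens(f"{name} is a"))
```

If a single surname dominates the distribution after a bare given name, or the continuations of "X is a" shift markedly as the name changes, that is the kind of latent grounding the abstract describes.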