Paper Title
Adversarial Text Normalization
Paper Authors
Paper Abstract
Text-based adversarial attacks are becoming more commonplace and more accessible to general internet users. As these attacks proliferate, the need to close the gap in model robustness becomes urgent. While retraining on adversarial data may improve performance, there remains an additional class of character-level attacks on which these models falter. Moreover, retraining a model is time- and resource-intensive, creating a need for a lightweight, reusable defense. In this work, we propose the Adversarial Text Normalizer, a novel method that restores baseline performance on attacked content with low computational overhead. We evaluate the efficacy of the normalizer in two problem areas prone to adversarial attacks: hate speech and natural language inference. We find that text normalization provides a task-agnostic defense against character-level attacks and can be deployed as a supplement to adversarial retraining, which is better suited to semantic alterations.
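To make the idea of a character-level normalization defense concrete, the following is a minimal sketch of a preprocessing step that could run before any downstream classifier. It is not the Adversarial Text Normalizer described in the abstract, whose design is not specified here. The function name `normalize`, the substitution table `LEET_MAP`, and the list of invisible characters are all assumptions chosen for illustration.

```python
import re
import unicodedata

# Hypothetical look-alike substitution table (illustrative, not from the paper).
LEET_MAP = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"}
)

# A few zero-width / invisible code points often used to split words (assumed list).
INVISIBLE = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")


def _normalize_token(match: re.Match) -> str:
    token = match.group(0)
    # Leave purely numeric or symbolic tokens (e.g. "2000") untouched.
    if not any(ch.isalpha() for ch in token):
        return token
    token = token.translate(LEET_MAP)
    # Collapse runs of three or more identical letters to a single letter.
    return re.sub(r"([a-z])\1{2,}", r"\1", token)


def normalize(text: str) -> str:
    # Fold compatibility forms such as fullwidth letters into plain ones.
    text = unicodedata.normalize("NFKC", text)
    # Remove invisible characters inserted inside words.
    text = INVISIBLE.sub("", text)
    # Decompose, then drop combining marks (accents used as visual noise).
    text = "".join(
        ch
        for ch in unicodedata.normalize("NFD", text)
        if unicodedata.category(ch) != "Mn"
    )
    text = text.lower()
    # Apply token-level fixes while preserving the original whitespace.
    return re.sub(r"\S+", _normalize_token, text)


if __name__ == "__main__":
    print(normalize("h\u200bat3 sp33ch"))
```

Because the step only rewrites the input string, it is independent of the task and model, which is the property the abstract highlights. The substitution table is deliberately crude: it would also rewrite legitimate tokens such as user mentions, so a practical version would need a more careful mapping.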