Paper Title

GUIR at SemEval-2020 Task 12: Domain-Tuned Contextualized Models for Offensive Language Detection

Paper Authors

Sajad Sotudeh, Tong Xiang, Hao-Ren Yao, Sean MacAvaney, Eugene Yang, Nazli Goharian, Ophir Frieder

Paper Abstract

Offensive language detection is an important and challenging task in natural language processing. We present our submissions to the OffensEval 2020 shared task, which includes three English sub-tasks: identifying the presence of offensive language (Sub-task A), identifying the presence of a target in offensive language (Sub-task B), and identifying the category of the target (Sub-task C). Our experiments explore using a domain-tuned contextualized language model (namely, BERT) for this task. We also experiment with different components and configurations (e.g., a multi-view SVM) stacked upon BERT models for specific sub-tasks. Our submissions achieve F1 scores of 91.7% in Sub-task A, 66.5% in Sub-task B, and 63.2% in Sub-task C. We perform an ablation study, which reveals that domain tuning considerably improves the classification performance. Furthermore, error analysis shows common misclassification errors made by our model and outlines directions for future research.
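
The core recipe the abstract describes, domain tuning a contextualized model before task-specific fine-tuning, can be sketched with Hugging Face transformers. This is a minimal illustration only: the file names (domain_tweets.txt, subtask_a_train.csv), base checkpoint, and hyperparameters are assumptions for demonstration, and the authors' exact data, configuration, and multi-view SVM stacking are not reproduced here.

```python
# Minimal sketch of the two-stage recipe from the abstract:
# (1) "domain tuning": continue BERT's masked-LM pretraining on in-domain text,
# (2) fine-tune the domain-tuned encoder as an offensive language classifier
#     (Sub-task A: offensive vs. not offensive).
# File names, model choice, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

# Stage 1: domain tuning via masked language modeling on unlabeled in-domain text.
unlabeled = load_dataset("text", data_files={"train": "domain_tweets.txt"})["train"]
unlabeled = unlabeled.map(tokenize, batched=True, remove_columns=["text"])
mlm_trainer = Trainer(
    model=AutoModelForMaskedLM.from_pretrained("bert-base-uncased"),
    args=TrainingArguments(output_dir="bert-domain-tuned", num_train_epochs=1),
    train_dataset=unlabeled,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
mlm_trainer.train()
mlm_trainer.save_model("bert-domain-tuned")
tokenizer.save_pretrained("bert-domain-tuned")

# Stage 2: fine-tune the domain-tuned checkpoint as a binary classifier.
labeled = load_dataset("csv", data_files={"train": "subtask_a_train.csv"})["train"]
labeled = labeled.map(tokenize, batched=True)  # expects "text" and "label" columns
clf_trainer = Trainer(
    model=AutoModelForSequenceClassification.from_pretrained(
        "bert-domain-tuned", num_labels=2
    ),
    args=TrainingArguments(output_dir="bert-offensive-clf", num_train_epochs=3),
    train_dataset=labeled,
    data_collator=DataCollatorWithPadding(tokenizer),
)
clf_trainer.train()
```

The motivation for stage 1 matches the abstract's ablation finding: continuing pretraining on text from the target domain, before any labeled data is used, considerably improves downstream classification performance.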
