PriMeSRL-Eval: A Practical Quality Metric for Semantic Role Labeling Systems Evaluation

Paper Title

PriMeSRL-Eval: A Practical Quality Metric for Semantic Role Labeling Systems Evaluation

Authors

Jindal, Ishan; Rademaker, Alexandre; Tran, Khoi-Nguyen; Zhu, Huaiyu; Kanayama, Hiroshi; Danilevsky, Marina; Li, Yunyao

Abstract

Semantic role labeling (SRL) identifies the predicate-argument structure in a sentence. This task is usually accomplished in four steps: predicate identification, predicate sense disambiguation, argument identification, and argument classification. Errors introduced at one step propagate to later steps. Unfortunately, the existing SRL evaluation scripts do not consider the full effect of this error propagation. They either evaluate arguments independently of predicate sense (CoNLL09) or do not evaluate predicate sense at all (CoNLL05), yielding an inaccurate measure of SRL model performance on the argument classification task. In this paper, we address key practical issues with existing evaluation scripts and propose a stricter SRL evaluation metric, PriMeSRL. We observe that by employing PriMeSRL, the quality evaluation of all SoTA SRL models drops significantly, and their relative rankings also change. We also show that PriMeSRL successfully penalizes actual failures in SoTA SRL models.
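
The error-propagation point in the abstract can be illustrated with a minimal sketch. This is not the PriMeSRL scoring procedure, whose exact rules the abstract does not give. It only contrasts a lenient argument F1 that ignores predicate sense with a stricter variant that credits an argument only when the sense of its predicate is also correct. The `Frame` structure, the `argument_f1` function, and the sense labels in the example are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """One predicate with its sense and labeled arguments (hypothetical structure)."""
    predicate_index: int
    sense: str
    arguments: frozenset  # set of (argument head index, role label) pairs


def argument_f1(gold, predicted, require_correct_sense):
    """Argument-level F1 over a list of gold and predicted frames.

    If require_correct_sense is True, arguments of a predicate whose sense
    was mispredicted earn no credit (an illustrative assumption, not the
    paper's exact rule).
    """
    gold_by_index = {frame.predicate_index: frame for frame in gold}
    n_predicted = sum(len(frame.arguments) for frame in predicted)
    n_gold = sum(len(frame.arguments) for frame in gold)

    correct = 0
    for frame in predicted:
        gold_frame = gold_by_index.get(frame.predicate_index)
        if gold_frame is None:
            continue  # predicate identification error: no credit
        if require_correct_sense and frame.sense != gold_frame.sense:
            continue  # sense error propagates to all arguments of this predicate
        correct += len(frame.arguments & gold_frame.arguments)

    precision = correct / n_predicted if n_predicted else 0.0
    recall = correct / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Two predicates; the system gets every argument right but one sense wrong.
gold = [
    Frame(1, "give.01", frozenset({(0, "A0"), (2, "A1")})),
    Frame(4, "run.02", frozenset({(3, "A0")})),
]
predicted = [
    Frame(1, "give.01", frozenset({(0, "A0"), (2, "A1")})),
    Frame(4, "run.01", frozenset({(3, "A0")})),
]

lenient = argument_f1(gold, predicted, require_correct_sense=False)
strict = argument_f1(gold, predicted, require_correct_sense=True)
print(f"lenient F1 = {lenient:.3f}, strict F1 = {strict:.3f}")
```

In this toy case the lenient score is perfect while the strict score falls to two thirds, because the argument attached to the mis-sensed predicate no longer counts. This is the kind of gap the abstract attributes to evaluating arguments independently of predicate sense.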
