Paper Title

Enriching Relation Extraction with OpenIE

Paper Authors

Alessandro Temperoni, Maria Biryukov, Martin Theobald

Abstract

Relation extraction (RE) is a sub-discipline of information extraction (IE) which focuses on the prediction of a relational predicate from a natural-language input unit (such as a sentence, a clause, or even a short paragraph consisting of multiple sentences and/or clauses). Together with named-entity recognition (NER) and disambiguation (NED), RE forms the basis for many advanced IE tasks such as knowledge-base (KB) population and verification. In this work, we explore how recent approaches for open information extraction (OpenIE) may help to improve the task of RE by encoding structured information about the sentences' principal units, such as subjects, objects, verbal phrases, and adverbials, into various forms of vectorized (and hence unstructured) representations of the sentences. Our main conjecture is that the decomposition of long and possibly convoluted sentences into multiple smaller clauses via OpenIE even helps to fine-tune context-sensitive language models such as BERT (and its plethora of variants) for RE. Our experiments over two annotated corpora, KnowledgeNet and FewRel, demonstrate the improved accuracy of our enriched models compared to existing RE approaches. Our best results reach F1 scores of 92% and 71% for KnowledgeNet and FewRel, respectively, demonstrating the effectiveness of our approach on competitive benchmarks.
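To make the central idea more concrete, the following is a minimal sketch of how a sentence might be enriched with OpenIE-style clauses before being passed to a BERT-style classifier. It is an illustration only, not the authors' implementation: the abstract does not specify the encoding, so the `Clause` structure, the `enrich` helper, the `[SEP]`-joined layout, and the hand-written extractions are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """One OpenIE-style extraction (hypothetical structure, for illustration)."""
    subject: str
    verb: str
    obj: str
    adverbial: str = ""

    def as_text(self) -> str:
        # Join the non-empty principal units into a short clause string.
        parts = [self.subject, self.verb, self.obj, self.adverbial]
        return " ".join(p for p in parts if p)


def enrich(sentence: str, clauses: list[Clause], sep: str = "[SEP]") -> str:
    """Append each extracted clause to the sentence, separated by `sep`.

    Assumption: a plain string concatenation stands in for whatever
    vectorized encoding an actual RE model would use.
    """
    segments = [sentence] + [c.as_text() for c in clauses]
    return f" {sep} ".join(segments)


# Hand-written extractions standing in for the output of an OpenIE system.
sentence = "Marie Curie, who was born in Warsaw, won the Nobel Prize in 1903."
clauses = [
    Clause("Marie Curie", "was born", "in Warsaw"),
    Clause("Marie Curie", "won", "the Nobel Prize", "in 1903"),
]
enriched = enrich(sentence, clauses)
print(enriched)
```

In this sketch, the longer sentence is kept intact and the two shorter clauses are appended to it, so a downstream model would see both the original context and its decomposition. A real pipeline would obtain the clauses from an OpenIE tool and would tokenize the result with the language model's own tokenizer.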
