Paper Title

Structure-Augmented Text Representation Learning for Efficient Knowledge Graph Completion

Paper Authors

Bo Wang, Tao Shen, Guodong Long, Tianyi Zhou, Yi Chang

Paper Abstract

Human-curated knowledge graphs provide critical supportive information to various natural language processing tasks, but these graphs are usually incomplete, motivating their automatic completion. Prevalent graph embedding approaches, e.g., TransE, learn structured knowledge by representing graph elements as dense embeddings and capturing their triple-level relationships with spatial distance. However, they hardly generalize to elements never seen in training and are intrinsically vulnerable to graph incompleteness. In contrast, textual encoding approaches, e.g., KG-BERT, resort to a graph triple's text and triple-level contextualized representations. They are generalizable enough and robust to incompleteness, especially when coupled with pre-trained encoders. But two major drawbacks limit their performance: (1) high overheads due to the costly scoring of all possible triples at inference, and (2) a lack of structured knowledge in the textual encoder. In this paper, we follow the textual encoding paradigm and aim to alleviate its drawbacks by augmenting it with graph embedding techniques -- a complementary hybrid of both paradigms. Specifically, we partition each triple into two asymmetric parts, as in translation-based graph embedding approaches, and encode both parts into contextualized representations with a Siamese-style textual encoder. Built upon these representations, our model employs a deterministic classifier and a spatial measurement for representation and structure learning, respectively. Moreover, we develop a self-adaptive ensemble scheme that further improves performance by incorporating triple scores from an existing graph embedding model. In experiments, we achieve state-of-the-art performance on three benchmarks and a zero-shot dataset for link prediction, while reducing inference costs by 1-2 orders of magnitude compared to a textual encoding method.
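To make the Siamese-style scoring concrete, here is a minimal PyTorch sketch of the two-branch design the abstract describes. It is an illustration under assumptions, not the authors' released code: the Hugging Face `transformers` BERT backbone, the class name `SiameseTripleScorer`, the interactive feature concatenation, and the hidden size are all illustrative choices.

```python
# Minimal sketch, assuming a Hugging Face BERT backbone; names are illustrative.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class SiameseTripleScorer(nn.Module):
    def __init__(self, model_name="bert-base-uncased", hidden=768):
        super().__init__()
        # One shared (Siamese) encoder for both asymmetric parts of a triple.
        self.encoder = AutoModel.from_pretrained(model_name)
        # Deterministic classifier branch over interactive pair features.
        self.classifier = nn.Linear(4 * hidden, 2)

    def encode(self, inputs):
        # Pool the [CLS] token as the contextualized representation of one part.
        return self.encoder(**inputs).last_hidden_state[:, 0]

    def forward(self, part_hr, part_t):
        u = self.encode(part_hr)  # text of head entity + relation
        v = self.encode(part_t)   # text of tail entity
        # Representation learning: classify the pair as a true/false triple.
        logits = self.classifier(torch.cat([u, v, u * v, u - v], dim=-1))
        # Structure learning: translation-style spatial distance between parts.
        distance = torch.norm(u - v, p=2, dim=-1)
        return logits, distance

# Hypothetical usage with an example triple (Albert Einstein, place of birth, Ulm):
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = SiameseTripleScorer()
part_hr = tok(["Albert Einstein [SEP] place of birth"], return_tensors="pt")
part_t = tok(["Ulm"], return_tensors="pt")
logits, dist = model(part_hr, part_t)
```

The two branches mirror the abstract: the classifier gives a deterministic plausibility score (representation learning), while the distance between the two part embeddings plays the role of a translation-based score (structure learning). Because each part is encoded independently, candidate tail embeddings can be pre-computed once and reused across queries, which is plausibly where the reported 1-2 orders-of-magnitude inference saving over a triple-wise encoder such as KG-BERT comes from.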
