FakeEdge: Alleviate Dataset Shift in Link Prediction

Paper Title

FakeEdge: Alleviate Dataset Shift in Link Prediction

Authors

Kaiwen Dong, Yijun Tian, Zhichun Guo, Yang Yang, Nitesh V. Chawla

Abstract

Link prediction is a crucial problem in graph-structured data. Following the recent success of graph neural networks (GNNs), a variety of GNN-based models have been proposed to tackle the link prediction task. Specifically, GNNs leverage the message-passing paradigm to obtain node representations, which rely on link connectivity. However, in a link prediction task, links in the training set are always present while those in the testing set are not yet formed, resulting in a discrepancy in the connectivity pattern and a bias in the learned representations. This leads to a dataset shift problem that degrades model performance. In this paper, we first identify the dataset shift problem in the link prediction task and provide theoretical analyses of how existing link prediction methods are vulnerable to it. We then propose FakeEdge, a model-agnostic technique, to address the problem by mitigating the graph topological gap between training and testing sets. Extensive experiments demonstrate the applicability and superiority of FakeEdge on multiple datasets across various domains.
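The shift described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the helper names (`build_graph`, `with_target_edge`, `pair_feature`) are hypothetical, and the degree-based feature is only a stand-in for a GNN readout. The assumption illustrated here is that forcing the target link to be uniformly present or uniformly absent before computing a pair's representation gives training and testing pairs the same local topology.

```python
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[int, Set[int]]


def build_graph(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Build an undirected adjacency-set graph from an edge list."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def with_target_edge(graph: Graph, u: int, v: int, present: bool) -> Graph:
    """Return a copy of the graph with the target link (u, v) forced
    present or absent. The input graph is left unchanged."""
    copy = {node: set(neighbors) for node, neighbors in graph.items()}
    copy.setdefault(u, set())
    copy.setdefault(v, set())
    if present:
        copy[u].add(v)
        copy[v].add(u)
    else:
        copy[u].discard(v)
        copy[v].discard(u)
    return copy


def pair_feature(graph: Graph, u: int, v: int) -> Tuple[int, int, int]:
    """Toy stand-in for a GNN readout (an assumption for illustration):
    endpoint degrees plus a flag for whether u and v are directly linked."""
    neighbors_u = graph.get(u, set())
    neighbors_v = graph.get(v, set())
    return (len(neighbors_u), len(neighbors_v), int(v in neighbors_u))


# Observed (training) graph. The pair (0, 1) is a training positive, so its
# link is already in the graph; (1, 3) is a test positive that is not yet formed.
observed = build_graph([(0, 1), (1, 2), (2, 3), (0, 2)])

# Without any correction, the two positives see different topology:
raw_train = pair_feature(observed, 0, 1)  # (2, 2, 1): target link visible
raw_test = pair_feature(observed, 1, 3)   # (2, 1, 0): target link missing

# Forcing the target link absent (or present) for every pair removes that gap.
minus_train = pair_feature(with_target_edge(observed, 0, 1, present=False), 0, 1)
minus_test = pair_feature(with_target_edge(observed, 1, 3, present=False), 1, 3)
```

In the uncorrected case, the last feature alone separates training positives from test positives, which is the kind of connectivity bias the abstract attributes to message passing. After the edit, that flag is identical for both pairs (`0` when the link is always removed, `1` when it is always added), so a model can no longer depend on it.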
