Paper Title


Enriching Wikidata with Linked Open Data

Authors

Bohui Zhang, Filip Ilievski, Pedro Szekely

Abstract


Large public knowledge graphs, like Wikidata, contain billions of statements about tens of millions of entities, thus inspiring various use cases to exploit such knowledge graphs. However, practice shows that much of the relevant information that fits users' needs is still missing in Wikidata, while current linked open data (LOD) tools are not suitable to enrich large graphs like Wikidata. In this paper, we investigate the potential of enriching Wikidata with structured data sources from the LOD cloud. We present a novel workflow that includes gap detection, source selection, schema alignment, and semantic validation. We evaluate our enrichment method with two complementary LOD sources: a noisy source with broad coverage, DBpedia, and a manually curated source with a narrow focus on the art domain, Getty. Our experiments show that our workflow can enrich Wikidata with millions of novel statements from external LOD sources with high quality. Property alignment and data quality are key challenges, whereas entity alignment and source selection are well-supported by existing Wikidata mechanisms. We make our code and data available to support future work.
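The workflow named in the abstract (gap detection, source selection, schema alignment, semantic validation) can be illustrated with a minimal toy pipeline. This is a sketch under assumed simplifications, not the authors' implementation: the property alignment table, the type-based validation rule, and the sample data are all hypothetical, and a real system would operate over RDF triples and millions of entities.

```python
# Toy sketch of the enrichment workflow: gap detection -> schema
# alignment -> semantic validation. All mappings and data are
# hypothetical illustrations, not the paper's actual method.

# Hypothetical alignment from an external source's schema to Wikidata
# property identifiers.
PROPERTY_ALIGNMENT = {
    "birthDate": "P569",   # DBpedia-style field -> Wikidata "date of birth"
    "nationality": "P27",  # -> "country of citizenship"
}

# Hypothetical per-property type constraints used as a stand-in for
# semantic validation.
EXPECTED_TYPE = {"P569": str, "P27": str}

def detect_gaps(wikidata_entity, external_entity):
    """Return candidate statements present in the source but missing in Wikidata."""
    gaps = {}
    for prop, value in external_entity.items():
        wd_prop = PROPERTY_ALIGNMENT.get(prop)  # schema alignment step
        if wd_prop and wd_prop not in wikidata_entity:
            gaps[wd_prop] = value
    return gaps

def validate(candidates):
    """Keep only candidates whose value matches the expected type."""
    return {p: v for p, v in candidates.items()
            if isinstance(v, EXPECTED_TYPE.get(p, object))}

def enrich(wikidata_entity, external_entity):
    """Run the toy pipeline and return the enriched entity."""
    novel = validate(detect_gaps(wikidata_entity, external_entity))
    return {**wikidata_entity, **novel}

entity = {"P569": "1853-03-30"}  # statement already in Wikidata
source = {"birthDate": "1853-03-30", "nationality": "Netherlands"}
print(enrich(entity, source))
# {'P569': '1853-03-30', 'P27': 'Netherlands'}
```

The sketch mirrors the abstract's finding: entity-level matching is assumed given (the two records describe the same entity), while property alignment and value validation are where the real difficulty lies.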
