Paper Title
Uncovering Hidden Semantics of Set Information in Knowledge Bases
Paper Authors
Paper Abstract
Knowledge bases (KBs) contain a wealth of structured information about entities and predicates. This paper focuses on set-valued predicates, i.e., relationships between an entity and a set of entities. In KBs, this information is often represented in two formats: (i) via counting predicates such as numberOfChildren and staffSize, which store aggregated integers, and (ii) via enumerating predicates such as parentOf and worksFor, which store individual set memberships. The two formats are typically complementary: unlike enumerating predicates, counting predicates do not name individual members, but they are more likely to be informative about the true set size. This coexistence could therefore enable interesting applications in question answering and KB curation. In this paper we aim to uncover this hidden knowledge. We proceed in two steps. (i) We identify set-valued predicates among the predicates of a given KB via statistical and embedding-based features. (ii) We link counting predicates and enumerating predicates by a combination of co-occurrence, correlation and textual relatedness metrics. We analyze the prevalence of count information in four prominent knowledge bases, and show that our approach achieves an F1 score of up to 0.55 in set predicate identification, versus 0.40 for random selection, and normalized discounted cumulative gain (NDCG) of up to 0.84 at position 1 and 0.75 at position 3 in relevant predicate alignments. Our predicate alignments are showcased in a demonstration system available at https://counqer.mpi-inf.mpg.de/spo.
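To make the linking step more concrete, the following is a minimal sketch of how co-occurrence and correlation between a counting predicate and an enumerating predicate might be computed. It is not the authors' implementation: the toy triples, the helper names (`stated_counts`, `enumerated_counts`, `cooccurrence`, `pearson`), and the use of Jaccard overlap as the co-occurrence measure are all assumptions made for illustration. The abstract does not specify the exact metrics.

```python
from math import sqrt

# Toy KB of (subject, predicate, object) triples. All data is invented.
TRIPLES = [
    ("alice", "numberOfChildren", 2),
    ("alice", "parentOf", "carol"),
    ("alice", "parentOf", "dave"),
    ("bob", "numberOfChildren", 3),
    ("bob", "parentOf", "erin"),
    ("bob", "parentOf", "frank"),
    ("bob", "parentOf", "grace"),
    ("heidi", "numberOfChildren", 1),
    ("ivan", "parentOf", "judy"),
]


def stated_counts(triples, counting_pred):
    """Map each subject to the integer stored by the counting predicate."""
    return {s: o for s, p, o in triples if p == counting_pred}


def enumerated_counts(triples, enum_pred):
    """Map each subject to the number of objects listed by the enumerating predicate."""
    counts = {}
    for s, p, _ in triples:
        if p == enum_pred:
            counts[s] = counts.get(s, 0) + 1
    return counts


def cooccurrence(stated, enumerated):
    """Jaccard overlap of the subject sets (an assumed co-occurrence measure)."""
    both = stated.keys() & enumerated.keys()
    either = stated.keys() | enumerated.keys()
    return len(both) / len(either) if either else 0.0


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when it is undefined."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


stated = stated_counts(TRIPLES, "numberOfChildren")
enumerated = enumerated_counts(TRIPLES, "parentOf")

# Correlate stated and enumerated set sizes on subjects that have both.
shared = sorted(stated.keys() & enumerated.keys())
cooc_score = cooccurrence(stated, enumerated)
corr_score = pearson([stated[s] for s in shared], [enumerated[s] for s in shared])

print(f"co-occurrence={cooc_score:.2f}, correlation={corr_score:.2f}")
```

A high score on both measures would suggest that the two predicates describe the same underlying set, which is the kind of alignment the abstract describes. A real system would combine such signals with textual relatedness and rank candidate pairs.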