Paper title
Towards context in large scale biomedical knowledge graphs
Authors
Abstract
Contextual information is widely considered in NLP and knowledge discovery in the life sciences because it strongly influences the exact meaning of natural language. The scientific challenge is not only to extract such context data, but also to store it for further querying and discovery. Here, we propose a multi-step knowledge graph approach that uses labeled property graphs based on polyglot persistence systems to exploit context data for context mining, graph queries, knowledge discovery and extraction. We introduce the graph-theoretic foundation for a general context concept within semantic networks and show a proof of concept based on biomedical literature and text mining. Our test system contains a knowledge graph derived from the entirety of PubMed and SCAIView data, enriched with text mining data and domain-specific language data expressed in BEL (Biological Expression Language). In this approach, context is a more general concept than annotations. The resulting dense graph has more than 71M nodes and 850M relationships. We discuss the impact of this novel approach with 27 real-world use cases represented as graph queries.
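The abstract treats context as something more general than per-edge annotations, attached to statements in a labeled property graph. The Python sketch below illustrates that idea under stated assumptions: it is a small in-memory toy, not the authors' polyglot persistence system, and the class names, the `context` field, and the `query` and `contexts_between` helpers are hypothetical. Context is modeled here as a set of ordinary nodes (for example a species or a source document) referenced by each relationship, so a query can be restricted to statements that hold in a given context.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    labels: set[str]  # labeled property graph: a node may carry several labels
    properties: dict[str, object] = field(default_factory=dict)


@dataclass
class Relationship:
    source: str
    target: str
    rel_type: str  # e.g. a BEL-style relation such as "increases"
    properties: dict[str, object] = field(default_factory=dict)
    # Hypothetical design: context is a set of ids of ordinary nodes
    # (species, document, cell type, ...) rather than a plain annotation string.
    context: set[str] = field(default_factory=set)


class PropertyGraph:
    """Minimal in-memory labeled property graph with context-aware edges (illustrative only)."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.relationships: list[Relationship] = []

    def add_node(self, node_id: str, *labels: str, **props: object) -> None:
        self.nodes[node_id] = Node(node_id, set(labels), dict(props))

    def add_relationship(self, source: str, target: str, rel_type: str,
                         context: tuple[str, ...] | list[str] = (),
                         **props: object) -> None:
        # Endpoints and context entries must all be existing nodes.
        for nid in (source, target, *context):
            if nid not in self.nodes:
                raise KeyError(f"unknown node: {nid}")
        self.relationships.append(
            Relationship(source, target, rel_type, dict(props), set(context)))

    def query(self, rel_type: str | None = None,
              in_context: tuple[str, ...] | list[str] = ()) -> list[Relationship]:
        # Keep relationships of the requested type whose context contains
        # every requested context node.
        required = set(in_context)
        return [r for r in self.relationships
                if (rel_type is None or r.rel_type == rel_type)
                and required <= r.context]

    def contexts_between(self, source: str, target: str) -> set[str]:
        # Union of all contexts in which source and target are directly related.
        result: set[str] = set()
        for r in self.relationships:
            if r.source == source and r.target == target:
                result |= r.context
        return result


if __name__ == "__main__":
    g = PropertyGraph()
    g.add_node("GeneA", "Gene")
    g.add_node("GeneB", "Gene")
    g.add_node("human", "Context", "Species")
    g.add_node("mouse", "Context", "Species")
    g.add_node("doc1", "Context", "Document")
    g.add_relationship("GeneA", "GeneB", "increases", context=["human", "doc1"])
    g.add_relationship("GeneA", "GeneB", "decreases", context=["mouse"])
    for rel in g.query(in_context=["human"]):
        print(rel.source, rel.rel_type, rel.target, sorted(rel.context))
```

Representing context as nodes rather than as string annotations lets the same context (for instance a species) be shared, queried, and mined across many statements. In a graph database the same filter would be written as a graph query; the entity and context names here are placeholders, not data from the paper.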