Paper Title
An Empirical Meta-analysis of the Life Sciences (Linked?) Open Data on the Web
Authors
Abstract
While the biomedical community has published several "open data" sources in the last decade, most researchers still endure severe logistical and technical challenges to discover, query, and integrate heterogeneous data and knowledge from multiple sources. To tackle these challenges, the community has experimented with Semantic Web and linked data technologies to create the Life Sciences Linked Open Data (LSLOD) cloud. In this paper, we extract schemas from more than 80 publicly available biomedical linked data graphs into an LSLOD schema graph and conduct an empirical meta-analysis to evaluate the extent of semantic heterogeneity across the LSLOD cloud. We observe that several LSLOD sources exist as stand-alone data sources that are not inter-linked with other sources, use unpublished schemas with minimal reuse or mappings, and have elements that are not useful for data integration from a biomedical perspective. We envision that the LSLOD schema graph and the findings from this research will aid researchers who wish to query and integrate data and knowledge from multiple biomedical sources simultaneously on the Web.