Title

Creating Compact Regions of Social Determinants of Health

Authors

Barrett Lattimer, Alan Lattimer

Abstract

Regionalization is the act of partitioning a dataset into contiguous regions that are internally homogeneous and heterogeneous with respect to one another. Many algorithms exist for performing regionalization; however, running them on large real-world datasets has only become computationally feasible in recent years. Very few studies have compared different regionalization methods, and those that do lack analysis of memory use, scalability, geographic metrics, and large-scale real-world applications. This study compares state-of-the-art regionalization methods, namely Agglomerative Clustering, SKATER, REDCAP, AZP, and Max-P-Regions, using real-world social determinants of health (SDOH) data. The scale of real-world SDOH data, up to 1 million data points in this study, not only allows the algorithms to be compared across different datasets but also provides a stress test for each individual regionalization algorithm, most of which have never been run at such scales. We use several new geographic metrics to compare the algorithms and also perform a comparative memory analysis. The prevailing regionalization method and unconstrained K-Means clustering are then compared on their ability to separate real health data in Virginia and Washington, DC.
