Paper Title
Identifying Patient-Specific Root Causes of Disease
Paper Authors
Paper Abstract
Complex diseases are caused by a multitude of factors that may differ between patients. As a result, hypothesis tests comparing all patients to all healthy controls can detect many significant variables with inconsequential effect sizes. A few highly predictive root causes may nevertheless generate disease within each patient. In this paper, we define patient-specific root causes as variables subject to exogenous "shocks" which go on to perturb an otherwise healthy system and induce disease. In other words, the variables are associated with the exogenous errors of a structural equation model (SEM), and these errors predict a downstream diagnostic label. We quantify predictivity using sample-specific Shapley values. This derivation allows us to develop a fast algorithm called Root Causal Inference for identifying patient-specific root causes by extracting the error terms of a linear SEM and then computing the Shapley value associated with each error. Experiments highlight considerable improvements in accuracy because the method uncovers root causes that may have large effect sizes at the individual level but clinically insignificant effect sizes at the group level. An R implementation is available at github.com/ericstrobl/RCI.
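The two steps named in the abstract (extract the error terms of a linear SEM, then attribute the prediction to each error with Shapley values) can be illustrated with a rough Python sketch. This is not the authors' R implementation. It assumes the causal graph is already known (the `parents` dictionary is hypothetical), fits ordinary least squares to a continuous disease score rather than a diagnostic label, and uses the closed-form Shapley value of a linear model with independent features, b_i * (e_i - mean(e_i)). All names and the simulated data are illustrative.

```python
import numpy as np

def extract_errors(X, parents):
    """Estimate SEM error terms as residuals of each variable on its parents.

    Assumes a linear SEM whose graph is known (a simplification for this sketch).
    """
    n, p = X.shape
    E = np.empty((n, p))
    for j in range(p):
        # Design matrix: intercept plus the parent columns (possibly none).
        A = np.column_stack([np.ones(n)] + [X[:, k] for k in parents[j]])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        E[:, j] = X[:, j] - A @ coef
    return E

def linear_shapley(E, y):
    """Fit y ~ E by least squares and return per-sample Shapley values.

    For a linear model with independent features, the Shapley value of
    feature i in sample k is b_i * (e_ki - mean(e_i)).
    """
    n = E.shape[0]
    A = np.column_stack([np.ones(n), E])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    phi = coef[1:] * (E - E.mean(axis=0))
    return phi, coef

rng = np.random.default_rng(0)
n = 2000

# Simulated chain X0 -> X1 -> X2 with non-Gaussian (Laplace) errors.
e_true = rng.laplace(size=(n, 3))
e_true[0] = [10.0, 0.0, 0.0]  # patient 0: one large exogenous shock on X0
X = np.empty((n, 3))
X[:, 0] = e_true[:, 0]
X[:, 1] = 0.8 * X[:, 0] + e_true[:, 1]
X[:, 2] = 0.5 * X[:, 1] + e_true[:, 2]

# Continuous stand-in for the downstream diagnosis (an assumption of this sketch).
score = X[:, 2] + 0.1 * rng.normal(size=n)

parents = {0: [], 1: [0], 2: [1]}  # hypothetical known graph

E = extract_errors(X, parents)
phi, coef = linear_shapley(E, score)

# The error with the largest positive Shapley value is this patient's root cause.
root_cause_patient0 = int(np.argmax(phi[0]))
print("Shapley values for patient 0:", np.round(phi[0], 2))
print("Root cause for patient 0: X%d" % root_cause_patient0)
```

In this simulation the shock enters at X0 and propagates to X2, yet the attribution is assigned to the error of X0 rather than to the downstream variables, which is the distinction the abstract draws between root causes and variables that merely differ between patients and controls.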