Phenotyping with Positive Unlabelled Learning for Genome-Wide Association Studies

Paper title

Phenotyping with Positive Unlabelled Learning for Genome-Wide Association Studies

Authors

Andre Vauvelle, Hamish Tomlinson, Aaron Sim, Spiros Denaxas

Abstract

Identifying phenotypes plays an important role in furthering our understanding of disease biology through practical applications within healthcare and the life sciences. The challenge of dealing with the complexities and noise within electronic health records (EHRs) has motivated applications of machine learning in phenotypic discovery. While recent research has focused on finding predictive subtypes for clinical decision support, here we instead focus on the noise that results in phenotypic misclassification, which can reduce a phenotype's ability to detect associations in genome-wide association studies (GWAS). We show that by combining anchor learning and transformer architectures into our proposed model, AnchorBERT, we are able to detect genomic associations previously found only in large consortium studies with 5× more cases. When reducing the number of controls available by 50%, we find our model is able to maintain 40% more significant genomic associations from the GWAS catalog compared to standard phenotype definitions.

Keywords: Phenotyping, Machine Learning, Semi-Supervised, Genetic Association Studies, Biological Discovery
