Paper Title

DEL-Dock: Molecular Docking-Enabled Modeling of DNA-Encoded Libraries

Authors

Kirill Shmilovich, Benson Chen, Theofanis Karaletsos, Mohammad M. Sultan

Abstract

DNA-Encoded Library (DEL) technology has driven significant advances in hit identification by enabling efficient testing of combinatorially generated molecular libraries. DEL screens measure protein binding affinity through sequencing reads of molecules tagged with unique DNA barcodes that survive a series of selection experiments. Computational models have been deployed to learn the latent binding affinities that are correlated with the sequenced count data; however, this correlation is often obfuscated by various sources of noise introduced in the complicated data-generation process. In order to denoise DEL count data and screen for molecules with good binding affinity, computational models require the correct assumptions in their modeling structure to capture the correct signals underlying the data. Recent advances in DEL models have focused on probabilistic formulations of count data, but existing approaches have thus far been limited to 2-D molecule-level representations. We introduce a new paradigm, DEL-Dock, that combines ligand-based descriptors with 3-D spatial information from docked protein-ligand complexes. 3-D spatial information allows our model to learn over the actual binding modality rather than using only structure-based information of the ligand. We show that our model is capable of effectively denoising DEL count data to predict molecule enrichment scores that are better correlated with experimental binding affinity measurements than those of prior work. Moreover, by learning over a collection of docked poses, we demonstrate that our model, trained only on DEL data, implicitly learns to perform good docking pose selection without requiring external supervision from expensive-to-source protein crystal structures.
