Paper Title
Root-aligned SMILES: A Tight Representation for Chemical Reaction Prediction
Paper Authors
Paper Abstract
Chemical reaction prediction, which covers both forward synthesis and retrosynthesis prediction, is a fundamental problem in organic synthesis. A popular computational paradigm formulates synthesis prediction as a sequence-to-sequence translation problem, in which molecules are typically represented as SMILES strings. However, general-purpose SMILES neglects a key characteristic of chemical reactions, namely that the molecular graph topology is largely unaltered from reactants to products, and therefore performs suboptimally when applied directly. In this article, we propose the root-aligned SMILES (R-SMILES), which specifies a tightly aligned one-to-one mapping between the product and the reactant SMILES for more efficient synthesis prediction. Owing to the strict one-to-one mapping and the reduced edit distance, the computational model is largely relieved of learning the complex syntax and can concentrate on learning the chemical knowledge of reactions. We compare the proposed R-SMILES with various state-of-the-art baselines and show that it significantly outperforms them all.
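The "reduced edit distance" claim in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' implementation: the function name `edit_distance`, the esterification example, and the hand-written SMILES strings are all assumptions chosen for illustration. It computes a plain character-level Levenshtein distance between a product SMILES and two valid spellings of the same reactants, one written so that it starts from the same atom as the product and one written from a different starting atom.

```python
def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance between two strings."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


# Hypothetical example (hand-written, not taken from the paper):
# ethyl acetate as the product, acetic acid plus ethanol as the reactants.
PRODUCT = "CC(=O)OCC"
# Reactants written from the same starting atom as the product.
ALIGNED_REACTANTS = "CC(=O)O.OCC"
# The same two molecules written from different starting atoms.
UNALIGNED_REACTANTS = "OC(C)=O.CCO"

aligned_distance = edit_distance(PRODUCT, ALIGNED_REACTANTS)
unaligned_distance = edit_distance(PRODUCT, UNALIGNED_REACTANTS)

print("aligned:", aligned_distance)
print("unaligned:", unaligned_distance)
```

In this toy case the aligned spelling differs from the product only by the inserted `.O`, whereas the differently rooted spelling scrambles the shared substructure and needs more edits. A real pipeline would tokenize SMILES properly and choose the root atom with a cheminformatics toolkit; both steps are omitted here.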