Paper Title
A community-powered search of machine learning strategy space to find NMR property prediction models
Paper Authors
Paper Abstract
The rise of machine learning (ML) has created an explosion in the potential strategies for using data to make scientific predictions. For physical scientists wishing to apply ML strategies to a particular domain, it can be difficult to assess in advance what strategy to adopt within a vast space of possibilities. Here we outline the results of an online community-powered effort to swarm search the space of ML strategies and develop algorithms for predicting atomic-pairwise nuclear magnetic resonance (NMR) properties in molecules. Using an open-source dataset, we worked with Kaggle to design and host a 3-month competition which received 47,800 ML model predictions from 2,700 teams in 84 countries. Within 3 weeks, the Kaggle community produced models with comparable accuracy to our best previously published "in-house" efforts. A meta-ensemble model constructed as a linear combination of the top predictions has a prediction accuracy which exceeds that of any individual model, 7-19x better than our previous state-of-the-art. The results highlight the potential of transformer architectures for predicting quantum mechanical (QM) molecular properties.
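The abstract describes the meta-ensemble only as "a linear combination of the top predictions." The following is a minimal sketch of that idea in Python with NumPy. It assumes the combination weights are fit by ordinary least squares against reference values on a held-out set; this fitting procedure is an assumption and may differ from the one the authors used. The function names (fit_linear_ensemble, ensemble_predict, mse) are hypothetical, and the demo data are synthetic rather than taken from the competition.

```python
import numpy as np


def fit_linear_ensemble(preds: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Return weights w minimizing ||preds @ w - target||^2 (ordinary least squares).

    Assumption: OLS is used here as one simple way to choose linear weights.
    preds:  (n_samples, n_models) array; column j holds model j's predictions.
    target: (n_samples,) array of reference values.
    """
    weights, *_ = np.linalg.lstsq(preds, target, rcond=None)
    return weights


def ensemble_predict(preds: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Combine per-model predictions into one prediction per sample."""
    return preds @ weights


def mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error between predictions and reference values."""
    return float(np.mean((pred - target) ** 2))


if __name__ == "__main__":
    # Synthetic stand-in data: NOT values from the competition.
    rng = np.random.default_rng(0)
    target = rng.normal(size=1000)
    # Three hypothetical "models", each equal to the target plus independent noise.
    preds = np.column_stack(
        [target + rng.normal(scale=s, size=target.size) for s in (0.3, 0.4, 0.5)]
    )
    w = fit_linear_ensemble(preds, target)
    single = [mse(preds[:, j], target) for j in range(preds.shape[1])]
    combined = mse(ensemble_predict(preds, w), target)
    print("weights:", w)
    print("best single-model MSE:", min(single))
    print("ensemble MSE:", combined)
```

Each single model corresponds to a weight vector with one entry equal to 1 and the rest 0, so on the data used to fit the weights the least-squares combination cannot have a higher squared error than the best individual model. Fitting the weights on a separate validation split is what checks whether that advantage carries over to unseen molecules.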