Paper Title
SemanticCAP: Chromatin Accessibility Prediction Enhanced by Features Learning from a Language Model
Authors
Abstract
Many inorganic and organic compounds can bind DNA and form complexes, among which drug-related molecules are particularly important. Changes in chromatin accessibility not only directly affect drug-DNA interactions, but also promote or inhibit the expression of critical genes associated with drug resistance by affecting the DNA-binding capacity of transcription factors (TFs) and transcriptional regulators. However, biological experimental techniques for measuring chromatin accessibility are expensive and time-consuming. In recent years, several kinds of computational methods have been proposed to identify accessible regions of the genome, but existing computational models mostly ignore the contextual information of bases in gene sequences. To address these issues, we propose a new solution named SemanticCAP. It introduces a gene language model that models the context of gene sequences and can therefore provide an effective representation of a given site in a gene sequence. We merge the features provided by the gene language model into our chromatin accessibility model and, in the process, design several methods to make the feature fusion smoother. Compared with other systems on public benchmarks, our model achieves better performance.
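The idea of merging per-site context features into a sequence model can be illustrated with a minimal sketch. This is not the SemanticCAP architecture: the function names `one_hot`, `context_features`, and `fuse` are hypothetical, the neighbour-frequency vector is only a crude stand-in for what a trained gene language model would produce, and plain concatenation is assumed as the fusion step.

```python
from typing import List

BASES = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    """Encode each base as a 4-dim one-hot vector (unknown bases become all zeros)."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def context_features(seq: str, window: int = 2) -> List[List[float]]:
    """Hypothetical stand-in for language-model features.

    For each site, return the base frequencies among its neighbours within
    `window` positions on either side, excluding the site itself.
    """
    seq = seq.upper()
    feats = []
    for i in range(len(seq)):
        neighbours = seq[max(0, i - window):i] + seq[i + 1:i + 1 + window]
        n = len(neighbours) or 1  # avoid division by zero for length-1 input
        feats.append([neighbours.count(base) / n for base in BASES])
    return feats


def fuse(seq: str, window: int = 2) -> List[List[float]]:
    """Fuse by concatenation (an assumption): one-hot (4) + context (4) per site."""
    return [a + b for a, b in zip(one_hot(seq), context_features(seq, window))]


if __name__ == "__main__":
    for base, row in zip("ACGT", fuse("ACGT", window=1)):
        print(base, row)
```

Each site ends up with an 8-dimensional vector: the first four entries identify the base itself, and the last four summarize its surroundings. A downstream accessibility classifier would consume these fused vectors in place of the raw one-hot encoding.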