Anticancer Peptides Classification using Kernel Sparse Representation Classifier

Paper Title

Anticancer Peptides Classification using Kernel Sparse Representation Classifier

Authors

Fazal, Ehtisham; Ibrahim, Muhammad Sohail; Park, Seongyong; Naseem, Imran; Wahab, Abdul

Abstract

Cancer is one of the most challenging diseases because of its complexity, variability, and diversity of causes. It has been a major research topic for decades, yet it remains poorly understood. Multifaceted therapeutic frameworks are therefore indispensable. Anticancer peptides (ACPs) are the most promising treatment option, but their large-scale identification and synthesis require reliable prediction methods, which remains an open problem. In this paper, we present an intuitive classification strategy that differs from traditional black-box methods and is based on the well-known statistical theory of sparse-representation classification (SRC). Specifically, we create over-complete dictionary matrices by embedding the composition of k-spaced amino acid pairs (CKSAAP). Unlike traditional SRC frameworks, we use an efficient matching pursuit solver instead of the computationally expensive basis pursuit solver. Furthermore, kernel principal component analysis (KPCA) is employed to handle non-linearity and reduce the dimensionality of the feature space, whereas the synthetic minority oversampling technique (SMOTE) is used to balance the dictionary. The proposed method is evaluated on two benchmark datasets using well-known statistical parameters and is found to outperform existing methods. The results show the highest sensitivity with the most balanced accuracy, which might be beneficial in understanding structural and chemical aspects and developing new ACPs. A Google Colab implementation of the proposed method is available on the author's GitHub page (https://github.com/ehtisham-fazal/ACP-Kernel-SRC).
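
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation (see their GitHub page for that); it only shows the general idea of CKSAAP features feeding a sparse-representation classifier. Several things here are assumptions made for illustration: the function names (`cksaap`, `omp`, `src_predict`), the gap range `k_max=2`, the sparsity level `n_nonzero`, and the toy peptide sequences are all hypothetical. The greedy solver shown is orthogonal matching pursuit, one common member of the matching pursuit family; the abstract does not say which variant the authors use. The KPCA and SMOTE steps are omitted to keep the example short.

```python
import itertools
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in itertools.product(AMINO_ACIDS, repeat=2)]
PAIR_INDEX = {p: i for i, p in enumerate(PAIRS)}

def cksaap(seq, k_max=2):
    """Frequencies of amino acid pairs separated by k residues, k = 0..k_max."""
    feats = np.zeros((k_max + 1, len(PAIRS)))
    for k in range(k_max + 1):
        n_windows = len(seq) - k - 1
        for i in range(n_windows):
            pair = seq[i] + seq[i + k + 1]
            if pair in PAIR_INDEX:  # skip non-standard residues
                feats[k, PAIR_INDEX[pair]] += 1
        if n_windows > 0:
            feats[k] /= n_windows
    return feats.ravel()

def omp(D, y, n_nonzero):
    """Orthogonal matching pursuit: greedy sparse code of y over the columns of D."""
    coef = np.zeros(D.shape[1])
    residual = y.copy()
    support = []
    for _ in range(min(n_nonzero, D.shape[1])):
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = 0.0  # never pick the same atom twice
        j = int(np.argmax(corr))
        if corr[j] < 1e-12:  # nothing left to explain
            break
        support.append(j)
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
        coef[:] = 0.0
        coef[support] = sol
    return coef

def src_predict(D, labels, y, n_nonzero=5):
    """Assign y to the class whose atoms give the smallest reconstruction residual."""
    D = D / (np.linalg.norm(D, axis=0, keepdims=True) + 1e-12)
    coef = omp(D, y, n_nonzero)
    classes = np.unique(labels)
    residuals = [
        np.linalg.norm(y - D[:, labels == c] @ coef[labels == c]) for c in classes
    ]
    return classes[int(np.argmin(residuals))]

# Toy dictionary: made-up sequences, label 1 = "ACP-like", label 0 = "non-ACP".
train_seqs = ["KKKKLLKK", "KLKKLLKK", "DDEEDDEE", "DEDEEDDE"]
labels = np.array([1, 1, 0, 0])
D = np.column_stack([cksaap(s) for s in train_seqs])  # one column per peptide

query = cksaap("KKLLKKKL")
prediction = src_predict(D, labels, query)
```

Each dictionary column is the feature vector of one training peptide, so the sparse code directly indicates which training examples best reconstruct the query. That property is what makes SRC easier to interpret than a black-box classifier.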
