Genetic Sequence Compression Using Machine Learning and Arithmetic Encoding Decoding Techniques

Paper Title

Genetic Sequence compression using Machine Learning and Arithmetic Encoding Decoding Techniques

Authors

Sarkar, Mehedi Hasan; Ashrafi, Adnan Ferdous

Abstract

We live in a period in which bioinformatics is expanding rapidly. Advances in high-throughput genome sequencing technology have produced a large quantity of genomic data, raising concerns about the costs of data storage and transmission. How to compress genomic sequence data properly remains an open question. Researchers have previously proposed many DNA compression methods, both with and without machine learning. Extending previous research, we propose a new architecture, a modified DeepDNA, together with a new methodology that deploys a double-base strategy for compressing DNA sequences. We validated the results by experimenting on datasets of three sizes: 100, 243, and 356. The experimental outcomes highlight the superiority of our improved approach over existing approaches, such as DeepDNA, for analyzing human mitochondrial genome data.
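
As a rough illustration of the arithmetic encoding and decoding named in the title, the sketch below compresses a short DNA string into a single exact fraction and recovers it. It is not the authors' method: it assumes a static symbol-frequency model where the paper uses a learned model, and the names build_model, encode, and decode, as well as the sample sequence, are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def build_model(seq):
    """Map each symbol to its (low, high) cumulative-probability interval.

    Assumption: a static frequency table stands in for a learned predictor.
    """
    counts = Counter(seq)
    total = len(seq)
    model, low = {}, Fraction(0)
    for sym in sorted(counts):
        high = low + Fraction(counts[sym], total)
        model[sym] = (low, high)
        low = high
    return model


def encode(seq, model):
    """Narrow [low, high) once per symbol and return a number inside it."""
    low, high = Fraction(0), Fraction(1)
    for sym in seq:
        width = high - low
        s_low, s_high = model[sym]
        low, high = low + width * s_low, low + width * s_high
    # The midpoint lies strictly inside the final interval.
    return (low + high) / 2


def decode(code, model, length):
    """Replay the narrowing, choosing the symbol whose interval holds code."""
    out = []
    low, high = Fraction(0), Fraction(1)
    for _ in range(length):
        width = high - low
        target = (code - low) / width
        for sym, (s_low, s_high) in model.items():
            if s_low <= target < s_high:
                out.append(sym)
                low, high = low + width * s_low, low + width * s_high
                break
    return "".join(out)


seq = "ACGTACGGTTAACCGA"  # hypothetical sample, not from the paper's datasets
model = build_model(seq)
code = encode(seq, model)
assert decode(code, model, len(seq)) == seq
```

Exact fractions keep the example short and free of rounding error. A practical coder would instead use fixed-width integer arithmetic with renormalization, and the better the model predicts the next base, the fewer bits the final interval needs.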
