Title
Comparing copy-number profiles under multi-copy amplifications and deletions
Authors
Abstract
During cancer progression, malignant cells accumulate somatic mutations that can lead to genetic aberrations. In particular, evolutionary events akin to segmental duplications or deletions can alter the copy-number profile (CNP) of a set of genes in a genome. Our aim is to compute the evolutionary distance between two cells for which only CNPs are known. This asks for the minimum number of segmental amplifications and deletions needed to turn one CNP into another. The problem was recently formalized in a model where each event is assumed to alter a copy-number by $1$ or $-1$, even though such events can affect large portions of a chromosome. We propose a general cost framework in which an event can modify the copy-number of a gene by larger amounts. We show that any cost scheme that allows segmental deletions of arbitrary length makes computing the distance strongly NP-hard. We then devise a factor-$2$ approximation algorithm for the problem when copy-numbers are non-zero and provide an implementation called \textsf{cnp2cnp}. We evaluate our approach experimentally by reconstructing simulated cancer phylogenies from the pairwise distances inferred by \textsf{cnp2cnp} and compare it against two alternatives, namely the \textsf{MEDICC} distance and the Euclidean distance. The experimental results show that our distance yields more accurate phylogenies on average than these alternatives if the given CNPs are error-free, but that the \textsf{MEDICC} distance is slightly more robust against errors in the data. In all cases, our experiments show that either our approach or the \textsf{MEDICC} approach should be preferred over the Euclidean distance.
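To make the objects in the abstract concrete, the following minimal Python sketch represents a CNP as a list of non-negative integers, applies a single segmental event to it, and computes the Euclidean baseline distance. The function names `apply_event` and `euclidean_distance` are hypothetical and are not taken from \textsf{cnp2cnp}. The rule that a gene with zero copies stays at zero, and that deletions cannot go below zero, is an assumption about the underlying model rather than something stated in the abstract. The sketch does not compute the evolutionary distance itself.

```python
import math
from typing import List


def apply_event(cnp: List[int], start: int, end: int, delta: int) -> List[int]:
    """Apply one segmental event to positions start..end (inclusive).

    delta > 0 is an amplification, delta < 0 a deletion.
    Assumption (not from the abstract): a gene with zero copies cannot be
    regained, and a copy-number never drops below zero.
    """
    if not (0 <= start <= end < len(cnp)):
        raise ValueError("segment must lie within the profile")
    result = list(cnp)  # leave the input profile unchanged
    for i in range(start, end + 1):
        if result[i] > 0:
            result[i] = max(0, result[i] + delta)
    return result


def euclidean_distance(a: List[int], b: List[int]) -> float:
    """Euclidean distance between two CNPs of equal length (baseline only)."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


source = [2, 2, 2, 2]
# A single multi-copy amplification of +2 on the two middle genes.
target = apply_event(source, 1, 2, 2)
print(target)
print(euclidean_distance(source, target))
```

Under these assumptions, one multi-copy event turns `[2, 2, 2, 2]` into `[2, 4, 4, 2]`, whereas a model restricted to changes of $1$ or $-1$ would need two events on the same segment.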