Paper Title

CTformer: Convolution-free Token2Token Dilated Vision Transformer for Low-dose CT Denoising

Paper Authors

Dayang Wang, Fenglei Fan, Zhan Wu, Rui Liu, Fei Wang, Hengyong Yu

Paper Abstract

Low-dose computed tomography (LDCT) denoising is an important problem in CT research. Compared to normal-dose CT (NDCT), LDCT images suffer from severe noise and artifacts. Recently, vision transformers have shown superior feature representation ability over convolutional neural networks (CNNs) in many studies. However, unlike CNNs, the potential of vision transformers in LDCT denoising has so far been little explored. To fill this gap, we propose a convolution-free Token2Token dilated vision transformer (CTformer) for low-dose CT denoising. The CTformer uses a more powerful token rearrangement to encompass local contextual information and thus avoids convolution. It also dilates and shifts feature maps to capture longer-range interactions. We interpret the CTformer by statically inspecting patterns of its internal attention maps and dynamically tracing the hierarchical attention flow with an explanatory graph. Furthermore, an overlapped inference mechanism is introduced to effectively eliminate the boundary artifacts that are common in encoder-decoder-based denoising models. Experimental results on the Mayo LDCT dataset suggest that the CTformer outperforms state-of-the-art denoising methods with a low computational overhead.
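
As a rough illustration of the convolution-free token rearrangement described in the abstract, the sketch below regroups neighboring tokens with `torch.nn.Unfold`, so each new token aggregates a dilated local neighborhood without any learned convolution kernel. This is a minimal, non-authoritative sketch: `TokenRearrange` and its kernel size, stride, and dilation values are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of a Token2Token-style "soft split": neighboring tokens
# are regrouped with torch.nn.Unfold (no convolution kernel is learned), so
# each new token carries local context. Dilation widens the neighborhood each
# token sees; the exact settings here are illustrative, not the paper's.
class TokenRearrange(nn.Module):
    def __init__(self, kernel_size=3, stride=2, dilation=2):
        super().__init__()
        pad = dilation * (kernel_size - 1) // 2   # keep spatial coverage
        self.unfold = nn.Unfold(kernel_size=kernel_size, stride=stride,
                                dilation=dilation, padding=pad)

    def forward(self, tokens, h, w):
        # tokens: (B, N, C) with N = h * w; fold back into a 2-D token map,
        # then regroup each k x k (dilated) neighborhood into one new token.
        b, n, c = tokens.shape
        fmap = tokens.transpose(1, 2).reshape(b, c, h, w)
        patches = self.unfold(fmap)               # (B, C * k * k, N_new)
        return patches.transpose(1, 2)            # (B, N_new, C * k * k)

# Usage: a 64x64 token map with 16 channels -> 32x32 coarser, richer tokens.
x = torch.randn(1, 64 * 64, 16)
out = TokenRearrange()(x, 64, 64)
print(out.shape)                                  # torch.Size([1, 1024, 144])
```

The overlapped inference idea can likewise be sketched generically: the slice is denoised tile-by-tile with overlapping windows, and the overlaps are averaged, which is what suppresses the tile-boundary seams. `overlapped_inference`, the patch size, and the stride below are assumptions for illustration; `denoise_fn` stands in for any trained denoising model.

```python
import numpy as np

def overlapped_inference(image, denoise_fn, patch=64, stride=32):
    """Generic sketch: denoise overlapping tiles and average the overlaps so
    that tile-boundary artifacts are smoothed out. Assumes image dimensions
    are compatible with the chosen patch/stride."""
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.float32)
    weight = np.zeros((h, w), dtype=np.float32)
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            tile = image[top:top + patch, left:left + patch]
            out[top:top + patch, left:left + patch] += denoise_fn(tile)
            weight[top:top + patch, left:left + patch] += 1.0
    return out / np.maximum(weight, 1e-8)

# Usage with a placeholder identity "denoiser" on a 512x512 slice.
slice_ldct = np.random.rand(512, 512).astype(np.float32)
denoised = overlapped_inference(slice_ldct, denoise_fn=lambda t: t)
assert denoised.shape == slice_ldct.shape
```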
