Paper Title
A Data-scalable Transformer for Medical Image Segmentation: Architecture, Model Efficiency, and Benchmark
Paper Authors
Paper Abstract
Transformers have demonstrated remarkable performance in natural language processing and computer vision. However, existing vision Transformers struggle to learn from limited medical data and are unable to generalize to diverse medical imaging tasks. To tackle these challenges, we present MedFormer, a data-scalable Transformer designed for generalizable 3D medical image segmentation. Our approach incorporates three key elements: a desirable inductive bias, hierarchical modeling with linear-complexity attention, and multi-scale feature fusion that integrates spatial and semantic information globally. MedFormer can learn across tiny- to large-scale data without pre-training. Comprehensive experiments demonstrate MedFormer's potential as a versatile segmentation backbone, outperforming CNNs and vision Transformers on seven public datasets covering multiple modalities (e.g., CT and MRI) and various medical targets (e.g., healthy organs, diseased tissues, and tumors). We provide public access to our models and evaluation pipeline, offering solid baselines and unbiased comparisons to advance a wide range of downstream clinical applications.
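To make the "linear-complexity attention" ingredient concrete, the sketch below shows one generic formulation of efficient attention, softmax(Q) · (softmax(K)ᵀ V), whose cost grows linearly in the number of tokens rather than quadratically. This is an illustrative assumption for readers, not the specific attention module used in MedFormer; the class name, head count, and shapes are hypothetical.

```python
import torch
import torch.nn as nn

class LinearAttention(nn.Module):
    """Generic linear-complexity attention (softmax applied to Q and K
    separately). Illustrative sketch only -- not the MedFormer module."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.heads = heads
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                      # x: (B, N, C), N = number of tokens
        B, N, C = x.shape
        qkv = self.to_qkv(x).reshape(B, N, 3, self.heads, C // self.heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)   # each: (B, heads, N, C/heads)
        q = q.softmax(dim=-1)                  # normalize over the feature dimension
        k = k.softmax(dim=-2)                  # normalize over the token dimension
        context = k.transpose(-2, -1) @ v      # (B, heads, d, d): linear in N
        out = q @ context                      # (B, heads, N, d)
        out = out.transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

# Hypothetical usage on a flattened 3D feature map (batch, D*H*W tokens, channels)
x = torch.randn(2, 8 * 8 * 8, 64)
print(LinearAttention(dim=64)(x).shape)       # torch.Size([2, 512, 64])
```

Because the (key, value) context matrix is only d x d per head, the token count from a dense 3D volume never enters a quadratic term, which is why this family of attention mechanisms scales to volumetric medical images.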