Title
Hierarchical Deep Learning Ensemble to Automate the Classification of Breast Cancer Pathology Reports by ICD-O Topography
Authors
Abstract
Like most cancer registries worldwide, the National Cancer Registry in South Africa employs expert human coders to label pathology reports with the appropriate International Classification of Diseases for Oncology (ICD-O) codes, spanning 42 different cancer types. The annotation workload is extensive, given the large volume of cancer pathology reports the registry receives annually from public and private sector institutions. This manual process, coupled with other challenges, results in a significant 4-year lag in the reporting of annual cancer statistics in South Africa. We present a hierarchical deep learning ensemble method, incorporating state-of-the-art convolutional neural network (CNN) models, for the automatic labelling of 2201 de-identified, free-text pathology reports with the appropriate ICD-O breast cancer topography codes across 8 classes. Our results show an improvement in primary site classification over the state-of-the-art CNN model of more than 14% in F1 micro score and 55% in F1 macro score. We demonstrate that, for predicting the ICD-O topography codes of pathology reports, the hierarchical deep learning ensemble improves on state-of-the-art models trained as a flat multiclass classifier.
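The abstract reports gains in both F1 micro and F1 macro score. The following minimal Python sketch (not the authors' code) shows how these two averages differ for single-label multiclass predictions, such as assigning one topography code per report. Micro-F1 pools true positives, false positives, and false negatives across all classes before computing a single F1, which in this single-label setting equals accuracy. Macro-F1 computes F1 per class and takes the unweighted mean, so rare classes count as much as common ones. The function name `f1_scores` and the toy labels and predictions are hypothetical, chosen only for illustration.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multiclass predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    # Micro: pool counts over all classes, then compute one F1 value.
    total_tp, total_fp, total_fn = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro = 2 * total_tp / (2 * total_tp + total_fp + total_fn) if total_tp else 0.0

    # Macro: compute F1 for each class, then take the unweighted mean.
    per_class = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class.append(2 * tp[c] / denom if denom else 0.0)
    macro = sum(per_class) / len(per_class)
    return micro, macro


# Hypothetical imbalanced toy data: one dominant code and two rare ones.
y_true = ["C50.9"] * 8 + ["C50.4", "C50.1"]
y_pred = ["C50.9"] * 10  # a classifier that always predicts the majority code

micro, macro = f1_scores(y_true, y_pred)
print(round(micro, 3), round(macro, 3))  # → 0.8 0.296
```

On this invented, imbalanced data, always predicting the majority code scores 0.8 micro-F1 but only about 0.30 macro-F1. This illustrates why better performance on minority classes can move macro-F1 much more than micro-F1.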