Extracting Structured Data from Physician-Patient Conversations by Predicting Noteworthy Utterances

Title

Extracting Structured Data from Physician-Patient Conversations By Predicting Noteworthy Utterances

Authors

Kundan Krishna, Amy Pavel, Benjamin Schloss, Jeffrey P. Bigham, Zachary C. Lipton

Abstract

Despite diverse efforts to mine various modalities of medical data, the conversations between physicians and patients at the time of care remain an untapped source of insights. In this paper, we leverage this data to extract structured information that might assist physicians with post-visit documentation in electronic health records, potentially lightening the clerical burden. In this exploratory study, we describe a new dataset consisting of conversation transcripts, post-visit summaries, corresponding supporting evidence (in the transcript), and structured labels. We focus on the tasks of recognizing relevant diagnoses and abnormalities in the review of organ systems (RoS). One methodological challenge is that the conversations are long (around 1500 words), making it difficult for modern deep-learning models to use them as input. To address this challenge, we extract noteworthy utterances, that is, parts of the conversation likely to be cited as evidence supporting some summary sentence. We find that by first filtering for (predicted) noteworthy utterances, we can significantly boost predictive performance for recognizing both diagnoses and RoS abnormalities.
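
The two-stage idea in the abstract (first keep only utterances predicted to be noteworthy, then predict labels from the shortened input) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the cue-word scorer, the threshold, the label names, and the rule-based classifier are hypothetical stand-ins, not the models or labels used in the paper, which the abstract does not specify.

```python
import re
from typing import Callable, Dict, List, Set

# Hypothetical cue words; a real system would use a trained noteworthiness model.
CUE_WORDS: Set[str] = {"pain", "cough", "fever", "diabetes", "insulin"}

# Hypothetical label-to-cue mapping standing in for a trained classifier.
LABEL_CUES: Dict[str, Set[str]] = {
    "diabetes": {"diabetes", "insulin"},
    "respiratory_abnormality": {"cough"},
}


def tokens(text: str) -> List[str]:
    """Lowercase alphabetic tokens of an utterance."""
    return re.findall(r"[a-z]+", text.lower())


def toy_noteworthiness(utterance: str) -> float:
    """Stand-in score: fraction of tokens that are cue words."""
    toks = tokens(utterance)
    if not toks:
        return 0.0
    return sum(t in CUE_WORDS for t in toks) / len(toks)


def filter_noteworthy(
    utterances: List[str],
    scorer: Callable[[str], float],
    threshold: float,
) -> List[str]:
    """Stage 1: keep utterances whose score reaches the threshold."""
    return [u for u in utterances if scorer(u) >= threshold]


def toy_classify(utterances: List[str]) -> Set[str]:
    """Stage 2: predict labels from the filtered utterances only."""
    seen = {t for u in utterances for t in tokens(u)}
    return {label for label, cues in LABEL_CUES.items() if seen & cues}


conversation = [
    "Good morning, how was the drive in?",
    "I have had a dry cough for two weeks.",
    "Are you still taking insulin for your diabetes?",
    "Okay, see you next month.",
]

kept = filter_noteworthy(conversation, toy_noteworthiness, threshold=0.1)
labels = toy_classify(kept)
```

The sketch only shows the shape of the pipeline: the downstream step sees two utterances instead of four. It says nothing about the performance gains reported in the abstract, which depend on the learned models.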
