The Future AI in Healthcare: A Tsunami of False Alarms or a Product of Experts?

Paper title

The Future AI in Healthcare: A Tsunami of False Alarms or a Product of Experts?

Author

Clifford, Gari D.

Abstract

Recent significant increases in affordable and accessible computational power and data storage have enabled machine learning to provide almost unbelievable classification and prediction performance compared to well-trained humans. There have been some promising (but limited) results in the complex healthcare landscape, particularly in imaging. This promise has led some individuals to leap to the conclusion that we will solve an ever-increasing number of problems in human health and medicine by applying "artificial intelligence" to "big (medical) data". The scientific literature has been inundated with algorithms, outstripping our ability to review them effectively. Unfortunately, I argue that most, if not all, of these publications or commercial algorithms make several fundamental errors. I argue that because everyone (and therefore every algorithm) has blind spots, there are multiple "best" algorithms, each of which excels on different types of patients or in different contexts. Consequently, we should vote many algorithms together, weighted by their overall performance, their independence from each other, and a set of features that define the context (i.e., the features that maximally discriminate between the situations when one algorithm outperforms another). This approach not only provides a better-performing classifier or predictor but also provides confidence intervals so that a clinician can judge how to respond to an alert. Moreover, I argue that a sufficient number of (mostly) independent algorithms that address the same problem can be generated through a large international competition/challenge lasting many months, and I define the conditions for a successful event. Finally, I propose introducing the requirement for major grantees to run challenges in the final year of funding to maximize the value of research and select a new generation of grantees.
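
The voting idea in the abstract can be illustrated with a minimal sketch. It covers only the first of the three weighting factors (overall performance) and leaves out independence between algorithms, context features, and confidence intervals. The function name `weighted_vote` and the log-odds weighting are assumptions made for illustration; the abstract does not give a weighting formula.

```python
import math


def weighted_vote(predictions, accuracies):
    """Combine binary predictions (0 or 1) from several algorithms.

    Illustrative assumption: each vote is weighted by the log-odds of that
    algorithm's overall accuracy. This is not the paper's formula.

    Returns (label, share), where share is the fraction of the total weight
    that voted for class 1. It is a score, not a confidence interval.
    """
    if len(predictions) != len(accuracies):
        raise ValueError("need one accuracy per prediction")
    if not predictions:
        raise ValueError("need at least one algorithm")
    for a in accuracies:
        # Accuracies must beat chance and be below 1 for a positive, finite weight.
        if not 0.5 < a < 1.0:
            raise ValueError("accuracies must lie strictly between 0.5 and 1")

    weights = [math.log(a / (1.0 - a)) for a in accuracies]
    total = sum(weights)
    share = sum(w for w, p in zip(weights, predictions) if p == 1) / total
    label = 1 if share >= 0.5 else 0
    return label, share


# One strong algorithm (90% accurate) disagrees with two weak ones (60% each).
label, share = weighted_vote([1, 0, 0], [0.9, 0.6, 0.6])
```

In this example an unweighted majority would return 0, while the performance-weighted vote follows the single stronger algorithm and returns 1.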
