Paper Title

Calibrate: Interactive Analysis of Probabilistic Model Output

Authors

Peter Xenopoulos, Joao Rulff, Luis Gustavo Nonato, Brian Barr, Claudio Silva

Abstract

Analyzing classification model performance is a crucial task for machine learning practitioners. While practitioners often use count-based metrics derived from confusion matrices, like accuracy, many applications, such as weather prediction, sports betting, or patient risk prediction, rely on a classifier's predicted probabilities rather than predicted labels. In these instances, practitioners are concerned with producing a calibrated model, that is, one which outputs probabilities that reflect those of the true distribution. Model calibration is often analyzed visually, through static reliability diagrams; however, the traditional calibration visualization may suffer from a variety of drawbacks due to the strong aggregations it necessitates. Furthermore, count-based approaches are unable to sufficiently analyze model calibration. We present Calibrate, an interactive reliability diagram that addresses the aforementioned issues. Calibrate constructs a reliability diagram that is resistant to the drawbacks of traditional approaches and allows for interactive subgroup analysis and instance-level inspection. We demonstrate the utility of Calibrate through use cases on both real-world and synthetic data. We further validate Calibrate by presenting the results of a think-aloud experiment with data scientists who routinely analyze model calibration.
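
As context for the abstract, below is a minimal sketch of how a traditional binned reliability diagram, the static visualization the paper contrasts Calibrate against, is typically computed: predicted probabilities are grouped into fixed-width bins, and each bin's mean predicted probability is compared against the observed fraction of positives. This is not the paper's Calibrate implementation; the function name `reliability_curve` and the synthetic data are illustrative assumptions.

```python
import numpy as np

def reliability_curve(y_true, y_prob, n_bins=10):
    """Per-bin (mean predicted probability, observed positive frequency).

    A hypothetical helper, not part of the Calibrate tool.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each prediction to a bin; clip so y_prob == 1.0 lands in the last bin.
    idx = np.clip(np.digitize(y_prob, edges) - 1, 0, n_bins - 1)
    mean_pred, frac_pos = [], []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():  # empty bins are skipped, one aggregation drawback noted above
            mean_pred.append(y_prob[mask].mean())
            frac_pos.append(y_true[mask].mean())
    return np.array(mean_pred), np.array(frac_pos)

# Synthetic example (assumed data, for illustration): an over-confident classifier,
# whose true positive rate is lower than its predicted probability.
rng = np.random.default_rng(0)
y_prob = rng.uniform(0.0, 1.0, 5000)
y_true = (rng.uniform(0.0, 1.0, 5000) < y_prob ** 2).astype(int)

for m, f in zip(*reliability_curve(y_true, y_prob)):
    print(f"mean predicted {m:.2f} -> observed {f:.2f}")
```

A perfectly calibrated model would trace the diagonal (observed frequency roughly equal to predicted probability); the strong aggregation into a handful of bins is precisely what motivates the interactive subgroup and instance-level views described in the abstract.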
