Paper title
Optimising HEP parameter fits via Monte Carlo weight derivative regression
Paper authors
Paper abstract
HEP event selection is traditionally considered a binary classification problem, involving the dichotomous categories of signal and background. In distribution fits for particle masses or couplings, however, signal events are not all equivalent, as the signal differential cross section has different sensitivities to the measured parameter in different regions of phase space. In this paper, I describe a mathematical framework for the evaluation and optimisation of HEP parameter fits, where this sensitivity is defined on an event-by-event basis, and for MC events it is modelled in terms of their MC weight derivatives with respect to the measured parameter. Minimising the statistical error on a measurement implies the need to resolve (i.e. separate) events with different sensitivities, which ultimately represents a non-dichotomous classification problem. Since MC weight derivatives are not available for real data, the practical strategy I suggest consists of training a regressor of weight derivatives on MC events, and then using it as an optimal partitioning variable for one-dimensional fits of data events. This CHEP2019 paper is an extension of the study presented at CHEP2018: in particular, event-by-event sensitivities allow the exact computation of the "FIP" ratio between the Fisher information obtained from an analysis and the maximum information that could possibly be obtained with an ideal detector. Using this expression, I discuss the relationship between FIP and two metrics commonly used in meteorology (Brier score and MSE), and the importance of "sharpness" both in HEP and in that domain. I finally point out that HEP distribution fits should be optimised and evaluated using probabilistic metrics (like FIP or MSE), whereas ranking metrics (like AUC) or threshold metrics (like accuracy) are of limited relevance for these specific problems.
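The strategy outlined in the abstract can be illustrated with a minimal toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the fit is assumed to be a binned Poisson fit, so that the Fisher information is the sum over bins of (dn_k/dtheta)^2 / n_k; the per-event sensitivity `gamma` is assumed to be the weight derivative divided by the weight; the ideal-detector information is assumed to be the sum of w * gamma^2; the toy data set and the helper names (`quantile_bins`, `fisher_information_binned`, `fip`) are hypothetical; and a cubic polynomial fit stands in for whatever regressor one would really train.

```python
import numpy as np

def quantile_bins(values, n_bins):
    """Assign each value to one of n_bins equally populated bins (indices 0..n_bins-1)."""
    inner_edges = np.quantile(values, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    return np.searchsorted(inner_edges, values)

def fisher_information_binned(w, gamma, bin_index, n_bins):
    """Information of an assumed binned Poisson fit: sum_k (dn_k/dtheta)^2 / n_k."""
    n_k = np.bincount(bin_index, weights=w, minlength=n_bins)           # expected yields
    dn_k = np.bincount(bin_index, weights=w * gamma, minlength=n_bins)  # yield derivatives
    filled = n_k > 0
    return np.sum(dn_k[filled] ** 2 / n_k[filled])

def fip(w, gamma, bin_index, n_bins):
    """Ratio of the binned information to the assumed ideal value sum_i w_i * gamma_i^2."""
    ideal = np.sum(w * gamma ** 2)
    return fisher_information_binned(w, gamma, bin_index, n_bins) / ideal

# Hypothetical toy MC sample: one observable x, unit weights, and a sensitivity
# that depends on x plus a component the observable cannot resolve.
rng = np.random.default_rng(0)
n_events, n_bins = 20000, 20
x = rng.uniform(-1.0, 1.0, n_events)
w = np.ones(n_events)
gamma = 2.0 * x + rng.normal(0.0, 0.5, n_events)

# Train the stand-in regressor on the first half of the sample, evaluate on the second.
half = n_events // 2
coeffs = np.polyfit(x[:half], gamma[:half], deg=3, w=np.sqrt(w[:half]))
q = np.polyval(coeffs, x[half:])  # regressor output, usable on data as well as MC

# Partition the evaluation events by the regressor output, or at random as a baseline.
fip_regressor = fip(w[half:], gamma[half:], quantile_bins(q, n_bins), n_bins)
fip_random = fip(w[half:], gamma[half:], rng.integers(0, n_bins, n_events - half), n_bins)

print(f"FIP, bins in regressor output: {fip_regressor:.3f}")
print(f"FIP, random bins:              {fip_random:.3f}")
```

In this toy, binning in the regressor output keeps most of the available information, while a random partition keeps almost none, because events of opposite sensitivity cancel within each bin. With these definitions the ratio always lies between 0 and 1, which follows from the Cauchy-Schwarz inequality applied bin by bin.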