Paper Title

Multiclass Classification via Class-Weighted Nearest Neighbors

Authors

Justin Khim, Ziyu Xu, Shashank Singh

Abstract

We study statistical properties of the k-nearest neighbors algorithm for multiclass classification, with a focus on settings where the number of classes may be large and/or classes may be highly imbalanced. In particular, we consider a variant of the k-nearest neighbor classifier with non-uniform class weights, for which we derive upper and minimax lower bounds on accuracy, class-weighted risk, and uniform error. Additionally, we show that uniform error bounds lead to bounds on the difference between empirical confusion matrix quantities and their population counterparts across a set of weights. As a result, we may adjust the class weights to optimize classification metrics such as the F1 score or the Matthews correlation coefficient, which are commonly used in practice, particularly in settings with imbalanced classes. We also provide a simple example that instantiates our bounds, along with numerical experiments.
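
To make the idea of a class-weighted k-nearest neighbor rule concrete, here is a minimal sketch in Python. It assumes one plausible form of the classifier: each class's share of the k nearest labels is multiplied by a class weight, and the class with the largest weighted share is predicted. The function name `weighted_knn_predict`, the Euclidean metric, and the toy data are illustrative assumptions and are not taken from the paper.

```python
import math
from collections import Counter


def weighted_knn_predict(train_x, train_y, query, k, weights):
    """Return argmax over classes c of weights[c] * (share of the k nearest labels equal to c).

    Assumed form of a class-weighted k-NN rule, for illustration only.
    """
    # Rank training points by Euclidean distance to the query.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    # Count the labels of the k nearest training points.
    votes = Counter(train_y[i] for i in order[:k])
    # Weighted vote share; a class absent from the neighborhood scores 0.
    scores = {c: w * votes.get(c, 0) / k for c, w in weights.items()}
    return max(scores, key=scores.get)


# Hypothetical imbalanced toy data: four points of class 0, one of class 1.
train_x = [(0.0,), (0.1,), (0.2,), (0.3,), (1.0,)]
train_y = [0, 0, 0, 0, 1]
query = (0.7,)

# With uniform weights the majority class wins the 3-neighbor vote (2 of 3).
uniform_pred = weighted_knn_predict(train_x, train_y, query, 3, {0: 1.0, 1: 1.0})
# Up-weighting the rare class flips the prediction (3 * 1/3 > 1 * 2/3).
reweighted_pred = weighted_knn_predict(train_x, train_y, query, 3, {0: 1.0, 1: 3.0})
print(uniform_pred, reweighted_pred)
```

In this sketch the weights act as a tunable knob: sweeping them over a grid and scoring each setting on held-out data with a metric such as F1 is one simple way to act on the abstract's remark about adjusting class weights.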
