Title

Enhanced Nearest Neighbor Classification for Crowdsourcing

Authors

Duan, Jiexin; Qiao, Xingye; Cheng, Guang

Abstract

在机器学习中,众包是标记大量数据的经济方式。但是,产生的标签中的噪声可能会恶化应用于标记数据的任何分类方法的准确性。我们建议一个增强的最近的邻居分类器(ENN)来克服此问题。开发了两种算法来估计工人质量(实际上在实践中是未知的):一种是通过将$ k $ nn分类器应用于专家数据来基于DeNoied的工人标签来构建估计值;另一种是一种迭代算法,即使无访问专家数据也可以正常运行。除了有力的数值证据外,我们提出的方法被证明是基于高质量专家数据的甲骨文版本相同的遗憾。作为技术副产品,得出了分配给每个工人的样本量的下限,以达到最佳的遗憾率。

In machine learning, crowdsourcing is an economical way to label a large amount of data. However, the noise in the resulting labels may degrade the accuracy of any classification method applied to the labelled data. We propose an enhanced nearest neighbor classifier (ENN) to overcome this issue. Two algorithms are developed to estimate worker quality, which is often unknown in practice. The first constructs the estimate from the worker labels after denoising them with a $k$NN classifier applied to the expert data. The second is an iterative algorithm that works even without access to expert data. In addition to strong numerical evidence, we prove that our proposed methods achieve the same regret as their oracle version based on high-quality expert data. As a technical by-product, we derive a lower bound on the sample size assigned to each worker that is needed to reach the optimal convergence rate of regret.
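The abstract only outlines the two worker-quality estimators, so what follows is a minimal sketch under our own assumptions. It is not the authors' ENN.

The sketch assumes the following:
* Labels are binary in {0, 1}.
* Worker j labels correctly with probability p_j, independently of x.
* Each neighbor's vote is weighted by log(p_j / (1 - p_j)). This is a common weighted-majority choice that we assume here, not one taken from the paper.

It illustrates both ideas:
* Worker quality can be scored as agreement with labels denoised by a kNN classifier fit on expert data.
* Without expert data, an EM-style loop alternates between weighted leave-one-out kNN denoising and re-scoring the workers. This loop is a hypothetical stand-in for the iterative algorithm.

All function names (knn_vote, agreement_by_worker, quality_from_expert, quality_iterative, enn_predict) and the synthetic data are hypothetical.

```python
import numpy as np


def knn_vote(X_train, y_train, X_query, k, weights=None, exclude_self=False):
    """Weighted kNN vote for binary labels in {0, 1}.

    Each of the k nearest training points adds weights[i] * (2*y_i - 1) to a
    score; the prediction is 1 if the score is positive, else 0.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_pm = 2.0 * np.asarray(y_train) - 1.0  # map {0, 1} -> {-1, +1}
    w = np.ones(len(y_pm)) if weights is None else np.asarray(weights, dtype=float)
    # Squared Euclidean distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    if exclude_self:  # leave-one-out; only valid when X_query is X_train
        np.fill_diagonal(d2, np.inf)
    idx = np.argsort(d2, axis=1)[:, :k]
    score = (w[idx] * y_pm[idx]).sum(axis=1)
    return (score > 0).astype(int)


def agreement_by_worker(y, worker, denoised):
    """Fraction of each worker's labels that match the denoised labels."""
    return {w: float(np.mean(y[worker == w] == denoised[worker == w]))
            for w in np.unique(worker)}


def quality_from_expert(X, y, worker, X_exp, y_exp, k):
    """Assumed variant of the expert-based scheme: denoise crowd labels with
    kNN fit on expert data, then score each worker by agreement."""
    return agreement_by_worker(y, worker, knn_vote(X_exp, y_exp, X, k))


def log_odds_weights(worker, quality, eps=1e-3):
    """Per-sample vote weight log(p/(1-p)) of the sample's worker (assumed)."""
    p = np.clip(np.array([quality[w] for w in worker]), eps, 1 - eps)
    return np.log(p / (1 - p))


def quality_iterative(X, y, worker, k, n_iter=10, init=0.75):
    """Hypothetical EM-style stand-in for the expert-free scheme: alternate
    weighted leave-one-out kNN denoising and re-scoring of workers."""
    quality = {w: init for w in np.unique(worker)}
    for _ in range(n_iter):
        wts = log_odds_weights(worker, quality)
        denoised = knn_vote(X, y, X, k, weights=wts, exclude_self=True)
        quality = agreement_by_worker(y, worker, denoised)
    return quality


def enn_predict(X, y, worker, quality, X_new, k):
    """Classify new points by a quality-weighted kNN vote over crowd labels."""
    return knn_vote(X, y, X_new, k, weights=log_odds_weights(worker, quality))


# Synthetic demo: two well-separated Gaussian blobs, two workers.
rng = np.random.default_rng(0)


def blobs(n):
    y = rng.integers(0, 2, n)
    X = rng.normal(loc=(2 * y - 1)[:, None] * 3.0, scale=1.0, size=(n, 2))
    return X, y


X, y_true = blobs(400)
worker = np.repeat([0, 1], 200)            # 200 points per worker
acc = np.where(worker == 0, 0.95, 0.55)    # worker 0 reliable, worker 1 noisy
y = np.where(rng.random(400) > acc, 1 - y_true, y_true)
X_exp, y_exp = blobs(50)                   # small clean expert set

q1 = quality_from_expert(X, y, worker, X_exp, y_exp, k=5)
q2 = quality_iterative(X, y, worker, k=15)
X_test, y_test = blobs(200)
pred = enn_predict(X, y, worker, q2, X_test, k=15)
acc_test = float(np.mean(pred == y_test))
print(q1, q2, acc_test)
```

In this toy setting, both estimators should rank worker 0 above worker 1. The log-odds weights let the reliable worker's labels dominate each neighborhood vote. A worker estimated below 0.5 would get a negative weight, which effectively flips their labels under the assumed symmetric-noise model.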
