The Weak Supervision Landscape

Paper title

The Weak Supervision Landscape

Authors

Rafael Poyiadzi, Daniel Bacaicoa-Barber, Jesus Cid-Sueiro, Miquel Perello-Nieto, Peter Flach, Raul Santos-Rodriguez

Abstract

Many ways of annotating a dataset for machine learning classification tasks that go beyond the usual class labels exist in practice. These are of interest as they can simplify or facilitate the collection of annotations, while not greatly affecting the resulting machine learning model. Many of these fall under the umbrella term of weak labels or annotations. However, it is not always clear how different alternatives are related. In this paper we propose a framework for categorising weak supervision settings with the aim of: (1) helping the dataset owner or annotator navigate through the available options within weak supervision when prescribing an annotation process, and (2) describing existing annotations for a dataset to machine learning practitioners so that we allow them to understand the implications for the learning process. To this end, we identify the key elements that characterise weak supervision and devise a series of dimensions that categorise most of the existing approaches. We show how common settings in the literature fit within the framework and discuss its possible uses in practice.
