Paper Title

Deep Active Learning for Biased Datasets via Fisher Kernel Self-Supervision

Authors

Denis Gudovskiy, Alec Hodgkinson, Takuya Yamaguchi, Sotaro Tsukizawa

Abstract

Active learning (AL) aims to minimize labeling efforts for data-demanding deep neural networks (DNNs) by selecting the most representative data points for annotation. However, currently used methods are ill-equipped to deal with biased data. The main motivation of this paper is to consider a realistic setting for pool-based semi-supervised AL, where the unlabeled collection of train data is biased. We theoretically derive an optimal acquisition function for AL in this setting. It can be formulated as distribution shift minimization between unlabeled train data and weakly-labeled validation dataset. To implement such acquisition function, we propose a low-complexity method for feature density matching using self-supervised Fisher kernel (FK) as well as several novel pseudo-label estimators. Our FK-based method outperforms state-of-the-art methods on MNIST, SVHN, and ImageNet classification while requiring only 1/10th of processing. The conducted experiments show at least 40% drop in labeling efforts for the biased class-imbalanced data compared to existing methods.
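The acquisition strategy described above can be illustrated with a toy sketch (not the paper's actual implementation): embed each sample by its Fisher score, i.e. the gradient of the model's log-likelihood evaluated at its own pseudo-label, so no ground-truth labels are needed; whiten by a diagonal approximation of the Fisher information; then select the unlabeled points with the highest Fisher-kernel similarity to the validation set. The modeling choices here (logistic regression, hard pseudo-labels, diagonal Fisher) are simplifying assumptions for illustration only.

```python
import numpy as np

def fisher_embedding(X, w):
    """Per-sample Fisher scores for a toy logistic-regression model.

    The Fisher score is the gradient of log p(y|x) with respect to the
    weights w. Since labels are unavailable, the model's own hard
    prediction is used as a pseudo-label (self-supervision assumption).
    """
    p = 1.0 / (1.0 + np.exp(-X @ w))       # sigmoid(w . x)
    y_pseudo = (p > 0.5).astype(float)     # hard pseudo-label
    # Gradient of the log-likelihood for logistic regression: (y - p) * x
    return (y_pseudo - p)[:, None] * X     # shape (n_samples, n_features)

def fk_acquisition(g_pool, g_val, k):
    """Select k pool points most representative of the validation set.

    Uses a diagonal approximation of the Fisher information (empirical
    second moment of the validation scores) to whiten the embeddings,
    then ranks pool points by mean Fisher-kernel similarity to the
    validation samples.
    """
    F_diag = np.mean(g_val ** 2, axis=0) + 1e-8  # diagonal Fisher approx.
    u_pool = g_pool / np.sqrt(F_diag)
    u_val = g_val / np.sqrt(F_diag)
    sims = u_pool @ u_val.mean(axis=0)           # kernel vs. validation mean
    return np.argsort(-sims)[:k]                 # top-k most similar points
```

Selecting the points that best match the validation distribution, rather than the most uncertain ones, is what lets this style of acquisition counteract bias in the unlabeled pool.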
