Paper Title
Pseudo-Representation Labeling Semi-Supervised Learning
Paper Authors
Paper Abstract
In recent years, semi-supervised learning (SSL) has shown tremendous success in leveraging unlabeled data to improve the performance of deep learning models, which significantly reduces the demand for large amounts of labeled data. Many SSL techniques have been proposed and have shown promising performance on well-known datasets such as ImageNet and CIFAR-10. However, some existing techniques (especially those based on data augmentation) have proven empirically unsuitable for industrial applications. Therefore, this work proposes pseudo-representation labeling, a simple and flexible framework that uses pseudo-labeling techniques to iteratively label a small amount of unlabeled data and add it to the training data. In addition, the framework is integrated with self-supervised representation learning, so that the classifier benefits from representation learning on both labeled and unlabeled data. The framework is not tied to a specific model structure; it is a general technique for improving existing models. Compared with existing approaches, pseudo-representation labeling is more intuitive and can effectively solve practical real-world problems. Empirically, it outperforms current state-of-the-art semi-supervised learning methods on industrial classification problems such as the WM-811K wafer map and MIT-BIH Arrhythmia datasets.
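The iterative pseudo-labeling loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a nearest-centroid classifier stands in for the deep model, the self-supervised representation learning step is omitted entirely, and the names `pseudo_label_loop`, `per_round`, and `rounds` are hypothetical.

```python
# Illustrative sketch only: a nearest-centroid classifier stands in for the
# paper's model, and the representation-learning step is left out.
import math


def centroids(points, labels):
    """Compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_with_confidence(cents, p):
    """Return (label, confidence); confidence is a softmax over negative distances."""
    dists = {y: math.dist(c, p) for y, c in cents.items()}
    d_min = min(dists.values())
    # Shift by the smallest distance so the largest weight is exactly 1.0.
    weights = {y: math.exp(-(d - d_min)) for y, d in dists.items()}
    label = max(weights, key=weights.get)
    return label, weights[label] / sum(weights.values())


def pseudo_label_loop(labeled_x, labeled_y, unlabeled_x, per_round=2, rounds=10):
    """Each round, refit on the training set, then move the `per_round` most
    confidently predicted unlabeled points into it with their pseudo-labels."""
    train_x, train_y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(rounds):
        if not pool:
            break
        cents = centroids(train_x, train_y)
        scored = [(predict_with_confidence(cents, p), p) for p in pool]
        scored.sort(key=lambda item: item[0][1], reverse=True)
        for (label, _conf), p in scored[:per_round]:
            train_x.append(p)
            train_y.append(label)
        pool = [p for _, p in scored[per_round:]]
    return centroids(train_x, train_y), train_x, train_y


# Toy data (made up for illustration): one labeled point per class.
labeled_x = [(0.0, 0.0), (10.0, 10.0)]
labeled_y = [0, 1]
unlabeled_x = [(1.0, 0.0), (0.0, 1.0), (9.0, 10.0),
               (10.0, 9.0), (2.0, 2.0), (8.0, 8.0)]

final_cents, train_x, train_y = pseudo_label_loop(labeled_x, labeled_y, unlabeled_x)
new_label, new_conf = predict_with_confidence(final_cents, (1.0, 1.0))
```

Labeling only a few high-confidence points per round, rather than the whole unlabeled pool at once, limits how many wrong pseudo-labels enter the training set before the model is refit.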