Paper Title


A Boosting Algorithm for Positive-Unlabeled Learning

Authors

Yawen Zhao, Mingzhe Zhang, Chenhao Zhang, Weitong Chen, Nan Ye, Miao Xu

Abstract


Positive-unlabeled (PU) learning deals with binary classification problems when only positive (P) and unlabeled (U) data are available. Many recent PU methods are based on neural networks, but little has been done to develop boosting algorithms for PU learning, despite boosting algorithms' strong performance on many fully supervised classification problems. In this paper, we propose a novel boosting algorithm, AdaPU, for PU learning. Similarly to AdaBoost, AdaPU aims to optimize an empirical exponential loss, but the loss is based on the PU data, rather than on positive-negative (PN) data. As in AdaBoost, we learn a weighted combination of weak classifiers by learning one weak classifier and its weight at a time. However, AdaPU requires a very different algorithm for learning the weak classifiers and determining their weights. This is because AdaPU learns a weak classifier and its weight using a weighted positive-negative (PN) dataset with some negative data weights: the dataset is derived from the original PU data, and the data weights are determined by the current weighted classifier combination, but some data weights are negative. Our experiments showed that AdaPU outperforms neural networks on several benchmark PU datasets, including a large-scale challenging cyber security dataset.
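The abstract's remark that "some data weights are negative" can be illustrated with the standard PU risk rewriting: the expected loss over (unseen) negatives is expressed through unlabeled and positive data, which places a negative weight on positive examples when they stand in for negatives. The sketch below is illustrative only and is not the paper's AdaPU implementation; the class prior `pi` is assumed known, and the function names and synthetic scores are hypothetical.

```python
import numpy as np

def exp_loss(margin):
    # Exponential loss l(f(x), y) = exp(-y * f(x)), written via margin = y * f(x).
    return np.exp(-margin)

def pu_exp_loss(f_pos, f_unl, prior):
    """Empirical PU estimate of the exponential loss of scores f(x).

    Uses the identity (1 - pi) * E_N[l(f, -1)] = E_U[l(f, -1)] - pi * E_P[l(f, -1)],
    so R(f) = pi * E_P[l(f, +1)] + E_U[l(f, -1)] - pi * E_P[l(f, -1)].

    f_pos: scores f(x) on positive examples
    f_unl: scores f(x) on unlabeled examples
    prior: class prior pi = P(y = +1), assumed known
    """
    # Positives with their true label +1, weighted by the class prior.
    pos_term = prior * np.mean(exp_loss(+np.asarray(f_pos)))
    # Unlabeled examples treated as negatives (label -1).
    unl_term = np.mean(exp_loss(-np.asarray(f_unl)))
    # Correction term: positives treated as negatives enter with NEGATIVE weight.
    neg_corr = -prior * np.mean(exp_loss(-np.asarray(f_pos)))
    return pos_term + unl_term + neg_corr

# Hypothetical usage on synthetic scores: positives score higher on average.
rng = np.random.default_rng(0)
f_pos = rng.normal(1.0, 1.0, 1000)
f_unl = rng.normal(0.0, 1.0, 1000)
print(pu_exp_loss(f_pos, f_unl, prior=0.4))
```

With all scores zero, every exponential loss term equals 1, so the three terms reduce to `pi + 1 - pi = 1`, a quick sanity check that the negative-weight correction is consistent with the fully supervised risk.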
