Paper Title

The Perils of Learning From Unlabeled Data: Backdoor Attacks on Semi-supervised Learning

Paper Authors

Virat Shejwalkar, Lingjuan Lyu, Amir Houmansadr

Paper Abstract

Semi-supervised machine learning (SSL) is gaining popularity as it reduces the cost of training ML models. It does so by using very small amounts of (expensive, well-inspected) labeled data and large amounts of (cheap, non-inspected) unlabeled data. SSL has shown comparable or even superior performance compared to conventional fully-supervised ML techniques.

In this paper, we show that the key feature of SSL, namely that it can learn from (non-inspected) unlabeled data, exposes SSL to strong poisoning attacks. In fact, we argue that, due to its reliance on non-inspected unlabeled data, poisoning is a much more severe problem in SSL than in conventional fully-supervised ML.

Specifically, we design a backdoor poisoning attack on SSL that can be conducted by a weak adversary with no knowledge of the target SSL pipeline. This is unlike prior poisoning attacks in fully-supervised settings, which assume strong adversaries with practically unrealistic capabilities. We show that by poisoning only 0.2% of the unlabeled training data, our attack can cause misclassification of more than 80% of test inputs (when they contain the adversary's backdoor trigger). Our attacks remain effective across twenty combinations of benchmark datasets and SSL algorithms, and even circumvent state-of-the-art defenses against backdoor attacks. Our work raises significant concerns about the practical utility of existing SSL algorithms.
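To make the threat model concrete, here is a minimal, illustrative Python/NumPy sketch of how an adversary could stamp a small trigger patch onto target-class images and slip them into a victim's unlabeled pool at a 0.2% poison rate. The function names, the patch shape, and its placement are assumptions made for illustration only; this is not the paper's exact attack construction.

```python
# Illustrative sketch of unlabeled-data backdoor poisoning.
# NOT the authors' actual attack: trigger shape, patch location,
# and all helper names are assumptions for exposition.
from typing import Optional
import numpy as np

def apply_trigger(image: np.ndarray, patch_value: float = 1.0,
                  patch_size: int = 3) -> np.ndarray:
    """Stamp a small square trigger patch into the bottom-right corner."""
    poisoned = image.copy()
    poisoned[-patch_size:, -patch_size:, :] = patch_value
    return poisoned

def poison_unlabeled_pool(unlabeled: np.ndarray,
                          target_class_samples: np.ndarray,
                          poison_rate: float = 0.002,
                          rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Overwrite a tiny fraction of the unlabeled pool with trigger-stamped
    samples from the adversary's chosen target class.

    unlabeled: (N, H, W, C) pool the victim will train on.
    target_class_samples: images of the class the adversary wants the
    trigger associated with (e.g., gathered from public data).
    """
    rng = rng or np.random.default_rng(0)
    n_poison = max(1, int(poison_rate * len(unlabeled)))  # 0.2% by default
    # Choose which pool entries to overwrite and which source images to stamp.
    pool_idx = rng.choice(len(unlabeled), size=n_poison, replace=False)
    src_idx = rng.choice(len(target_class_samples), size=n_poison, replace=True)
    poisoned_pool = unlabeled.copy()
    for i, j in zip(pool_idx, src_idx):
        poisoned_pool[i] = apply_trigger(target_class_samples[j])
    return poisoned_pool
```

Note why such a low poison rate can plausibly suffice: the unlabeled pool carries no labels for the victim to audit, and typical SSL pipelines assign their own pseudo-labels to unlabeled inputs, so the learner itself builds the association between the trigger patch and the target class.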
