Online Continual Learning on a Contaminated Data Stream with Blurry Task Boundaries

Paper Title

Online Continual Learning on a Contaminated Data Stream with Blurry Task Boundaries

Authors

Bang, Jihwan; Koh, Hyunseo; Park, Seulki; Song, Hwanjun; Ha, Jung-Woo; Choi, Jonghyun

Abstract

Learning under a continuously changing data distribution with incorrect labels is a desirable yet challenging real-world problem. A large body of continual learning (CL) methods, however, assumes data streams with clean labels, and online learning scenarios under noisy data streams remain underexplored. We consider a more practical CL task setup: online learning from a blurry data stream with corrupted labels, where existing CL methods struggle. To address the task, we first argue for the importance of both diversity and purity of examples in the episodic memory of continual learning models. To balance diversity and purity in the episodic memory, we propose a novel strategy to manage and use the memory through a unified approach of label-noise-aware diverse sampling and robust learning with semi-supervised learning. Our empirical validations on four real-world or synthetic noise datasets (CIFAR-10, CIFAR-100, mini-WebVision, and Food-101N) show that our method significantly outperforms prior art in this realistic and challenging continual learning scenario. Code and data splits are available at https://github.com/clovaai/puridiver.
