Paper Title
Towards Practical Few-shot Federated NLP
Paper Authors
Paper Abstract
Transformer-based pre-trained models have emerged as the predominant solution for natural language processing (NLP). Fine-tuning such pre-trained models for downstream tasks often requires a considerable amount of labeled private data. In practice, private data is often distributed across heterogeneous mobile devices and may be prohibited from being uploaded. Moreover, well-curated labeled data is often scarce, presenting an additional challenge. To address these challenges, we first introduce a data generator for federated few-shot learning tasks, which encompasses the quantity and skewness of scarce labeled data in a realistic setting. Subsequently, we propose AUG-FedPrompt, a prompt-based federated learning system that exploits abundant unlabeled data for data augmentation. Our experiments indicate that AUG-FedPrompt can perform on par with full-set fine-tuning with a limited amount of labeled data. However, such competitive performance comes at a significant system cost.
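To make the "quantity and skewness" of scarce labeled data concrete, below is a minimal sketch of what such a federated few-shot data generator might look like. It assumes a Dirichlet-based label-skew partition, a common choice in federated learning benchmarks; the paper does not specify its exact scheme, and names such as `num_clients`, `shots_per_client`, and `alpha` are illustrative, not from the paper.

```python
import numpy as np

def federated_few_shot_split(labels, num_clients=10, shots_per_client=8,
                             alpha=0.5, seed=0):
    """Partition example indices across clients with label skew, then keep
    only a few labeled 'shots' per client; the rest form the unlabeled pool.

    This is a hypothetical sketch, not the paper's actual generator.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]

    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        # Dirichlet proportions control how skewed class c is across clients:
        # smaller alpha -> more skew (assumed mechanism).
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions) * len(idx)).astype(int)[:-1]
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())

    splits = []
    for idx in client_indices:
        idx = rng.permutation(np.asarray(idx, dtype=int))
        splits.append({
            "labeled": idx[:shots_per_client].tolist(),    # scarce labeled shots
            "unlabeled": idx[shots_per_client:].tolist(),  # abundant unlabeled data
        })
    return splits
```

Under this reading, a system like AUG-FedPrompt would fine-tune the prompt-based model on each client's small `labeled` set and exploit the much larger `unlabeled` pool for augmentation (e.g., via pseudo-labeling), which is also where the reported system cost would arise.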