Paper Title


Feature Adaptation of Pre-Trained Language Models across Languages and Domains with Robust Self-Training

Paper Authors

Hai Ye, Qingyu Tan, Ruidan He, Juntao Li, Hwee Tou Ng, Lidong Bing

Paper Abstract


Adapting pre-trained language models (PrLMs) (e.g., BERT) to new domains has gained much attention recently. Instead of fine-tuning PrLMs as done in most previous work, we investigate how to adapt the features of PrLMs to new domains without fine-tuning. We explore unsupervised domain adaptation (UDA) in this paper. With the features from PrLMs, we adapt the models trained with labeled data from the source domain to the unlabeled target domain. Self-training is widely used for UDA which predicts pseudo labels on the target domain data for training. However, the predicted pseudo labels inevitably include noise, which will negatively affect training a robust model. To improve the robustness of self-training, in this paper we present class-aware feature self-distillation (CFd) to learn discriminative features from PrLMs, in which PrLM features are self-distilled into a feature adaptation module and the features from the same class are more tightly clustered. We further extend CFd to a cross-language setting, in which language discrepancy is studied. Experiments on two monolingual and multilingual Amazon review datasets show that CFd can consistently improve the performance of self-training in cross-domain and cross-language settings.
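To make the idea in the abstract concrete, below is a minimal PyTorch sketch, not the authors' released implementation, of one self-training update over frozen PrLM features. It combines a supervised loss on labeled source data, a feature self-distillation term that keeps the adapted features close to the frozen PrLM features, a simple same-class clustering term, and pseudo-label training on confident target predictions. The names (FeatureAdapter, self_training_step), the loss weights, and the confidence threshold are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of self-training with feature self-distillation on frozen
# PrLM features. Assumes source/target features are precomputed tensors
# (e.g., BERT [CLS] embeddings); all hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureAdapter(nn.Module):
    """Small adaptation module trained on top of frozen PrLM features."""

    def __init__(self, dim, num_classes):
        super().__init__()
        self.adapt = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, feats):
        z = self.adapt(feats)           # adapted features
        return z, self.classifier(z)    # features and class logits


def class_clustering_loss(z, y, num_classes):
    """Pull features of the same class toward their batch class mean."""
    loss = z.new_zeros(())
    for c in range(num_classes):
        mask = y == c
        if mask.sum() > 1:
            center = z[mask].mean(dim=0)
            loss = loss + ((z[mask] - center) ** 2).mean()
    return loss


def self_training_step(model, src_feats, src_y, tgt_feats, optimizer,
                       distill_weight=1.0, cluster_weight=0.1, threshold=0.9):
    """One update: source supervision + feature self-distillation +
    class clustering + pseudo-label loss on confident target examples."""
    model.train()
    z_src, logits_src = model(src_feats)
    num_classes = logits_src.size(-1)
    loss = F.cross_entropy(logits_src, src_y)

    # Feature self-distillation: keep adapted features close to PrLM features.
    loss = loss + distill_weight * F.mse_loss(z_src, src_feats)
    # Cluster same-class source features more tightly.
    loss = loss + cluster_weight * class_clustering_loss(z_src, src_y, num_classes)

    # Pseudo-labeling on target data: train only on confident predictions.
    with torch.no_grad():
        _, logits_tgt = model(tgt_feats)
        conf, pseudo_y = logits_tgt.softmax(dim=-1).max(dim=-1)
        mask = conf > threshold
    if mask.any():
        _, logits_sel = model(tgt_feats[mask])
        loss = loss + F.cross_entropy(logits_sel, pseudo_y[mask])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Toy usage with random stand-ins for PrLM features (hypothetical shapes).
dim, num_classes = 768, 2
model = FeatureAdapter(dim, num_classes)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
src_feats, src_y = torch.randn(16, dim), torch.randint(0, num_classes, (16,))
tgt_feats = torch.randn(16, dim)
print(self_training_step(model, src_feats, src_y, tgt_feats, opt))
```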
