Title


E-BERT: A Phrase and Product Knowledge Enhanced Language Model for E-commerce

Authors

Denghui Zhang, Zixuan Yuan, Yanchi Liu, Fuzhen Zhuang, Haifeng Chen, Hui Xiong

Abstract


Pre-trained language models such as BERT have achieved great success in a broad range of natural language processing tasks. However, BERT cannot well support E-commerce related tasks due to the lack of two levels of domain knowledge, i.e., phrase-level and product-level. On one hand, many E-commerce tasks require an accurate understanding of domain phrases, whereas such fine-grained phrase-level knowledge is not explicitly modeled by BERT's training objective. On the other hand, product-level knowledge like product associations can enhance the language modeling of E-commerce, but it is not factual knowledge, so using it indiscriminately may introduce noise. To tackle this problem, we propose a unified pre-training framework, namely E-BERT. Specifically, to preserve phrase-level knowledge, we introduce Adaptive Hybrid Masking, which allows the model to adaptively switch from learning preliminary word knowledge to learning complex phrases, based on the fitting progress of the two modes. To utilize product-level knowledge, we introduce Neighbor Product Reconstruction, which trains E-BERT to predict a product's associated neighbors with a denoising cross attention layer. Our investigation reveals promising results in four downstream tasks, i.e., review-based question answering, aspect extraction, aspect sentiment classification, and product classification.
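To make the Adaptive Hybrid Masking idea more concrete, below is a minimal, hypothetical sketch (not the authors' implementation): it picks between word-level and phrase-level masking according to how quickly each mode's recent loss has been falling, so training drifts from single-token masking toward whole-phrase masking as word-level learning saturates. The function names, the loss-progress heuristic, the window size, and the 15% masking ratio are all illustrative assumptions, not details taken from the paper.

```python
import random

def choose_masking_mode(word_losses, phrase_losses, window=100):
    """Pick 'word' or 'phrase' masking based on recent fitting progress.

    word_losses / phrase_losses: recent per-step losses for each mode.
    The mode whose loss is still dropping faster gets more probability mass.
    """
    def progress(losses):
        recent = losses[-window:]
        if len(recent) < 2:
            return 1.0  # no history yet: assume plenty of room to improve
        return max(recent[0] - recent[-1], 1e-6)  # loss decrease over the window

    p_word, p_phrase = progress(word_losses), progress(phrase_losses)
    prob_word = p_word / (p_word + p_phrase)
    return "word" if random.random() < prob_word else "phrase"

def apply_masking(tokens, phrase_spans, mode, mask_token="[MASK]", ratio=0.15):
    """Mask either individual tokens or whole domain-phrase spans."""
    tokens = list(tokens)
    if mode == "word":
        for i in range(len(tokens)):
            if random.random() < ratio:
                tokens[i] = mask_token
    else:  # phrase mode: mask entire domain-phrase spans at once
        for start, end in phrase_spans:
            if random.random() < ratio:
                for i in range(start, end):
                    tokens[i] = mask_token
    return tokens
```

In this sketch the switch is stochastic rather than hard, which keeps both objectives active throughout pre-training while gradually shifting emphasis toward phrases.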
