Paper Title

Evaluating BERT-based Pre-training Language Models for Detecting Misinformation

Authors

Rini Anggrainingsih, Ghulam Mubashar Hassan, Amitava Datta

Abstract

It is challenging to control the quality of online information due to the lack of supervision over all the information posted online. Manual checking is almost impossible given the vast number of posts made on online media and how quickly they spread. Therefore, there is a need for automated rumour detection techniques to limit the adverse effects of spreading misinformation. Previous studies mainly focused on finding and extracting the significant features of text data. However, extracting features is time-consuming and not a highly effective process. This study proposes using BERT-based pre-trained language models to encode text data into vectors and utilising neural network models to classify these vectors to detect misinformation. Furthermore, the performance of different language models (LMs) with different numbers of trainable parameters was compared. The proposed technique was tested on different short and long text datasets, and its results were compared with state-of-the-art techniques on the same datasets. The results show that the proposed technique performs better than the state-of-the-art techniques. We also tested the proposed technique by combining the datasets, and the results demonstrate that larger training and testing data sizes considerably improve the technique's performance.
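
To make the described pipeline concrete, below is a minimal sketch of the two-stage approach from the abstract, assuming the Hugging Face transformers and PyTorch libraries. The checkpoint name (bert-base-uncased), the [CLS] pooling choice, and the classifier shape are illustrative assumptions, not the authors' exact configuration.

    import torch
    import torch.nn as nn
    from transformers import AutoTokenizer, AutoModel

    # Stage 1: encode posts into fixed-size vectors with a pre-trained
    # BERT-based language model. "bert-base-uncased" is an assumed
    # checkpoint; the paper compares several such LMs.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    encoder = AutoModel.from_pretrained("bert-base-uncased")

    def encode(texts):
        # Tokenise a batch of posts and take the [CLS] embedding
        # of each post as its text vector.
        batch = tokenizer(texts, padding=True, truncation=True,
                          return_tensors="pt")
        with torch.no_grad():  # encoder kept frozen in this sketch
            output = encoder(**batch)
        return output.last_hidden_state[:, 0, :]  # shape: (batch, 768)

    # Stage 2: classify the vectors with a small feed-forward neural
    # network (binary output: misinformation vs. reliable).
    classifier = nn.Sequential(
        nn.Linear(768, 128),
        nn.ReLU(),
        nn.Linear(128, 2),
    )

    # Toy usage with an untrained classifier; real use requires
    # supervised training on labelled rumour data.
    vectors = encode(["Breaking: miracle cure found!",
                      "Official weather update issued."])
    print(classifier(vectors).argmax(dim=1))

Under these assumptions, comparing language models amounts to swapping the checkpoint name, and varying the number of trainable parameters amounts to unfreezing some or all encoder layers during training.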
