Paper Title

A Precisely Xtreme-Multi Channel Hybrid Approach For Roman Urdu Sentiment Analysis

Authors

Memood, Faiza; Ghani, Muhammad Usman; Ibrahim, Muhammad Ali; Shehzadi, Rehab; Asim, Muhammad Nabeel

Abstract

To boost the performance of various Natural Language Processing tasks for Roman Urdu, this paper provides, for the first time, three neural word embeddings prepared using the most widely used approaches, namely Word2Vec, fastText, and GloVe. The integrity of the generated neural word embeddings is evaluated using intrinsic and extrinsic evaluation approaches. Considering the lack of publicly available benchmark datasets, the paper provides a first-ever Roman Urdu dataset consisting of 3241 sentiments annotated against positive, negative, and neutral classes. To provide benchmark baseline performance on the presented dataset, we adapt diverse machine learning (Support Vector Machine, Logistic Regression, Naive Bayes), deep learning (convolutional neural network, recurrent neural network), and hybrid approaches. The effectiveness of the generated neural word embeddings is evaluated by comparing the performance of machine learning and deep learning methodologies using 7 and 5 distinct feature representation approaches, respectively. Finally, the paper proposes a novel precisely extreme multi-channel hybrid methodology that outperforms state-of-the-art adapted machine learning and deep learning approaches by 9% and 4%, respectively, in terms of F1-score.

Keywords: Roman Urdu Sentiment Analysis, Pretrained word embeddings for Roman Urdu, Word2Vec, GloVe, fastText
