Paper Title

gundapusunil at SemEval-2020 Task 9: Syntactic Semantic LSTM Architecture for SENTIment Analysis of Code-MIXed Data

Authors

Sunil Gundapu, Radhika Mamidi

Abstract

The phenomenon of mixing the vocabulary and syntax of multiple languages within the same utterance is called code-mixing, and it is especially evident in multilingual societies. In this paper, we describe the system we developed for SemEval-2020 Task 9 on Sentiment Analysis for Code-Mixed Social Media Text. Our system first generates two types of embeddings for the social media text: character-level embeddings, which encode character-level information and handle out-of-vocabulary entries, and FastText word embeddings, which capture morphology and semantics. Both embeddings are passed to an LSTM network, and the system outperformed the baseline model.
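The two-embedding input stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dimensions, the hash-based character vectors, the toy word table, and all function names are assumptions made for illustration. A plain dictionary stands in for pretrained FastText vectors, the example sentence is made up, and the LSTM that would consume these features is omitted.

```python
import hashlib

CHAR_DIM = 4   # hypothetical size of each character vector
WORD_DIM = 3   # hypothetical size of each word vector


def char_vector(ch):
    """Deterministic stand-in for a learned character embedding."""
    digest = hashlib.md5(ch.encode("utf-8")).digest()
    # Map the first CHAR_DIM bytes to floats in [-1, 1].
    return [b / 127.5 - 1.0 for b in digest[:CHAR_DIM]]


def char_level_embedding(token):
    """Average the character vectors of a token (defined even for unseen tokens)."""
    if not token:
        return [0.0] * CHAR_DIM
    vectors = [char_vector(ch) for ch in token]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def word_level_embedding(token, table):
    """Look up a word vector; fall back to zeros when the token is out of vocabulary."""
    return table.get(token.lower(), [0.0] * WORD_DIM)


def embed_sentence(sentence, table):
    """Return one concatenated [char ; word] vector per whitespace-separated token."""
    return [
        char_level_embedding(tok) + word_level_embedding(tok, table)
        for tok in sentence.split()
    ]


# Toy word table standing in for pretrained word vectors (values are made up).
toy_table = {
    "movie": [0.2, -0.1, 0.5],
    "accha": [0.7, 0.3, -0.2],
}

# Made-up code-mixed sentence; "bahut" and "tha" are missing from the toy table.
features = embed_sentence("movie bahut accha tha", toy_table)
```

In this sketch an out-of-vocabulary token such as "bahut" still receives a non-zero character-level part even though its word-level part is all zeros, which is the role the abstract assigns to the character-level embeddings. The resulting per-token vectors are what a sequence model such as an LSTM would take as input.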
