Paper Title

Morphological Disambiguation of South Sámi with FSTs and Neural Networks

Authors

Mika Hämäläinen, Linda Wiechetek

Abstract

We present a method for morphological disambiguation of South Sámi, an endangered language. Our method uses an FST-based morphological analyzer to produce an ambiguous set of morphological readings for each word in a sentence. These readings are disambiguated with a Bi-RNN model trained on the related North Sámi UD Treebank and some synthetically generated South Sámi data. The disambiguation is done at the level of morphological tags, ignoring word forms and lemmas; this makes it possible to use North Sámi training data for South Sámi without the need for a bilingual dictionary or aligned word embeddings. Our approach requires only minimal resources for South Sámi, which makes it usable and applicable in the context of other endangered languages as well.
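
The tag-level idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `lemma+Tag+Tag` reading format, the placeholder lemmas (`lemmaA` and so on), and the helper names `delexicalize` and `candidate_tag_sets` are assumptions made for illustration. The sketch only shows how dropping the lemma leaves language-independent tag strings, which is the kind of input a tag-level disambiguation model could be trained on across related languages.

```python
from typing import List


def delexicalize(reading: str, sep: str = "+") -> str:
    """Drop the lemma (assumed to be the first field) and keep only the tags."""
    _lemma, *tags = reading.split(sep)
    return sep.join(tags)


def candidate_tag_sets(analyses: List[List[str]]) -> List[List[str]]:
    """For each word, return the sorted, de-duplicated tag strings of its readings."""
    return [sorted({delexicalize(r) for r in readings}) for readings in analyses]


# Hypothetical analyzer output for a two-word sentence (placeholder lemmas).
analyses = [
    ["lemmaA+N+Sg+Nom"],
    ["lemmaB+V+Ind+Prs+Sg3", "lemmaC+N+Sg+Nom", "lemmaD+N+Sg+Nom"],
]

for word_index, tag_set in enumerate(candidate_tag_sets(analyses)):
    print(word_index, tag_set)
```

In this sketch the second word keeps two distinct tag candidates even though it had three readings, because two of them differ only in lemma. A sequence model would then pick one tag string per word from each candidate set.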
