Paper Title
Investigating Information Inconsistency in Multilingual Open-Domain Question Answering
Authors
Abstract
Retrieval-based open-domain question answering (QA) systems retrieve documents and then select answer spans from them to find the best answer candidates. We hypothesize that multilingual QA systems are prone to information inconsistency when the documents are written in different languages, because such documents tend to give a model varying information about the same topic. To understand the effects of biased information availability and cultural influence, we analyze the behavior of multilingual open-domain QA models with a focus on retrieval bias. We analyze whether different retriever models return different passages for the same question asked in different languages on TyDi QA and XOR-TyDi QA, two multilingual QA datasets. We speculate that content differences in documents across languages might reflect cultural divergences and/or social biases.
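
One way to make the retrieval comparison concrete is to measure how much the passages retrieved for the same question overlap across query languages. The sketch below is illustrative only and is not taken from the paper: the function names, the choice of Jaccard similarity, and the toy passage IDs are all assumptions, and it presumes that passages carry language-independent identifiers so that results from different query languages can be compared directly.

```python
from typing import Dict, List, Tuple


def jaccard_overlap(a: List[str], b: List[str]) -> float:
    """Jaccard similarity between two lists of retrieved passage IDs."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty result lists are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def cross_lingual_overlap(
    retrieved: Dict[str, List[str]],
) -> Dict[Tuple[str, str], float]:
    """Pairwise overlap of retrieved passages for one question.

    `retrieved` maps a query language code to the passage IDs returned
    when the question is asked in that language (hypothetical format).
    """
    langs = sorted(retrieved)
    return {
        (x, y): jaccard_overlap(retrieved[x], retrieved[y])
        for i, x in enumerate(langs)
        for y in langs[i + 1:]
    }


# Toy data (assumed, not real retrieval results): the same question
# asked in English, Japanese, and Arabic.
retrieved = {
    "en": ["p1", "p2", "p3"],
    "ja": ["p2", "p3", "p4"],
    "ar": ["p5", "p6", "p7"],
}
scores = cross_lingual_overlap(retrieved)
for pair, score in scores.items():
    print(pair, score)
```

A low overlap for a language pair would suggest that the retriever surfaces different evidence depending on the query language, which is the kind of inconsistency the abstract describes. Other measures, such as rank-aware overlap, could be substituted for Jaccard similarity.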