Paper Title
Cross-Lingual Phrase Retrieval
Paper Authors
Paper Abstract
Cross-lingual retrieval aims to retrieve relevant text across languages. Current methods typically achieve cross-lingual retrieval by learning language-agnostic text representations at the word or sentence level. However, how to learn phrase representations for cross-lingual phrase retrieval is still an open problem. In this paper, we propose XPR, a cross-lingual phrase retriever that extracts phrase representations from unlabeled example sentences. Moreover, we create a large-scale cross-lingual phrase retrieval dataset, which contains 65K bilingual phrase pairs and 4.2M example sentences across 8 English-centric language pairs. Experimental results show that XPR outperforms state-of-the-art baselines that use word-level or sentence-level representations. XPR also shows strong zero-shot transferability, which enables the model to perform retrieval on language pairs unseen during training. Our dataset, code, and trained models are publicly available at www.github.com/cwszz/XPR/.
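To make the retrieval setting concrete, the following is a minimal sketch of phrase retrieval over representations pooled from example sentences. It is not the XPR implementation: the abstract does not say how per-sentence representations are combined or scored, so mean pooling and cosine similarity are assumptions made here, and the three-dimensional vectors are hand-made stand-ins for encoder outputs. The function names `mean_pool`, `cosine`, and `retrieve` are hypothetical.

```python
import math


def mean_pool(vectors):
    """Average equal-length vectors into one phrase representation (assumed pooling)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index):
    """Return the candidate phrase whose representation is closest to the query."""
    return max(index, key=lambda phrase: cosine(query_vec, index[phrase]))


# Toy data: each phrase has one vector per example sentence it appears in.
# These numbers are invented for illustration, not produced by any model.
en_contexts = {"machine learning": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]}
fr_contexts = {
    "apprentissage automatique": [[0.85, 0.15, 0.05], [0.9, 0.1, 0.0]],
    "pomme de terre": [[0.0, 0.2, 0.9], [0.1, 0.1, 0.8]],
}

# Build the target-language index, then look up the English query phrase.
fr_index = {phrase: mean_pool(vecs) for phrase, vecs in fr_contexts.items()}
query = mean_pool(en_contexts["machine learning"])
best = retrieve(query, fr_index)
print(best)
```

In a real system the per-sentence vectors would come from a multilingual encoder applied to the phrase in context, and the linear scan in `retrieve` would typically be replaced by an approximate nearest-neighbor index.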