Paper Title

Augmenting Document Representations for Dense Retrieval with Interpolation and Perturbation

Authors

Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, Jong C. Park

Abstract

Dense retrieval models, which aim at retrieving the most relevant document for an input query on a dense representation space, have gained considerable attention for their remarkable success. Yet, dense models require a vast amount of labeled training data for notable performance, whereas it is often challenging to acquire query-document pairs annotated by humans. To tackle this problem, we propose a simple but effective Document Augmentation for dense Retrieval (DAR) framework, which augments the representations of documents with their interpolation and perturbation. We validate the performance of DAR on retrieval tasks with two benchmark datasets, showing that the proposed DAR significantly outperforms relevant baselines on the dense retrieval of both the labeled and unlabeled documents.
