Paper Title

Long Document Ranking with Query-Directed Sparse Transformer

Paper Authors

Jyun-Yu Jiang, Chenyan Xiong, Chia-Jung Lee, Wei Wang

Paper Abstract

The computational cost of transformer self-attention often forces long documents to be broken up to fit pretrained models in document ranking tasks. In this paper, we design Query-Directed Sparse attention that induces IR-axiomatic structures in transformer self-attention. Our model, QDS-Transformer, enforces the principled properties desired in ranking: local contextualization, hierarchical representation, and query-oriented proximity matching, while it also enjoys efficiency from sparsity. Experiments on one fully supervised and three few-shot TREC document ranking benchmarks demonstrate the consistent and robust advantage of QDS-Transformer over previous approaches, which either retrofit long documents into BERT or use sparse attention without emphasizing IR principles. We further quantify the computational complexity and demonstrate that our sparse attention with the TVM implementation is twice as efficient as fully-connected self-attention. All source code, trained models, and predictions from this work are available at https://github.com/hallogameboy/QDS-Transformer.
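For intuition, the minimal sketch below shows one way the sparsity pattern described in the abstract could be expressed as an attention mask: every token keeps a local window (local contextualization), per-sentence marker tokens attend globally and are globally attended (hierarchical representation), and query tokens attend to, and are attended by, all document tokens (query-oriented proximity matching). The function name, token layout, and window size are illustrative assumptions, not taken from the paper; the authors' implementation uses a custom TVM sparse kernel rather than a dense boolean mask.

```python
# Illustrative sketch only (assumed helper, not the paper's TVM kernel):
# build a dense boolean mask that encodes local + sentence-level + query-directed
# sparse attention for a single (query, document) input sequence.
import torch


def qds_attention_mask(seq_len, query_positions, sentence_positions, window=2):
    """Return a [seq_len, seq_len] boolean mask; True means attention is allowed.

    query_positions / sentence_positions are lists of token indices for the
    query tokens and the per-sentence marker tokens (layout is an assumption).
    """
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)

    # 1) Local contextualization: each token attends to a small window of neighbors.
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True

    # 2) Hierarchical representation: sentence markers attend globally
    #    and every token can attend back to them.
    for s in sentence_positions:
        mask[s, :] = True
        mask[:, s] = True

    # 3) Query-oriented proximity matching: query tokens attend globally
    #    and every document token can attend back to the query.
    for q in query_positions:
        mask[q, :] = True
        mask[:, q] = True

    return mask


if __name__ == "__main__":
    # Toy layout: positions 0-2 hold the query, positions 3 and 8 mark sentences.
    m = qds_attention_mask(seq_len=12, query_positions=[0, 1, 2],
                           sentence_positions=[3, 8], window=1)
    print(m.int())
    # The mask can be converted into an additive attention bias, e.g.:
    # bias = torch.where(m, 0.0, float("-inf"))
```

In practice a dense mask like this only demonstrates the pattern; the efficiency gain reported in the abstract comes from a sparse kernel that never materializes the full attention matrix.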
