Title

Neural Methods for Effective, Efficient, and Exposure-Aware Information Retrieval

Author

Mitra, Bhaskar

Abstract

Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from those in these other application areas. A common form of IR involves ranking documents, or short passages, in response to keyword-based queries. Effective IR systems must deal with the query-document vocabulary mismatch problem by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms not seen during training, such as a person's name or a product model number, and should avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, retrieval involves extremely large collections, such as the document index of a commercial Web search engine, containing billions of documents. Efficient IR methods should take advantage of specialized IR data structures, such as the inverted index, to retrieve efficiently from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed, and where it should be positioned, among other results. Exposure-aware IR systems may optimize for additional objectives besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks.
