Coupled intrinsic and extrinsic human language resource-based query expansion

Paper title

Coupled intrinsic and extrinsic human language resource-based query expansion

Authors

Selvaretnam, Bhawani; Belkhatir, Mohammed

Abstract

Poor information retrieval performance has often been attributed to the query-document vocabulary mismatch problem, defined as the difficulty human users have in formulating precise natural language queries that match the vocabulary of the documents relevant to a specific search goal. To alleviate this problem, query expansion processes are applied to generate additional terms and integrate them into the initial query. This requires accurate identification of the main query concepts, so that the intended search goal is duly emphasized, and extraction of relevant expansion concepts for inclusion in the enriched query. Natural language queries have intrinsic linguistic properties, such as part-of-speech labels and grammatical relations, which can be used to determine the intended search goal. Additionally, extrinsic language-based resources such as ontologies are needed to suggest expansion concepts that are semantically coherent with the query content. We present a query expansion framework which capitalizes on both the linguistic characteristics of user queries and ontology resources for query constituent encoding, expansion concept extraction, and concept weighting. A thorough empirical evaluation on real-world datasets validates our approach against a unigram language model, a relevance model, and a sequential-dependence-based technique.
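
The three stages named in the abstract (query constituent encoding, expansion concept extraction, and concept weighting) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the part-of-speech lexicon, the ontology, the set of key tags, and both weight values are hypothetical stand-ins chosen only to make the idea concrete, and a real system would presumably use a trained tagger and a full ontology.

```python
# Toy illustration only: the lexicon, ontology, and weights below are
# hypothetical and are not taken from the paper.

# Hypothetical part-of-speech lexicon standing in for a real tagger
# (the "intrinsic" linguistic resource).
POS_LEXICON = {
    "effects": "NOUN",
    "of": "ADP",
    "pollution": "NOUN",
    "on": "ADP",
    "marine": "ADJ",
    "life": "NOUN",
}

# Hypothetical ontology mapping a concept to related concepts
# (the "extrinsic" language-based resource).
ONTOLOGY = {
    "pollution": ["contamination", "waste"],
    "life": ["organisms"],
}

KEY_TAGS = {"NOUN", "ADJ"}  # assumption: content words carry the search goal
ORIGINAL_WEIGHT = 1.0       # assumed weight for original key concepts
EXPANSION_WEIGHT = 0.5      # assumed lower weight for expansion concepts


def expand_query(query):
    """Return a dict mapping each retained or added term to a weight."""
    weighted = {}
    for token in query.lower().split():
        if POS_LEXICON.get(token) not in KEY_TAGS:
            continue  # skip function words and tokens missing from the lexicon
        weighted[token] = ORIGINAL_WEIGHT
        for related in ONTOLOGY.get(token, []):
            # setdefault never lowers the weight of a term already present
            weighted.setdefault(related, EXPANSION_WEIGHT)
    return weighted


if __name__ == "__main__":
    print(expand_query("Effects of pollution on marine life"))
```

Giving expansion concepts a lower weight than the original query concepts is one simple way to keep the intended search goal emphasized in the enriched query; the specific values here are arbitrary.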
