Paper title
Concept Extraction Using Pointer-Generator Networks
Paper authors
Paper abstract
Concept extraction is crucial for a number of downstream applications. Surprisingly, however, straightforward single-token or nominal-chunk concept alignment and dictionary lookup techniques such as DBpedia Spotlight still prevail. We propose a generic, open-domain, out-of-vocabulary (OOV)-oriented extractive model based on distant supervision of a pointer-generator network that leverages bidirectional LSTMs and a copy mechanism. The model was trained on a large annotated corpus compiled specifically for this task from 250K Wikipedia pages and tested on regular pages, where the pointers to other pages are treated as ground-truth concepts. The experiments show that our model significantly outperforms standard techniques and, when used on top of DBpedia Spotlight, further improves its performance. They also show that the model can be readily ported to other datasets, on which it likewise achieves state-of-the-art performance.
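The abstract does not give the model's equations, so the following is only a minimal sketch of how a copy mechanism in a pointer-generator network is commonly formulated, not the authors' implementation. It assumes the standard mixture P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (sum of attention weights on source positions holding w). The function name `final_distribution`, the toy vocabulary, and all numbers are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix generation and copy probabilities for one decoding step.

    p_gen         -- probability of generating from the vocabulary (0..1)
    vocab_dist    -- dict mapping vocabulary word -> generation probability
    attention     -- attention weights over source positions (sums to 1)
    source_tokens -- source tokens aligned with `attention`
    """
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must have equal length")

    mixed = defaultdict(float)
    # Generation part: scale the vocabulary distribution by p_gen.
    for word, prob in vocab_dist.items():
        mixed[word] += p_gen * prob
    # Copy part: the attention mass on each source position goes to the token
    # at that position, so out-of-vocabulary tokens can still get probability.
    for token, weight in zip(source_tokens, attention):
        mixed[token] += (1.0 - p_gen) * weight
    return dict(mixed)


# Toy example with made-up numbers; "Spotlight" is not in the vocabulary.
vocab_dist = {"the": 0.5, "model": 0.3, "<unk>": 0.2}
source_tokens = ["DBpedia", "Spotlight", "model"]
attention = [0.25, 0.5, 0.25]
dist = final_distribution(0.5, vocab_dist, attention, source_tokens)
```

In this toy step, "Spotlight" receives probability only through the copy term, which is the property that makes such a model suitable for OOV-oriented extraction. A token such as "model" that appears both in the vocabulary and in the source accumulates mass from both terms.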