Paper Title

Academic Resource Text Level Multi-label Classification based on Attention

Authors

Wang, Yue; Li, Yawen; Li, Ang

Abstract

Hierarchical multi-label academic text classification (HMTC) assigns academic texts to labels in a hierarchically structured label system. We propose an attention-based hierarchical multi-label classification algorithm for academic texts (AHMCA) that integrates features such as text, keywords, and the label hierarchy to classify academic documents into their most relevant categories. We use word2vec and BiLSTM to obtain embeddings and latent vector representations of the text, keywords, and hierarchy. A hierarchical attention mechanism captures the associations between keywords, label hierarchies, and text word vectors, and generates hierarchy-specific document embedding vectors that replace the original text embeddings in HMCN-F. Experimental results on the academic text dataset demonstrate the effectiveness of the AHMCA algorithm.
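
The abstract does not give the attention equations, so the following is only a minimal sketch of the general idea of producing one document embedding per hierarchy level by attending from that level's label embeddings to the word vectors. Everything in it is an assumption made for illustration: the function name `level_document_embedding`, the scaled dot-product scoring, the mean pooling over labels, and the random toy vectors standing in for word2vec/BiLSTM outputs. Keyword features are omitted, and this should not be read as the AHMCA formulation itself.

```python
import math
import random

def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def level_document_embedding(word_states, label_embeddings):
    """Attend from one level's labels to the words, then pool into one vector.

    word_states:      list of T word vectors of size d (assumed to come from
                      an encoder such as a BiLSTM; random toy data below).
    label_embeddings: list of C label vectors of size d for one hierarchy level.
    Returns (doc_vector of size d, attention weights as a C x T nested list).
    """
    d = len(word_states[0])
    contexts, all_weights = [], []
    for label in label_embeddings:
        # Illustrative scoring choice: scaled dot product between label and word.
        weights = softmax([dot(label, w) / math.sqrt(d) for w in word_states])
        # Label-specific summary: attention-weighted sum of the word vectors.
        context = [sum(wt * w[i] for wt, w in zip(weights, word_states))
                   for i in range(d)]
        contexts.append(context)
        all_weights.append(weights)
    # Illustrative pooling choice: average the label-specific summaries.
    doc_vector = [sum(c[i] for c in contexts) / len(contexts) for i in range(d)]
    return doc_vector, all_weights

random.seed(0)
dim = 8
# Toy stand-ins: 12 word vectors, and two hierarchy levels with 3 and 5 labels.
word_states = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(12)]
levels = [
    [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)],
    [[random.gauss(0, 1) for _ in range(dim)] for _ in range(5)],
]

# One document vector per hierarchy level.
doc_vectors = [level_document_embedding(word_states, labels)[0] for labels in levels]
```

In a setup like the one the abstract describes, each level's vector would be fed to the corresponding level of the classifier in place of a single shared text embedding.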
