User-Guided Aspect Classification for Domain-Specific Texts

Paper Title

User-Guided Aspect Classification for Domain-Specific Texts

Authors

Li, Peiran; Guo, Fang; Shang, Jingbo

Abstract

Aspect classification, which identifies the aspects of text segments, facilitates numerous applications such as sentiment analysis and review summarization. To reduce the human effort of annotating massive texts, we study the problem of classifying aspects based on only a few user-provided seed words for pre-defined aspects. The major challenge lies in how to handle the noisy misc aspect, which is designed for texts that match none of the pre-defined aspects. Even domain experts have difficulty nominating seed words for the misc aspect, which makes existing seed-driven text classification methods inapplicable. We propose a novel framework, ARYA, which enables mutual enhancement between the pre-defined aspects and the misc aspect via iterative classifier training and seed updating. Specifically, it trains a classifier for the pre-defined aspects and then leverages it to induce supervision for the misc aspect. The prediction results for the misc aspect are later used to filter out noisy seed words for the pre-defined aspects. Experiments in two domains demonstrate the superior performance of the proposed framework, as well as the necessity and importance of properly modeling the misc aspect.
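
The loop described in the abstract (classify with the pre-defined aspects, induce misc labels, then prune noisy seeds) can be illustrated with a toy sketch. This is not the ARYA implementation: the abstract does not specify the classifier or the filtering rule, so seed-hit counting stands in for the trained classifier, and the function names and the `min_hits` and `max_misc_ratio` thresholds are assumptions made purely for illustration.

```python
def classify(tokens, seeds, min_hits=2):
    """Pick the aspect with the most seed hits; fall back to 'misc'.

    Assumption: seed-hit counting replaces the trained classifier, and a
    segment with fewer than `min_hits` hits is treated as misc.
    """
    scores = {aspect: sum(tok in words for tok in tokens)
              for aspect, words in seeds.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_hits else "misc"


def filter_seeds(segments, labels, seeds, max_misc_ratio=0.5):
    """Drop seed words that occur mostly in segments predicted as misc."""
    updated = {}
    for aspect, words in seeds.items():
        kept = set()
        for word in words:
            hits = [lab for seg, lab in zip(segments, labels) if word in seg]
            misc_ratio = hits.count("misc") / len(hits) if hits else 0.0
            if misc_ratio <= max_misc_ratio:
                kept.add(word)
        updated[aspect] = kept
    return updated


def iterate(segments, seeds, rounds=2):
    """Alternate between labeling segments and updating the seed sets."""
    for _ in range(rounds):
        labels = [classify(seg, seeds) for seg in segments]
        seeds = filter_seeds(segments, labels, seeds)
    labels = [classify(seg, seeds) for seg in segments]
    return labels, seeds


if __name__ == "__main__":
    # Hypothetical restaurant-review segments; "place" is a deliberately noisy seed.
    texts = [
        "the pizza was tasty",
        "tasty pizza and friendly staff",
        "the waiter and staff were rude",
        "nice place near the station",
        "this place has parking",
    ]
    segments = [t.split() for t in texts]
    seeds = {"food": {"pizza", "tasty", "place"}, "service": {"waiter", "staff"}}
    labels, seeds = iterate(segments, seeds)
    print(labels)
    print(seeds)
```

In this toy data the seed "place" appears only in segments that end up labeled misc, so it is removed from the food seeds after the first round, while the remaining seeds are kept.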
