Paper Title
A Rare Topic Discovery Model for Short Texts Based on Co-occurrence Word Network
Paper Authors
Paper Abstract
We provide a simple and general solution for discovering rare topics in unbalanced short-text datasets: CWIBTD, a model based on a word co-occurrence network. CWIBTD simultaneously addresses the sparsity and imbalance of short-text topics and attenuates the effect of occasional pairwise co-occurrences of words, allowing the model to focus more on the discovery of rare topics. Unlike previous approaches, CWIBTD uses the word co-occurrence network to model the topic distribution of each word, which improves the semantic density of the data space. It preserves sensitivity in identifying rare topics by improving the way node activity is calculated and by normalizing rare topics and large topics to some extent. In addition, because it uses the same Gibbs sampling as LDA, CWIBTD is easy to extend to various application scenarios. Extensive experimental validation on the unbalanced short-text dataset confirms the superiority of CWIBTD over the baseline approach in discovering rare topics. Our model can be used for early and accurate discovery of emerging topics or unexpected events on social platforms.
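To make the idea of a word co-occurrence network concrete, the following is a minimal Python sketch of how such a network might be built from short texts. It is not the CWIBTD construction, which the abstract does not specify. The function names `build_cooccurrence_network` and `node_degrees`, the `min_count` threshold, and the toy documents are all assumptions made for illustration. Dropping low-count edges is only one simple way to damp occasional word pairs, and the weighted degree shown here is a generic graph statistic, not the paper's node activity measure.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_network(docs, min_count=2):
    """Return {(word_a, word_b): count} for word pairs that co-occur in at
    least `min_count` documents. Each pair is stored in sorted order."""
    pair_counts = Counter()
    for doc in docs:
        # Each short text contributes a given pair at most once.
        words = sorted(set(doc.lower().split()))
        pair_counts.update(combinations(words, 2))
    # Assumption: a hard threshold stands in for whatever damping of
    # occasional co-occurrences the actual model applies.
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}


def node_degrees(edges):
    """Weighted degree of each word: the sum of its edge weights."""
    degrees = Counter()
    for (a, b), n in edges.items():
        degrees[a] += n
        degrees[b] += n
    return dict(degrees)


# Hypothetical toy corpus of short texts.
docs = [
    "earthquake hits city",
    "city earthquake damage",
    "new phone release",
    "phone release date",
    "earthquake phone",
]
edges = build_cooccurrence_network(docs, min_count=2)
degrees = node_degrees(edges)
```

In this toy corpus only the pairs that appear in two documents survive the threshold, so the one-off pairing of "earthquake" and "phone" is discarded while the two recurring pairs remain as edges.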