Paper Title

MUSIED: A Benchmark for Event Detection from Multi-Source Heterogeneous Informal Texts

Authors

Xiangyu Xi, Jianwei Lv, Shuaipeng Liu, Wei Ye, Fan Yang, Guanglu Wan

Abstract

Event detection (ED) identifies and classifies event triggers from unstructured texts, serving as a fundamental task for information extraction. Despite the remarkable progress achieved in the past several years, most research efforts focus on detecting events from formal texts (e.g., news articles, Wikipedia documents, financial announcements). Moreover, the texts in each dataset are either from a single source or multiple yet relatively homogeneous sources. With massive amounts of user-generated text accumulating on the Web and inside enterprises, identifying meaningful events in these informal texts, usually from multiple heterogeneous sources, has become a problem of significant practical value. As a pioneering exploration that expands event detection to the scenarios involving informal and heterogeneous texts, we propose a new large-scale Chinese event detection dataset based on user reviews, text conversations, and phone conversations in a leading e-commerce platform for food service. We carefully investigate the proposed dataset's textual informality and multi-source heterogeneity characteristics by inspecting data samples quantitatively and qualitatively. Extensive experiments with state-of-the-art event detection methods verify the unique challenges posed by these characteristics, indicating that multi-source informal event detection remains an open problem and requires further efforts. Our benchmark and code are released at https://github.com/myeclipse/MUSIED.
