Paper title
MaintNet: A Collaborative Open-Source Library for Predictive Maintenance Language Resources
Paper authors
Paper abstract
Maintenance record logbooks are an emerging text type in NLP. They typically consist of free-text documents with many domain-specific technical terms, abbreviations, and non-standard spelling and grammar, all of which pose difficulties for NLP pipelines trained on standard corpora. Analyzing and annotating such documents is of particular importance in the development of predictive maintenance systems, which aim to provide operational efficiencies, prevent accidents, and save lives. To facilitate and encourage research in this area, we have developed MaintNet, a collaborative open-source library of technical and domain-specific language datasets. MaintNet provides novel logbook data from the aviation, automotive, and facilities domains, along with tools to aid in their (pre-)processing and clustering. Furthermore, it provides a way to encourage discussion and sharing of new datasets and tools for logbook data analysis.