Logram: Efficient Log Parsing Using n-Gram Dictionaries

Paper title

Logram: Efficient Log Parsing Using n-Gram Dictionaries

Authors

Dai, Hetong; Li, Heng; Shang, Weiyi; Chen, Tse-Hsun; Chen, Che-Shao

Abstract

Software systems usually record important runtime information in their logs. Logs help practitioners understand system runtime behaviors and diagnose field failures. Because logs are usually very large, automated log analysis is needed to assist practitioners in their software operation and maintenance efforts. Typically, the first step of automated log analysis is log parsing, i.e., converting unstructured raw logs into structured data. However, log parsing is challenging because logs are produced by static templates in the source code (i.e., logging statements), yet the templates are usually inaccessible when parsing logs. Prior work proposed automated log parsing approaches that have achieved high accuracy. However, as the volume of logs grows rapidly in the era of cloud computing, efficiency becomes a major concern in log parsing. In this work, we propose an automated log parsing approach, Logram, which leverages n-gram dictionaries to achieve efficient log parsing. We evaluated Logram on 16 public log datasets and compared it with five state-of-the-art log parsing approaches. We found that Logram achieves a parsing accuracy similar to that of the best existing approaches while outperforming them in efficiency (i.e., 1.8 to 5.1 times faster than the second-fastest approaches). Furthermore, we deployed Logram on Spark and found that it scales out efficiently with the number of Spark nodes (e.g., with near-linear scalability) without sacrificing parsing accuracy. In addition, we demonstrated that Logram can support effective online parsing of logs, achieving parsing results and efficiency similar to those of the offline mode.
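
The abstract does not spell out how the n-gram dictionaries are used, so the following is only a rough illustration of the general idea, not the authors' implementation. It assumes that tokens appearing only in rare n-grams are likely dynamic values, while tokens covered by at least one frequent n-gram belong to the static template. The function names `build_dictionary` and `parse`, the `threshold` parameter, and the `<*>` placeholder are all hypothetical choices made for this sketch.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the consecutive n-token tuples of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_dictionary(logs, n=2):
    """Count how often each n-gram occurs across all log messages."""
    counts = Counter()
    for line in logs:
        counts.update(ngrams(line.split(), n))
    return counts


def parse(line, counts, n=2, threshold=2):
    """Replace tokens that occur only in rare n-grams with '<*>'.

    Assumption for this sketch: a token is treated as dynamic when every
    n-gram covering it occurs fewer than `threshold` times.
    """
    tokens = line.split()
    grams = ngrams(tokens, n)
    template = []
    for i, tok in enumerate(tokens):
        # n-grams that cover position i start at indices i-n+1 .. i
        first = max(0, i - n + 1)
        last = min(i, len(grams) - 1)
        covering = grams[first:last + 1]
        is_dynamic = bool(covering) and all(counts[g] < threshold for g in covering)
        template.append("<*>" if is_dynamic else tok)
    return " ".join(template)


logs = [
    "Connected to host 10.0.0.1",
    "Connected to host 10.0.0.2",
    "Connected to host 10.0.0.3",
]
counts = build_dictionary(logs)
print(parse(logs[0], counts))  # Connected to host <*>
```

Because the dictionary is just a table of counts, it can be built in a single pass and updated incrementally, which is one plausible reason a dictionary-based approach lends itself to the parallel and online settings mentioned in the abstract.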
