Paper Title

On-the-Fly Syntax Highlighting using Neural Networks

Authors

Palma, Marco Edoardo; Salza, Pasquale; Gall, Harald C.

Abstract

With the rise of online collaborative tools for software developers, source code is shared and consulted frequently, from code viewers to merge requests and code snippets. Typically, code highlighting quality in such scenarios is sacrificed in favor of system responsiveness. In these on-the-fly settings, performing a formal grammatical analysis of the source code is not only expensive, but also intractable in the many cases where the input is an invalid derivation of the language. Indeed, current popular highlighters rely heavily on systems of regular expressions, which are typically far from the specification of the language's lexer. Because of their complexity, these regular expressions need to be updated periodically as more feedback is collected from users, and their design does not lend itself to detecting more complex language formations. This paper delivers a deep learning-based approach suitable for on-the-fly grammatical highlighting of both correct and incorrect language derivations, as encountered in code viewers and snippets. It focuses on alleviating the burden on developers, who can reuse the language's parsing strategy to produce the desired highlighting specification. Moreover, this approach is compared to today's online syntax highlighting tools and to formal methods in terms of accuracy and execution time, across different levels of grammatical coverage, for three mainstream programming languages. The results show that the proposed approach consistently achieves near-perfect accuracy in its predictions, thereby outperforming regular expression-based strategies.
