论文标题
堆栈图:规模分辨率
Stack graphs: Name resolution at scale
论文作者
论文摘要
我们提供堆栈图,这是Visser等人的范围图框架的扩展。堆栈图在GitHub上精确的代码导航,使用户可以在存储库中和跨存储库中导航名称绑定引用。像范围图一样,堆栈图编码图形结构中有关程序的名称绑定信息,其中路径代表有效的名称绑定。然后,通过简单的路径找到搜索来实现对其定义的引用。 Github拥有数百万个存储库,其中包含数百种不同的编程语言实施的总代码,并每分钟接收数千个推送。为了支持此量表,我们确保图形构建和探路判断是文件收入:对于每个源文件,我们在程序中没有任何其他文件中的任何其他文件,创建一个孤立的子图。这使我们可以消除我们已经看到的重新分析文件版本的存储和计算成本。由于大多数提交都会在存储库中更改一小部分文件,因此随着时间的推移,这大大摊销了索引大型,经常更改的存储库的操作成本。要处理类型定向的名称查找(需要“暂停”当前查找以解析另一个名称),我们的名称“分辨率算法”保持了当前暂停(但仍在待处理的)查找的一堆。可以使用新的声明图形构造语言对程序源代码的纯句法分析来构建堆栈图。这意味着我们可以为每个存储库提取名称绑定信息,而无需每包配置,而无需调用任意,不信任,特定包装的构建过程。
We present stack graphs, an extension of Visser et al.'s scope graphs framework. Stack graphs power Precise Code Navigation at GitHub, allowing users to navigate name binding references both within and across repositories. Like scope graphs, stack graphs encode the name binding information about a program in a graph structure, in which paths represent valid name bindings. Resolving a reference to its definition is then implemented with a simple path-finding search. GitHub hosts millions of repositories, containing petabytes of total code, implemented in hundreds of different programming languages, and receiving thousands of pushes per minute. To support this scale, we ensure that the graph construction and path-finding judgments are file-incremental: for each source file, we create an isolated subgraph without any knowledge of, or visibility into, any other file in the program. This lets us eliminate the storage and compute costs of reanalyzing file versions that we have already seen. Since most commits change a small fraction of the files in a repository, this greatly amortizes the operational costs of indexing large, frequently changed repositories over time. To handle type-directed name lookups (which require "pausing" the current lookup to resolve another name), our name resolution algorithm maintains a stack of the currently paused (but still pending) lookups. Stack graphs can be constructed via a purely syntactic analysis of the program's source code, using a new declarative graph construction language. This means that we can extract name binding information for every repository without any per-package configuration, and without having to invoke an arbitrary, untrusted, package-specific build process.