Paper Title
Automatic transformation of irreducible representations for efficient contraction of tensors with cyclic group symmetry
Paper Authors
Paper Abstract
Tensor contractions are ubiquitous in computational chemistry and physics, where tensors generally represent states or operators and contractions express the algebra of these quantities. In this context, the states and operators often preserve physical conservation laws, which are manifested as group symmetries in the tensors. These group symmetries imply that each tensor has block sparsity and can be stored in a reduced form. For nontrivial contractions, the memory footprint and cost are lowered, respectively, by a linear and a quadratic factor in the number of symmetry sectors. State-of-the-art tensor contraction software libraries exploit this opportunity by iterating over blocks or using general block-sparse tensor representations. Both approaches entail overhead in performance and code complexity. With intuition aided by tensor diagrams, we present a technique, irreducible representation alignment, which enables efficient handling of Abelian group symmetries via only dense tensors, by using contraction-specific reduced forms. This technique yields a general algorithm for arbitrary group-symmetric contractions, which we implement in Python and apply to a variety of representative contractions from quantum chemistry and tensor network methods. As a consequence of relying on only dense tensor contractions, we can easily make use of efficient batched matrix multiplication via Intel's MKL and distributed tensor contraction via the Cyclops library, achieving good efficiency and parallel scalability on up to 4096 Knights Landing cores of a supercomputer.
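To make the abstract's cost argument concrete, the following is a minimal NumPy sketch (not code from the paper; the variable names and the simple equal-sized-sector layout are illustrative assumptions). A matrix with a cyclic-group symmetry over G sectors is block-diagonal, so only its G diagonal blocks need storing, and its contraction reduces to one batched matrix multiplication over the sector index: roughly G*d^3 work instead of (G*d)^3 for the dense form, i.e. a quadratic saving in the number of sectors, as stated above.

```python
import numpy as np

G = 4  # number of symmetry sectors (cyclic group of order G); illustrative value
d = 3  # dimension of each sector block; illustrative value

rng = np.random.default_rng(0)

# Reduced form: one dense d x d block per symmetry sector.
# A symmetric matrix is block-diagonal over sectors, so these G blocks
# hold all nonzero entries (memory G*d^2 instead of (G*d)^2).
A_red = rng.random((G, d, d))
B_red = rng.random((G, d, d))

# Contraction in reduced form: a single batched matrix multiplication.
# NumPy's @ operator batches over the leading (sector) axis,
# costing O(G * d^3) instead of O((G*d)^3) for the dense contraction.
C_red = A_red @ B_red

# Verify against the explicit dense block-sparse form.
def to_dense(T_red):
    """Expand the reduced (sector-blocked) form to a full dense matrix."""
    T = np.zeros((G * d, G * d))
    for s in range(G):
        T[s * d:(s + 1) * d, s * d:(s + 1) * d] = T_red[s]
    return T

assert np.allclose(to_dense(A_red) @ to_dense(B_red), to_dense(C_red))
```

The batched `@` call here stands in for the dense batched-matrix-multiplication primitives (e.g. MKL's batched GEMM) that the abstract says the method maps onto; the paper's technique generalizes this idea to higher-order tensors and arbitrary Abelian group symmetries via contraction-specific reduced forms.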