求解具有分层偏外低级别近似值的GPU上的线性系统

论文标题

求解具有分层偏外低级别近似值的GPU上的线性系统

Solving Linear Systems on a GPU with Hierarchically Off-Diagonal Low-Rank Approximations

论文作者

Chen, Chao, Martinsson, Per-Gunnar

论文摘要

我们有兴趣求解由三个应用产生的线性系统：（1）机器学习中的内核方法，（2）从数学物理学中离散边界积分方程，以及（3）在许多大稀疏矩阵中形成的Schur补充。由于其非对角线块具有低数值等级，因此系数矩阵通常是数据范围的。具体而言，我们专注于“层次上的偏高低级别（HODLR）”矩阵。我们介绍用于分解HODLR矩阵并应用GPU上的因素化的算法。该算法利用了批处理密集的线性代数的效率，当固定数值等级时，它们几乎与矩阵大小线性缩放。 HODLR-MATRIX近似的精度是可调参数，因此我们可以构建高敏锐的快速直接求解器或低临界性鲁棒预处理器。数值结果表明，我们可以在单个GPU上几秒钟内解决数百万个未知数的问题。

We are interested in solving linear systems arising from three applications: (1) kernel methods in machine learning, (2) discretization of boundary integral equations from mathematical physics, and (3) Schur complements formed in the factorization of many large sparse matrices. The coefficient matrices are often data-sparse in the sense that their off-diagonal blocks have low numerical ranks; specifically, we focus on "hierarchically off-diagonal low-rank (HODLR)" matrices. We introduce algorithms for factorizing HODLR matrices and for applying the factorizations on a GPU. The algorithms leverage the efficiency of batched dense linear algebra, and they scale nearly linearly with the matrix size when the numerical ranks are fixed. The accuracy of the HODLR-matrix approximation is a tunable parameter, so we can construct high-accuracy fast direct solvers or low-accuracy robust preconditioners. Numerical results show that we can solve problems with several millions of unknowns in a couple of seconds on a single GPU.

下载PDF全文

下载文献需遵守相关版权规定

论文标题