论文标题

稀疏矩阵的基于级别的阻塞:稀疏矩阵 - 功率矢量乘法

Level-based Blocking for Sparse Matrices: Sparse Matrix-Power-Vector Multiplication

论文作者

Alappat, Christie L., Hager, Georg, Schenk, Olaf, Wellein, Gerhard

论文摘要

稀疏矩阵与密集矢量(SPMV)的乘法是许多数值方案中的关键组件,并且已知其性能受到主内存访问的严重限制。几种数值方案要求将稀疏基质多项式与密集矢量相连,该矢量通常以SPMV序列实现。这会导致性能低,而忽略了通过重复使用缓存的矩阵数据来增加算术强度的潜力。在这项工作中,我们使用递归代数着色引擎(RACE)来阻止跨多项式计算的稀疏矩阵数据。在表示稀疏矩阵的图中,我们使用广度优先搜索形成级别。然后,在访问矩阵数据并实现有效的多线程并行化时,这些级别的局部关系随后用于改善空间和时间位置。我们的方法与矩阵结构无关,并避免了在硬件效率和并行化开销方面的现有“阻止”策略的缺点。我们使用性能建模量化了实施质量,并证明了高达3 $ \ times $和5 $ \ times $的加速度与最佳基于SPMV的基线相比,在最近的Intel和AMD体系结构的单个多层芯片上。作为潜在应用,我们证明了我们实施对Chebyshev时间传播方案的好处,代表了指数集成符的多项式近似类别。可能受益于我们的开发项目的其他数值方案包括$ s $ step krylov solvers和功率聚类算法。

The multiplication of a sparse matrix with a dense vector (SpMV) is a key component in many numerical schemes and its performance is known to be severely limited by main memory access. Several numerical schemes require the multiplication of a sparse matrix polynomial with a dense vector, which is typically implemented as a sequence of SpMVs. This results in low performance and ignores the potential to increase the arithmetic intensity by reusing the matrix data from cache. In this work we use the recursive algebraic coloring engine (RACE) to enable blocking of sparse matrix data across the polynomial computations. In the graph representing the sparse matrix we form levels using a breadth-first search. Locality relations of these levels are then used to improve spatial and temporal locality when accessing the matrix data and to implement an efficient multithreaded parallelization. Our approach is independent of the matrix structure and avoids shortcomings of existing "blocking" strategies in terms of hardware efficiency and parallelization overhead. We quantify the quality of our implementation using performance modelling and demonstrate speedups of up to 3$\times$ and 5$\times$ compared to an optimal SpMV-based baseline on a single multicore chip of recent Intel and AMD architectures. As a potential application, we demonstrate the benefit of our implementation for a Chebyshev time propagation scheme, representing the class of polynomial approximations to exponential integrators. Further numerical schemes which may benefit from our developments include $s$-step Krylov solvers and power clustering algorithms.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源