论文标题
连续的图形分区,以进行最佳的总或瓶颈通信
Contiguous Graph Partitioning For Optimal Total Or Bottleneck Communication
论文作者
论文摘要
图表分区时间表平行计算,例如稀疏矩阵矢量乘数(SPMV)。我们考虑连续的分区,其中稀疏矩阵的$ m $行(或列)将$ n $ nonzeros分为$ k $零件而无需重新排序。我们提出了第一个近线性时间算法,用于连续制度中的几种图形分区问题。 传统的目标,例如简单的边缘切割,超边缘切割或超图连接性,最大程度地减少了平衡限制下所有零件的总成本。我们的总分区使用$ O(km + n)$空间。它们以$ o((km \ log(m) + n)\ log(n))$时间运行,比先前的$ o(k(m^2 + n))$ time Algorithms的$ O(k(m^2 + n))显着改善。 al。 瓶颈分区最大程度可最大程度地减少任何部分的最大成本。我们提出了一种新的瓶颈成本,该成本反映了每个部分的通信和计算总和。我们的瓶颈分区使用线性空间。当$ k^2 $为$ o(n^c)$的$ c <1 $时,确切的算法在线性时间内运行。当$ k \ log(c_ {high}/(c_ {low}ε))$($ o(n^c)$ for $ c <1 $,其中$ c_ {high} $ and $ c_ {high} $和$ c_ {low} $是最佳成本,我们的$(1 +ε)$ - 近似算法以线性时间运行。我们还提出了一个更简单的$(1 +ε)$ - 近似算法,该算法以$ \ log(c_ {high}/(c_ {high}/(c_ {low}ε))$从线性时间运行。 我们从经验上证明,我们的算法有效地在42个测试矩阵的测试套件上产生高质量的连续分区。当$ k = 8 $时,我们的HyperGraph连接分区分区的速度为$ 53 \ times $(平均$ 15.1 \ times $)。我们的瓶颈分区者的平均运行时间为5.15 SPMV。
Graph partitioning schedules parallel calculations like sparse matrix-vector multiply (SpMV). We consider contiguous partitions, where the $m$ rows (or columns) of a sparse matrix with $N$ nonzeros are split into $K$ parts without reordering. We propose the first near-linear time algorithms for several graph partitioning problems in the contiguous regime. Traditional objectives such as the simple edge cut, hyperedge cut, or hypergraph connectivity minimize the total cost of all parts under a balance constraint. Our total partitioners use $O(Km + N)$ space. They run in $O((Km\log(m) + N)\log(N))$ time, a significant improvement over prior $O(K(m^2 + N))$ time algorithms due to Kernighan and Grandjean et. al. Bottleneck partitioning minimizes the maximum cost of any part. We propose a new bottleneck cost which reflects the sum of communication and computation on each part. Our bottleneck partitioners use linear space. The exact algorithm runs in linear time when $K^2$ is $O(N^C)$ for $C < 1$. Our $(1 + ε)$-approximate algorithm runs in linear time when $K\log(c_{high}/(c_{low}ε))$ is $O(N^C)$ for $C < 1$, where $c_{high}$ and $c_{low}$ are upper and lower bounds on the optimal cost. We also propose a simpler $(1 + ε)$-approximate algorithm which runs in a factor of $\log(c_{high}/(c_{low}ε))$ from linear time. We empirically demonstrate that our algorithms efficiently produce high-quality contiguous partitions on a test suite of 42 test matrices. When $K = 8$, our hypergraph connectivity partitioner achieved a speedup of $53\times$ (mean $15.1\times$) over prior algorithms. The mean runtime of our bottleneck partitioner was 5.15 SpMVs.