Paper Title
Block size estimation for data partitioning in HPC applications using machine learning techniques
Paper Authors
Abstract
The extensive use of HPC infrastructures and frameworks for running data-intensive applications has led to a growing interest in data partitioning techniques and strategies. In fact, application performance can be heavily affected by how data are partitioned, which in turn depends on the selected size for data blocks, i.e., the block size. Therefore, finding an effective partitioning, i.e., a suitable block size, is a key strategy to speed up parallel data-intensive applications and increase scalability. This paper describes a methodology for block size estimation that relies on supervised machine learning techniques, namely BLEST-ML (BLock size ESTimation through Machine Learning). The proposed methodology was evaluated by designing an implementation tailored to dislib, a distributed computing library highly focused on machine learning algorithms built on top of the PyCOMPSs framework. We assessed the effectiveness of the provided implementation through an extensive experimental evaluation considering different algorithms from dislib, datasets, and infrastructures, including the MareNostrum 4 supercomputer. The results we obtained show the ability of BLEST-ML to efficiently determine a suitable way to split a given dataset, thus demonstrating its applicability for enabling the efficient execution of data-parallel applications in high-performance environments.
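The core idea of the abstract can be illustrated with a minimal sketch: block size estimation is framed as a supervised regression problem, where features describing the dataset and the infrastructure are mapped to a block size observed to perform well. The sketch below uses a toy k-nearest-neighbours regressor and entirely illustrative features and training data (number of rows, number of columns, number of workers); these are assumptions for demonstration only and do not reproduce the actual BLEST-ML model or its feature set.

```python
# Hypothetical sketch: block size estimation as supervised regression.
# Features (n_rows, n_cols, n_workers) and the training points are
# illustrative assumptions, not taken from the BLEST-ML paper.
import math

def knn_predict(train, features, k=3):
    """Predict a block size by averaging the k nearest training samples."""
    dists = sorted((math.dist(x, features), y) for x, y in train)
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Toy training set: (n_rows, n_cols, n_workers) -> observed good block rows.
train = [
    ((100_000, 10, 4), 25_000),
    ((100_000, 10, 8), 12_500),
    ((1_000_000, 10, 8), 125_000),
    ((1_000_000, 50, 16), 62_500),
]

# Estimate a block size for an unseen dataset/infrastructure combination.
pred = knn_predict(train, (500_000, 10, 8), k=2)
```

In practice such a model would be trained on execution logs, and the feature vector would also encode the algorithm being run; the point here is only that, once trained, prediction is a cheap lookup compared to exhaustively benchmarking candidate block sizes.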