HESRPT：并行安排以最大程度地减少平均速度

论文标题

HESRPT：并行安排以最大程度地减少平均速度

heSRPT: Parallel Scheduling to Minimize Mean Slowdown

论文作者

Berg, Benjamin, Vesilo, Rein, Harchol-Balter, Mor

论文摘要

现代数据中心提供能够利用并行性的工作量。当作业跨多个服务器平行时，它将更快地完成，但是作业会从分配其他服务器中获得降低的回报。由于将多个服务器分配给单个作业效率低下，因此尚不清楚如何最好地分配许多可行的可行作业之间的固定服务器。本文提供了第一个最佳分配策略，用于最大程度地减少所有作业在时间0时都存在已知大小的可行作业的平均速度。我们的策略为每时每刻的最佳分配提供了一个简单的封闭式公式。最大程度地减少平均速度通常需要比长期工作（如SRPT政策）偏爱短期工作。但是，由于可行的可行作业具有均匀的加速功能，因此系统效率也是一个问题。通过对所有工作进行平等分配，从而最大程度地提高了系统效率，从而竞争优先级的小工作的目标。我们的最佳政策，高效的SRPT（HESRPT）可以平衡这些竞争目标。 HESRPT根据其尺寸顺序完成工作，但通过在每时每刻将某些服务器分配给每个作业来保持整体系统效率。我们的结果推广，还提供有关平均流动时间的最佳分配策略。最后，我们考虑了随着时间的推移工作到达系统的在线情况。尽管优化在线环境中的平均速度放缓更加困难，但我们发现HESRPT为在线设置提供了出色的启发式政策。实际上，我们的模拟表明，HESRPT极大地超过了可行的工作的最先进的分配政策。

Modern data centers serve workloads which are capable of exploiting parallelism. When a job parallelizes across multiple servers it will complete more quickly, but jobs receive diminishing returns from being allocated additional servers. Because allocating multiple servers to a single job is inefficient, it is unclear how best to allocate a fixed number of servers between many parallelizable jobs. This paper provides the first optimal allocation policy for minimizing the mean slowdown of parallelizable jobs of known size when all jobs are present at time 0. Our policy provides a simple closed form formula for the optimal allocations at every moment in time. Minimizing mean slowdown usually requires favoring short jobs over long ones (as in the SRPT policy). However, because parallelizable jobs have sublinear speedup functions, system efficiency is also an issue. System efficiency is maximized by giving equal allocations to all jobs and thus competes with the goal of prioritizing small jobs. Our optimal policy, high-efficiency SRPT (heSRPT), balances these competing goals. heSRPT completes jobs according to their size order, but maintains overall system efficiency by allocating some servers to each job at every moment in time. Our results generalize to also provide the optimal allocation policy with respect to mean flow time. Finally, we consider the online case where jobs arrive to the system over time. While optimizing mean slowdown in the online setting is even more difficult, we find that heSRPT provides an excellent heuristic policy for the online setting. In fact, our simulations show that heSRPT significantly outperforms state-of-the-art allocation policies for parallelizable jobs.

下载PDF全文

下载文献需遵守相关版权规定

论文标题