Paper title
Benchmarking Hashing Algorithms for Load Balancing in a Distributed Database Environment
Paper authors
Paper abstract
Modern high-load applications store data across multiple database instances. Such an architecture requires data consistency, and it is important to ensure an even distribution of data among nodes. Load balancing is used to achieve these goals. Hashing is the backbone of virtually all load balancing systems, and since the introduction of classic Consistent Hashing, many algorithms have been devised for this purpose. One of the purposes of a load balancer is to ensure the scalability of the storage cluster. For the performance of the whole system, it is crucial to transfer as few data records as possible when a node is added or removed, and the hashing algorithm used by the load balancer has the greatest impact on this process. In this paper, we experimentally evaluate several hashing algorithms used for load balancing, conducting both simulated and real-system experiments. To evaluate algorithm performance, we have developed a benchmark suite based on Unidata MDM, a scalable toolkit for various Master Data Management (MDM) applications. For the assessment, we employ three criteria: uniformity of the produced distribution, the number of moved records, and computation speed. Based on the results of our experiments, we have created a table in which each algorithm is rated according to these criteria.
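To make two of the criteria concrete, the following is a minimal sketch of how the number of moved records and the uniformity of the distribution might be measured for classic Consistent Hashing when one node is added. It is not the paper's benchmark suite and does not use Unidata MDM. The names `h64`, `HashRing`, `moved_fraction`, and `max_load_ratio`, the choice of MD5 as the hash, the 100 virtual nodes per node, and the synthetic keys are all assumptions made for illustration. A naive modulo placement is included only as a baseline for comparison.

```python
import bisect
import hashlib
from collections import Counter


def h64(value: str) -> int:
    """Stable 64-bit hash taken from MD5 (an illustrative choice, not the paper's)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Classic consistent hashing ring with virtual nodes (hypothetical helper)."""

    def __init__(self, nodes, vnodes=100):
        # Each node is placed on the ring at `vnodes` pseudo-random points.
        self._points = sorted(
            (h64(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._hashes = [point for point, _ in self._points]

    def lookup(self, key: str) -> str:
        # Owner is the first ring point clockwise from the key's hash, with wrap-around.
        idx = bisect.bisect_right(self._hashes, h64(key)) % len(self._hashes)
        return self._points[idx][1]


def moved_fraction(keys, before, after) -> float:
    """Share of keys whose owner differs between two placement functions."""
    return sum(1 for k in keys if before(k) != after(k)) / len(keys)


def max_load_ratio(keys, place, n_nodes) -> float:
    """Largest node load divided by the ideal load (1.0 means perfectly uniform)."""
    counts = Counter(place(k) for k in keys)
    return max(counts.values()) / (len(keys) / n_nodes)


# Synthetic workload: 20,000 record keys, cluster grows from 4 to 5 nodes.
keys = [f"record-{i}" for i in range(20000)]
old_nodes = [f"node-{i}" for i in range(4)]
new_nodes = old_nodes + ["node-4"]

ring_before = HashRing(old_nodes)
ring_after = HashRing(new_nodes)

ring_moved = moved_fraction(keys, ring_before.lookup, ring_after.lookup)
modulo_moved = moved_fraction(keys, lambda k: h64(k) % 4, lambda k: h64(k) % 5)
ring_imbalance = max_load_ratio(keys, ring_after.lookup, len(new_nodes))

print(f"consistent hashing moved: {ring_moved:.1%}")
print(f"modulo placement moved:   {modulo_moved:.1%}")
print(f"max load / ideal load:    {ring_imbalance:.2f}")
```

With the ring, a record changes owner only if the new node takes it over, so the moved share should be close to one fifth of the records, whereas modulo placement relocates roughly four fifths. The third criterion, computation speed, could be estimated by timing `lookup` calls (for example with `timeit`), which is omitted here.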