Title
Transformer Network-based Reinforcement Learning Method for Power Distribution Network (PDN) Optimization of High Bandwidth Memory (HBM)
Authors
Abstract
In this article, for the first time, we propose a transformer network-based reinforcement learning (RL) method for power distribution network (PDN) optimization of high bandwidth memory (HBM). The proposed method can provide an optimal decoupling capacitor (decap) design that maximizes the reduction of PDN self- and transfer impedance seen at multiple ports. An attention-based transformer network is implemented to directly parameterize the decap optimization policy. Optimality performance is significantly improved because the attention mechanism has the expressive capacity to explore the massive combinatorial space of decap assignments. Moreover, it can capture sequential relationships between the decap assignments. The computing time for optimization is dramatically reduced because the network is reusable across probing port positions and decap assignment candidates. This is because the transformer network has a context embedding process that captures meta-features, including probing port positions. In addition, the network is trained on randomly generated data sets; therefore, the trained network can solve new decap optimization problems without additional training. The computing time for training and the data cost are greatly reduced due to the scalability of the network: thanks to its shared-weight property, the network can adapt to larger-scale problems without additional training. For verification, we compare the results with a conventional genetic algorithm (GA), random search (RS), and all previous RL-based methods. As a result, the proposed method outperforms them in all of the following aspects: optimality performance, computing time, and data efficiency.
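To make the described policy concrete, the following is a minimal illustrative sketch (not the authors' actual network) of how an attention-based policy can sequentially assign decaps: candidate ports are embedded together with a flag marking probing ports, a context embedding summarizes that meta-information, and at each step attention scores over the still-unassigned ports select the next decap location. All dimensions, the 4x4 port grid, and the randomly initialized weights are assumptions standing in for a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical toy PDN: 16 candidate decap ports on a 4x4 grid, each
# described by its (x, y) position plus a flag marking probing ports.
n_ports, d_model = 16, 32
xy = np.stack(np.meshgrid(np.arange(4), np.arange(4)), -1).reshape(-1, 2)
is_probe = np.zeros((n_ports, 1))
is_probe[[5, 10]] = 1.0                      # two probing ports (assumed)
feats = np.hstack([xy, is_probe])            # (16, 3) port features

# Randomly initialized projections stand in for trained weights.
W_embed = rng.normal(0, 0.1, (3, d_model))
W_q = rng.normal(0, 0.1, (d_model, d_model))
W_k = rng.normal(0, 0.1, (d_model, d_model))

H = feats @ W_embed                          # per-port embeddings
context = H.mean(axis=0)                     # context embedding incl. probe info

# Sequential decoding: attention over unassigned ports picks each decap.
assigned, mask = [], np.zeros(n_ports, dtype=bool)
for _ in range(4):                           # place 4 decaps
    q = context @ W_q                        # query from current context
    scores = (H @ W_k) @ q / np.sqrt(d_model)
    scores[mask] = -np.inf                   # forbid reusing a port
    probs = softmax(scores)                  # policy over candidate ports
    choice = int(probs.argmax())             # greedy decoding
    assigned.append(choice)
    mask[choice] = True
    context = (context + H[choice]) / 2      # fold the choice into the context

print(assigned)                              # 4 distinct port indices
```

Because the attention weights are shared across ports, the same loop works unchanged for a larger grid or a different set of probing ports, which is the property the abstract refers to as reusability and scalability.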