Multi-Agent Reinforcement Learning for Microprocessor Design Space Exploration

Paper Title

Multi-Agent Reinforcement Learning for Microprocessor Design Space Exploration

Authors

Srivatsan Krishnan, Natasha Jaques, Shayegan Omidshafiei, Dan Zhang, Izzeddin Gur, Vijay Janapa Reddi, Aleksandra Faust

Abstract

Microprocessor architects are increasingly resorting to domain-specific customization in the quest for high performance and energy efficiency. As systems grow in complexity, fine-tuning architectural parameters across multiple sub-systems (e.g., datapath, memory blocks at different levels of the hierarchy, interconnects, compiler optimizations) quickly results in a combinatorial explosion of the design space. This makes domain-specific customization an extremely challenging task. Prior work explores using reinforcement learning (RL) and other optimization methods to automatically explore the large design space. However, these methods have traditionally relied on single-agent RL/ML formulations. It is unclear how well single-agent formulations scale as the complexity of the design space increases (e.g., full-stack System-on-Chip design). Therefore, we propose an alternative formulation that leverages Multi-Agent RL (MARL) to tackle this problem. The key idea behind using MARL is the observation that parameters across different sub-systems are more or less independent, which allows a decentralized role to be assigned to each agent. We test this hypothesis by designing a domain-specific DRAM memory controller for several workload traces. Our evaluation shows that the MARL formulation consistently outperforms single-agent RL baselines such as Proximal Policy Optimization and Soft Actor-Critic over different target objectives such as low power and latency. This work thus opens a pathway for new and promising research on MARL solutions for hardware architecture search.
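
The decomposition idea in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the parameter names, the cost tables, and the use of simple epsilon-greedy bandit learners are all assumptions made for illustration, and the abstract does not say which MARL algorithm the authors use. The sketch only shows how one agent per parameter, all sharing a single reward, replaces a search over the joint configuration space.

```python
import math
import random

# Hypothetical DRAM memory-controller parameters (illustrative, not from the paper).
DESIGN_SPACE = {
    "page_policy": ["open", "closed", "adaptive"],
    "scheduler": ["fcfs", "frfcfs", "atlas"],
    "queue_depth": [8, 16, 32, 64],
    "refresh_window": [32, 64, 128, 256],
}

# Made-up per-choice costs standing in for a simulator's power/latency output.
COST = {
    "page_policy": {"open": 3.0, "closed": 2.0, "adaptive": 1.0},
    "scheduler": {"fcfs": 3.0, "frfcfs": 1.0, "atlas": 2.0},
    "queue_depth": {8: 3.0, 16: 2.0, 32: 1.0, 64: 2.5},
    "refresh_window": {32: 2.0, 64: 1.0, 128: 1.5, 256: 3.0},
}


def joint_space_size(space):
    """Number of full configurations a single agent would have to choose among."""
    return math.prod(len(choices) for choices in space.values())


def toy_reward(config):
    """Shared reward: negative total cost of the chosen configuration."""
    return -sum(COST[name][choice] for name, choice in config.items())


class BanditAgent:
    """Epsilon-greedy learner responsible for exactly one parameter."""

    def __init__(self, choices, epsilon=0.2):
        self.choices = list(choices)
        self.epsilon = epsilon
        self.counts = [0] * len(self.choices)
        # Rewards are negative, so a 0.0 start makes untried choices look attractive.
        self.values = [0.0] * len(self.choices)

    def act(self, rng):
        if rng.random() < self.epsilon:
            return rng.randrange(len(self.choices))
        return max(range(len(self.choices)), key=lambda i: self.values[i])

    def update(self, index, reward):
        # Incremental running mean of the shared reward seen with this choice.
        self.counts[index] += 1
        self.values[index] += (reward - self.values[index]) / self.counts[index]

    def best(self):
        return self.choices[max(range(len(self.choices)), key=lambda i: self.values[i])]


def explore(space, steps=2000, seed=0):
    """Each agent picks its own parameter; all agents learn from one shared reward."""
    rng = random.Random(seed)
    agents = {name: BanditAgent(choices) for name, choices in space.items()}
    for _ in range(steps):
        picks = {name: agent.act(rng) for name, agent in agents.items()}
        config = {name: space[name][i] for name, i in picks.items()}
        reward = toy_reward(config)
        for name, agent in agents.items():
            agent.update(picks[name], reward)
    return {name: agent.best() for name, agent in agents.items()}


if __name__ == "__main__":
    print("joint configurations:", joint_space_size(DESIGN_SPACE))
    print("per-agent choices:", {k: len(v) for k, v in DESIGN_SPACE.items()})
    print("configuration found:", explore(DESIGN_SPACE))
```

In this toy space a single agent faces 144 joint configurations, while each of the four agents chooses among only three or four options. The made-up cost function is fully separable, which is the most favorable case for the independence assumption; real sub-system parameters interact, which is what the paper's evaluation is meant to probe.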
