Batch-Augmented Multi-Agent Reinforcement Learning for Efficient Traffic Signal Optimization

Paper Title

Batch-Augmented Multi-Agent Reinforcement Learning for Efficient Traffic Signal Optimization

Authors

Wu, Yueh-Hua, Yeh, I-Hau, Hu, David, Liao, Hong-Yuan Mark

Abstract

The goal of this work is to provide a viable reinforcement-learning-based solution to the traffic signal control problem. Although state-of-the-art reinforcement learning approaches have achieved great success in a variety of domains, applying them directly to alleviate traffic congestion can be challenging, given the requirement for high sample efficiency and the way training data is gathered. In this work, we address several challenges we encountered when attempting to mitigate severe traffic congestion in a metropolitan area. Specifically, we are required to provide a solution that is able to (1) handle traffic signal control when some of the surveillance cameras that supply information for reinforcement learning are down, (2) learn from batch data without a traffic simulator, and (3) make control decisions without information shared across intersections. We present a two-stage framework to handle these situations. The framework consists of an Evolution Strategies approach that produces a fixed-time traffic signal control schedule and a multi-agent off-policy reinforcement learning method that can learn from batch data with the aid of three proposed components: bounded action, batch augmentation, and surrogate reward clipping. Our experiments show that the proposed framework reduces traffic congestion by 36% in terms of waiting time compared with the fixed-time traffic signal plan currently in use. Furthermore, the framework requires only 600 queries to a simulator to achieve this result.
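The first stage described in the abstract searches for a fixed-time schedule with Evolution Strategies. As a rough illustration only, here is a generic Gaussian ES loop with mirrored (antithetic) sampling over four per-phase green durations. Everything concrete is an assumption: `toy_waiting_time` is a synthetic quadratic standing in for a traffic simulator, and `TARGET_GREEN`, `MIN_GREEN`, `MAX_GREEN`, and the step-size settings are invented values. It is not the authors' implementation and does not cover the second-stage off-policy learning.

```python
import numpy as np

# Hypothetical stand-in for a simulator query: synthetic "total waiting time"
# for one intersection's per-phase green durations (seconds). The target
# splits below are invented for illustration only.
TARGET_GREEN = np.array([40.0, 25.0, 35.0, 20.0])
MIN_GREEN, MAX_GREEN = 10.0, 60.0  # assumed feasible range per phase


def toy_waiting_time(green: np.ndarray) -> float:
    """Quadratic cost that is smallest at TARGET_GREEN (not a real traffic model)."""
    return float(np.sum((green - TARGET_GREEN) ** 2))


def evolution_strategies(objective, x0, sigma=2.0, lr=0.1,
                         pairs=8, iterations=50, seed=0):
    """Minimize `objective` with Gaussian ES using mirrored samples.

    Returns the final schedule and the number of objective queries used.
    """
    rng = np.random.default_rng(seed)
    x = np.clip(np.asarray(x0, dtype=float), MIN_GREEN, MAX_GREEN)
    queries = 0
    for _ in range(iterations):
        eps = rng.standard_normal((pairs, x.size))
        grad = np.zeros_like(x)
        for e in eps:
            # Each mirrored pair costs two objective (simulator) queries.
            diff = objective(x + sigma * e) - objective(x - sigma * e)
            grad += diff / (2.0 * sigma) * e
            queries += 2
        grad /= pairs
        # Descend on the estimated gradient, then keep every phase within bounds.
        x = np.clip(x - lr * grad, MIN_GREEN, MAX_GREEN)
    return x, queries


if __name__ == "__main__":
    schedule, n_queries = evolution_strategies(toy_waiting_time, x0=[30.0] * 4)
    print(np.round(schedule, 1), n_queries)
```

Because each mirrored pair costs two objective evaluations, the query budget is `2 * pairs * iterations` (800 with these made-up defaults). That is the quantity to watch when every evaluation is an expensive simulator run. With this toy cost, the returned schedule moves toward the invented `TARGET_GREEN` splits.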
