Smart Containers With Bidding Capacity: A Policy Gradient Algorithm for Semi-Cooperative Learning

Paper title

Smart Containers With Bidding Capacity: A Policy Gradient Algorithm for Semi-Cooperative Learning

Paper author

van Heeswijk, Wouter

Abstract

Smart modular freight containers, as propagated in the Physical Internet paradigm, are equipped with sensors, data storage capability and intelligence that enable them to route themselves from origin to destination without manual intervention or central governance. In this self-organizing setting, containers can autonomously place bids on transport services in a spot market. However, individual containers may find it difficult to learn good bidding policies because their observations are limited. By sharing information and costs with one another, smart containers can jointly learn bidding policies, even while competing for the same transport capacity. We replicate this behavior by learning stochastic bidding policies in a semi-cooperative multi-agent setting. To this end, we develop a reinforcement learning algorithm based on the policy gradient framework. Numerical experiments show that sharing solely bids and acceptance decisions leads to stable bidding policies. Additional system information only marginally improves performance; individual job properties suffice to place appropriate bids. Furthermore, we find that carriers may have incentives not to share information with the smart containers. The experiments give rise to several directions for follow-up research, in particular the interaction between smart containers and transport services in self-organizing logistics.

