Paper Title

Multi-UAV Path Planning for Wireless Data Harvesting with Deep Reinforcement Learning

Paper Authors

Harald Bayerlein, Mirco Theile, Marco Caccamo, David Gesbert

Paper Abstract

Harvesting data from distributed Internet of Things (IoT) devices with multiple autonomous unmanned aerial vehicles (UAVs) is a challenging problem requiring flexible path planning methods. We propose a multi-agent reinforcement learning (MARL) approach that, in contrast to previous work, can adapt to profound changes in the scenario parameters defining the data harvesting mission, such as the number of deployed UAVs, number, position and data amount of IoT devices, or the maximum flying time, without the need to perform expensive recomputations or relearn control policies. We formulate the path planning problem for a cooperative, non-communicating, and homogeneous team of UAVs tasked with maximizing collected data from distributed IoT sensor nodes subject to flying time and collision avoidance constraints. The path planning problem is translated into a decentralized partially observable Markov decision process (Dec-POMDP), which we solve through a deep reinforcement learning (DRL) approach, approximating the optimal UAV control policy without prior knowledge of the challenging wireless channel characteristics in dense urban environments. By exploiting a combination of centered global and local map representations of the environment that are fed into convolutional layers of the agents, we show that our proposed network architecture enables the agents to cooperate effectively by carefully dividing the data collection task among themselves, adapt to large complex environments and state spaces, and make movement decisions that balance data collection goals, flight-time efficiency, and navigation constraints. Finally, learning a control policy that generalizes over the scenario parameter space enables us to analyze the influence of individual parameters on collection performance and provide some intuition about system-level benefits.
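As a purely illustrative sketch (not the authors' implementation), the snippet below shows one way the input structure described in the abstract could look in PyTorch: a UAV-centered, coarse global map and a full-resolution local crop are each processed by a convolutional branch, concatenated with scalar state such as remaining flight time, and mapped to Q-values over discrete movement actions. All map sizes, channel counts, layer widths, and the action set are assumptions made for this example.

```python
# Illustrative sketch only: a DQN-style network mirroring the idea of feeding a
# UAV-centered global map and a local map crop into convolutional branches.
# Map sizes, channels, layer widths, and the 6-action set are assumptions.
import torch
import torch.nn as nn


class MapDQN(nn.Module):
    def __init__(self, map_channels: int = 4, num_actions: int = 6):
        super().__init__()
        # Coarse, UAV-centered view of the whole environment.
        self.global_branch = nn.Sequential(
            nn.Conv2d(map_channels, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3), nn.ReLU(),
            nn.Flatten(),
        )
        # Full-resolution crop around the UAV's current position.
        self.local_branch = nn.Sequential(
            nn.Conv2d(map_channels, 16, kernel_size=3), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3), nn.ReLU(),
            nn.Flatten(),
        )
        # LazyLinear infers the concatenated feature size on the first forward pass.
        self.head = nn.Sequential(
            nn.LazyLinear(256), nn.ReLU(),
            nn.Linear(256, num_actions),
        )

    def forward(self, global_map, local_map, scalars):
        # scalars: per-agent scalar inputs, e.g. remaining flight time.
        features = torch.cat(
            [self.global_branch(global_map), self.local_branch(local_map), scalars],
            dim=1,
        )
        return self.head(features)  # Q-values over discrete UAV movement actions


# Dummy forward pass with assumed map sizes (batch of 2 agents).
net = MapDQN()
q_values = net(
    torch.zeros(2, 4, 21, 21),  # centered global map
    torch.zeros(2, 4, 9, 9),    # local map crop
    torch.zeros(2, 1),          # remaining flight time
)
print(q_values.shape)  # torch.Size([2, 6])
```

Centering both map views on the agent's own position, as described in the abstract, keeps the observation egocentric, which is what lets a single shared policy generalize across different UAVs and scenario layouts.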
