Title
Cooperative Guidance Strategy for Active Defense Spacecraft with Imperfect Information via Deep Reinforcement Learning
Authors
Abstract
In this paper, an adaptive cooperative guidance strategy was developed for the active protection of a target spacecraft attempting to evade an interceptor. The target spacecraft performs evasive maneuvers while launching an active defense vehicle to divert the interceptor. Instead of classical strategies based on optimal control or differential game theory, the problem was solved with a deep reinforcement learning method, with imperfect information assumed about the interceptor's maneuverability. To address the sparse reward problem, a universal reward design method based on the shaping technique and an increasingly difficult training approach were presented. The guidance law, reward function, and training approach were demonstrated through the learning process and Monte Carlo simulations. The non-sparse reward function and the increasingly difficult training approach accelerated model convergence and alleviated overfitting. With a standard optimal guidance law as a benchmark, the simulation results validated the effectiveness and advantages of the proposed guidance strategy, which guarantees the target spacecraft's escape and win rates in the multi-agent game. The trained agent adapted to the interceptor's maneuverability better than the optimal guidance law did. Moreover, the proposed guidance strategy outperformed the standard optimal guidance law while requiring less prior knowledge.
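To make the two training ideas named in the abstract concrete, the sketch below illustrates (a) a dense, shaped reward that supplements sparse terminal outcomes with a range-based shaping term, and (b) a curriculum that gradually raises the interceptor's maneuverability over training episodes. This is a minimal illustration, not the paper's actual reward function or schedule; all function names, reward weights, and acceleration limits here are assumed for the example.

```python
def shaped_reward(prev_distance, distance, escaped, intercepted):
    """Dense reward sketch: sparse terminal terms plus a shaping term
    that rewards opening the target-interceptor miss distance.
    The +/-100 terminal values and unit shaping weight are assumptions."""
    terminal = 100.0 if escaped else (-100.0 if intercepted else 0.0)
    shaping = 1.0 * (distance - prev_distance)  # positive when range grows
    return terminal + shaping


def curriculum_accel(episode, start=3.0, max_accel=9.0, step_every=200):
    """Increasingly difficult training sketch: the interceptor's maximum
    acceleration (in g, values assumed) grows stepwise with the episode
    count until it reaches a cap."""
    return min(max_accel, start + (episode // step_every) * 1.0)
```

During training, each episode would query `curriculum_accel` to configure the interceptor before rollout, and `shaped_reward` would be applied at every step so the agent receives a learning signal well before the sparse escape/interception outcome.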