Paper Title
Offline Supervised Learning V.S. Online Direct Policy Optimization: A Comparative Study and A Unified Training Paradigm for Neural Network-Based Optimal Feedback Control
Paper Authors
Paper Abstract
This work is concerned with efficiently solving optimal control problems via neural network-based feedback controllers. We first conduct a comparative study of two prevalent approaches: offline supervised learning and online direct policy optimization. Although the training part of the supervised learning approach is relatively easy, the success of the method heavily depends on the optimal control dataset generated by open-loop optimal control solvers. In contrast, direct policy optimization turns the optimal control problem directly into an optimization problem, without any pre-computation, but the dynamics-related objective can be hard to optimize when the problem is complicated. Our results underscore the superiority of offline supervised learning in terms of both optimality and training time. To overcome the main challenges of the two approaches, dataset generation and optimization difficulty respectively, we complement them and propose the Pre-train and Fine-tune strategy as a unified training paradigm for optimal feedback control, which further improves performance and robustness significantly. Our code is accessible at https://github.com/yzhao98/DeepOptimalControl.
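Code example (a minimal sketch, not the authors' implementation): the PyTorch snippet below illustrates the Pre-train and Fine-tune paradigm described in the abstract, under simplified assumptions. The names `policy`, `dynamics`, and `running_cost` are hypothetical placeholders, and the random (state, optimal control) pairs stand in for data that an open-loop optimal control solver would generate. Stage 1 pre-trains the policy network by supervised regression on that dataset; Stage 2 fine-tunes it by direct policy optimization, differentiating the accumulated cost through a simple Euler rollout of the closed-loop dynamics.

```python
import torch
import torch.nn as nn

state_dim, control_dim = 4, 2
policy = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, control_dim))

def dynamics(x, u):
    # Placeholder dynamics x_dot = f(x, u); replace with the actual problem.
    return -x + u @ torch.ones(control_dim, state_dim)

def running_cost(x, u):
    # Placeholder quadratic running cost.
    return (x ** 2).sum(dim=-1) + 0.1 * (u ** 2).sum(dim=-1)

# Stage 1: offline supervised pre-training on open-loop optimal control data.
# Here (x_data, u_data) are random stand-ins for states and optimal controls
# sampled along trajectories computed by an open-loop OCP solver.
x_data = torch.randn(1024, state_dim)
u_data = torch.randn(1024, control_dim)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = ((policy(x_data) - u_data) ** 2).mean()  # regression onto optimal controls
    loss.backward()
    opt.step()

# Stage 2: online fine-tuning by direct policy optimization.
# Roll the closed-loop system forward with a differentiable Euler integrator
# and minimize the accumulated running cost directly through the dynamics.
dt, horizon = 0.05, 40
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
for _ in range(100):
    opt.zero_grad()
    x = torch.randn(256, state_dim)        # batch of initial conditions
    total_cost = torch.zeros(256)
    for _ in range(horizon):
        u = policy(x)
        total_cost = total_cost + running_cost(x, u) * dt
        x = x + dynamics(x, u) * dt        # differentiable rollout step
    total_cost.mean().backward()
    opt.step()
```

In this sketch, the supervised stage provides a good initialization that the dynamics-based objective alone may struggle to reach, while the fine-tuning stage optimizes the closed-loop cost directly rather than mimicking precomputed open-loop solutions.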