Output Feedback Tube MPC-Guided Data Augmentation for Robust, Efficient Sensorimotor Policy Learning

Paper Title

Output Feedback Tube MPC-Guided Data Augmentation for Robust, Efficient Sensorimotor Policy Learning

Authors

Tagliabue, Andrea, How, Jonathan P.

Abstract

Imitation learning (IL) can generate computationally efficient sensorimotor policies from demonstrations provided by computationally expensive model-based sensing and control algorithms. However, commonly employed IL methods are often data-inefficient, requiring the collection of a large number of demonstrations and producing policies with limited robustness to uncertainties. In this work, we combine IL with an output feedback robust tube model predictive controller (RTMPC) to co-generate demonstrations and a data augmentation strategy to efficiently learn neural network-based sensorimotor policies. Thanks to the augmented data, we reduce the computation time and the number of demonstrations needed by IL, while providing robustness to sensing and process uncertainty. We tailor our approach to the task of learning a trajectory tracking visuomotor policy for an aerial robot, leveraging a 3D mesh of the environment as part of the data augmentation process. We numerically demonstrate that our method can learn a robust visuomotor policy from a single demonstration, a two-orders-of-magnitude improvement in demonstration efficiency compared to existing IL methods.
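
The tube-guided augmentation idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a box-shaped tube cross-section and the standard tube MPC ancillary control law u = u_nom + K (x - x_nom). The function name `augment_from_tube` and all of its parameters are hypothetical. The step that synthesizes observations for each sampled state (for example, rendering images from the 3D mesh) is not shown.

```python
import numpy as np


def augment_from_tube(x_nom, u_nom, K, tube_half_width, n_samples, rng):
    """Sample extra (state, action) pairs around one nominal plan point.

    Assumptions (illustrative only, not taken from the paper):
    - the tube cross-section is an axis-aligned box with the given half-widths;
    - actions follow the ancillary law u = u_nom + K @ (x - x_nom).
    """
    n_states = x_nom.shape[0]
    # Draw state deviations uniformly inside the assumed box-shaped tube.
    offsets = rng.uniform(-tube_half_width, tube_half_width,
                          size=(n_samples, n_states))
    x_samples = x_nom + offsets
    # Label each sampled state with the ancillary controller's action.
    u_samples = u_nom + offsets @ K.T
    return x_samples, u_samples


# Hypothetical 2-state, 1-input example.
rng = np.random.default_rng(0)
x_nom = np.zeros(2)
u_nom = np.array([1.0])
K = np.array([[-1.0, -0.5]])      # assumed ancillary feedback gain
half_width = np.array([0.1, 0.2])  # assumed tube half-widths per state
xs, us = augment_from_tube(x_nom, u_nom, K, half_width, 5, rng)
```

Each sampled pair can then be added to the training set alongside the single collected demonstration. In a visuomotor setting the sampled states would first be mapped to synthetic observations before being paired with the actions.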
