A Meta-Learning Control Algorithm with Provable Finite-Time Guarantees

Paper Title

A Meta-Learning Control Algorithm with Provable Finite-Time Guarantees

Authors

Muthirayan, Deepan; Khargonekar, Pramod

Abstract

In this work we provide provable regret guarantees for an online meta-learning control algorithm in an iterative control setting. In each iteration, the system to be controlled is a different, unknown linear deterministic system; the controller's cost is a general additive cost function; and the control input must satisfy constraints, whose violation incurs an additional cost. We prove (i) that, within an episode of duration $T$, the algorithm achieves regret of $O(T^{3/4})$ for both the controller cost and the constraint violation, measured against the best policy that satisfies the control input constraints, and (ii) that the average of these regrets over $N$ iterations, measured against the same policy, varies as $O((1+\log(N)/N)T^{3/4})$. This shows that the worst-case regret of learning within an iteration continuously improves as more iterations are experienced.
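To make the two claims concrete, they can be written out as follows. This is a sketch in our own notation, not the paper's. We assume the regret of iteration $i$ compares the algorithm's cumulative cost with that of the best policy in a class $\Pi$ of constraint-satisfying policies, and that $\log$ is the natural logarithm. The constants hidden by $O(\cdot)$ are not given in the abstract and are left unspecified.

```latex
% Hypothetical notation, not the paper's: iteration i = 1, ..., N;
% per-step cost c_t^{(i)}; state/input (x_t^{(i)}, u_t^{(i)}) under the
% algorithm and (x_t^{\pi,(i)}, u_t^{\pi,(i)}) under a comparator policy \pi;
% \Pi = policies that satisfy the control input constraints.
\begin{align*}
R_T^{(i)} &= \sum_{t=1}^{T} c_t^{(i)}\bigl(x_t^{(i)}, u_t^{(i)}\bigr)
           - \min_{\pi \in \Pi} \sum_{t=1}^{T} c_t^{(i)}\bigl(x_t^{\pi,(i)}, u_t^{\pi,(i)}\bigr)
           = O\bigl(T^{3/4}\bigr) && \text{claim (i)} \\
\bar{R}_{T,N} &= \frac{1}{N} \sum_{i=1}^{N} R_T^{(i)}
           = O\Bigl(\bigl(1 + \tfrac{\log N}{N}\bigr)\, T^{3/4}\Bigr) && \text{claim (ii)} \\
% With natural log, the N-dependent term decreases for N > e and vanishes:
\frac{d}{dN}\,\frac{\log N}{N} &= \frac{1 - \log N}{N^{2}} < 0 \ \text{for } N > e,
\qquad \lim_{N \to \infty} \frac{\log N}{N} = 0 .
\end{align*}
```

The abstract states the same pair of bounds for the cumulative constraint violation. Because $\log N / N$ shrinks toward 0 as $N$ grows, the factor $1 + \log N / N$ in the averaged bound decreases toward 1. This is the sense in which the bound improves with experience of more iterations.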
