Paper title
Random-Sampling Monte-Carlo Tree Search Methods for Cost Approximation in Long-Horizon Optimal Control
Authors
Abstract
In this paper, we develop Monte-Carlo-based heuristic approaches to approximate the objective function in long-horizon optimal control problems. To approximate the expectation operator in the objective function, these approaches evolve the system state into the future along multiple trajectories, sampling the noise disturbances at each time step, and then take the average (or weighted average) of the costs over all trajectories. We call these methods random-sampling multipath hypothesis propagation (RS-MHP). These methods, or variants of them, exist in the literature; however, the literature lacks results on how well these approximation strategies converge. This paper fills this knowledge gap to a certain extent. We derive convergence results for the cost approximation error of the RS-MHP methods and discuss their convergence (in probability) as the sample size increases. We consider two case studies to demonstrate the effectiveness of our methods: (a) a linear quadratic control problem and (b) a UAV path optimization problem.
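The sampling idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a scalar linear system x_{t+1} = a x_t + b u_t + w_t with Gaussian noise, a fixed feedback law u = -k x, a quadratic stage cost, and equal weights on all trajectories. All function and parameter names are hypothetical. Because this toy problem has a closed-form expected cost, the sampled estimate can be compared against it as the number of trajectories grows.

```python
import random


def rollout_cost(x0, horizon, a, b, k, q, r, sigma, rng):
    """Cost of one sampled trajectory under the assumed feedback law u = -k * x."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -k * x
        cost += q * x * x + r * u * u   # quadratic stage cost (an assumption)
        w = rng.gauss(0.0, sigma)       # sample the noise disturbance at this step
        x = a * x + b * u + w           # evolve the state one step forward
    return cost


def sampled_cost(num_paths, seed=0, **params):
    """Equal-weight average of the cost over independently sampled trajectories."""
    rng = random.Random(seed)
    total = sum(rollout_cost(rng=rng, **params) for _ in range(num_paths))
    return total / num_paths


def exact_cost(x0, horizon, a, b, k, q, r, sigma):
    """Exact expected cost from the second-moment recursion of the closed loop."""
    m, cost = x0 * x0, 0.0              # m tracks E[x_t^2]
    for _ in range(horizon):
        cost += (q + r * k * k) * m
        m = (a - b * k) ** 2 * m + sigma ** 2
    return cost


# Illustrative parameter values, chosen only for this sketch.
PARAMS = dict(x0=1.0, horizon=20, a=0.9, b=0.5, k=0.4, q=1.0, r=0.1, sigma=0.2)

if __name__ == "__main__":
    reference = exact_cost(**PARAMS)
    for n in (10, 100, 1000, 10000):
        estimate = sampled_cost(n, **PARAMS)
        print(n, abs(estimate - reference))
```

Running the script prints the absolute approximation error for increasing sample sizes. For a single seed the error need not shrink monotonically, which is consistent with convergence holding in probability rather than path by path.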