Paper Title
Probabilistic Guarantees for Safe Deep Reinforcement Learning
Paper Authors
Paper Abstract
Deep reinforcement learning has been successfully applied to many control tasks, but the application of such agents in safety-critical scenarios has been limited due to safety concerns. Rigorous testing of these controllers is challenging, particularly when they operate in probabilistic environments due to, for example, hardware faults or noisy sensors. We propose MOSAIC, an algorithm for measuring the safety of deep reinforcement learning agents in stochastic settings. Our approach is based on the iterative construction of a formal abstraction of a controller's execution in an environment, and leverages probabilistic model checking of Markov decision processes to produce probabilistic guarantees on safe behaviour over a finite time horizon. It produces bounds on the probability of safe operation of the controller for different initial configurations and identifies regions where correct behaviour can be guaranteed. We implement and evaluate our approach on agents trained for several benchmark control problems.
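To make the finite-horizon guarantee concrete, the following is a minimal illustrative sketch (in Python) of probabilistic model checking over a toy abstract MDP. It is not the MOSAIC implementation: the abstract state names, transition probabilities, and horizon are invented placeholders standing in for the abstraction built from the controller and its environment. The backward-induction loop computes a worst-case bound on the probability of reaching an unsafe state within the horizon, from which a lower bound on the probability of safe operation follows.

# Minimal sketch (not the authors' implementation): finite-horizon probabilistic
# model checking on a toy abstract MDP. States, actions, and probabilities are
# hypothetical placeholders for abstract regions of the controller's state space.

# transitions[state][action] = list of (next_state, probability)
transitions = {
    "safe_region_A": {
        "policy_action": [("safe_region_A", 0.9), ("unsafe", 0.1)],
    },
    "safe_region_B": {
        "policy_action": [("safe_region_A", 0.7), ("safe_region_B", 0.3)],
    },
    "unsafe": {
        "policy_action": [("unsafe", 1.0)],  # absorbing failure state
    },
}

UNSAFE = {"unsafe"}
HORIZON = 20  # finite time horizon (number of steps)

def max_unsafe_prob(horizon):
    """Worst-case probability of reaching an unsafe state within `horizon` steps,
    computed by backward induction (value iteration) over the abstract MDP."""
    # p[s] = probability of reaching UNSAFE within the remaining steps
    p = {s: (1.0 if s in UNSAFE else 0.0) for s in transitions}
    for _ in range(horizon):
        p = {
            s: 1.0 if s in UNSAFE else max(
                sum(prob * p[t] for t, prob in succ)
                for succ in transitions[s].values()
            )
            for s in transitions
        }
    return p

if __name__ == "__main__":
    unsafe_prob = max_unsafe_prob(HORIZON)
    for s in sorted(unsafe_prob):
        # Lower bound on safety = 1 - worst-case probability of failure
        print(f"{s}: P(safe within {HORIZON} steps) >= {1 - unsafe_prob[s]:.3f}")

In this toy example the per-region lower bounds on safe operation play the role of the guarantees the abstract describes: regions whose bound meets a required threshold are those where correct behaviour can be certified for the given horizon.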