论文标题
固定点方程求解对风险敏感的MDP
Fixed-point equations solving Risk-sensitive MDP with constraint
论文作者
论文摘要
没有计算可行的算法可以为有限的地平线风险敏感的马尔可夫决策过程(风险CMDP)问题提供解决方案,即使对于中度地平线的问题也是如此。为了设计相同的目的,我们得出一个定点方程式,因此风险CMDP的最佳策略也是解决方案。我们进一步提供了相当于风险CMDP的两个优化问题。这些配方有助于设计一种融合到最佳政策的全球算法。所提出的算法基于随机重新启动和局部改进步骤,其中局部改进步骤利用了派生的固定点方程的解决方案;随机重新启动确保全局优化。我们还提供数值示例,以说明我们算法对库存控制问题的可行性,并具有对风险敏感的成本和约束。该算法的复杂性仅随时间度线性增长。
There are no computationally feasible algorithms that provide solutions to the finite horizon Risk-sensitive Constrained Markov Decision Process (Risk-CMDP) problem, even for problems with moderate horizon. With an aim to design the same, we derive a fixed-point equation such that the optimal policy of Risk-CMDP is also a solution. We further provide two optimization problems equivalent to the Risk-CMDP. These formulations are instrumental in designing a global algorithm that converges to the optimal policy. The proposed algorithm is based on random restarts and a local improvement step, where the local improvement step utilizes the solution of the derived fixed-point equation; random restarts ensure global optimization. We also provide numerical examples to illustrate the feasibility of our algorithm for inventory control problem with risk-sensitive cost and constraint. The complexity of the algorithm grows only linearly with the time-horizon.