Paper title

Stochastic dynamic programming with non-linear discounting

Authors

Nicole Bäuerle, Anna Jaśkiewicz, Andrzej S. Nowak

Abstract

In this paper, we study a Markov decision process with a non-linear discount function and a Borel state space. We define a recursive discounted utility, which resembles the non-additive utility functions considered in a number of models in economics. Non-additivity here follows from the non-linearity of the discount function. Our study is complementary to the work of Jaśkiewicz, Matkowski and Nowak (Math. Oper. Res. 38 (2013), 108-121), where non-linear discounting is also used in the stochastic setting, but the expectation of utilities aggregated on the space of all histories of the process is applied, leading to a non-stationary dynamic programming model. Our aim is to prove that, in the recursive discounted utility case, the Bellman equation has a solution and there exists an optimal stationary policy for the problem in the infinite time horizon. Our approach covers two cases: $(a)$ when the one-stage utility is bounded on both sides by a weight function multiplied by some positive and negative constants, and $(b)$ when the one-stage utility is unbounded from below.
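
To make the idea of a Bellman equation with a non-linear discount function concrete, here is a minimal sketch on a toy finite problem. Everything in it is an assumption made for illustration and is not taken from the paper: the two-state model, the utilities `u`, the transition law `q`, the particular discount function `delta`, and in particular the choice to apply `delta` to the continuation value inside the expectation, since the abstract does not spell out the exact recursion. The paper itself works with Borel state spaces and possibly unbounded utilities, which this sketch does not attempt to cover.

```python
import math

def value_iteration(states, actions, u, q, delta, tol=1e-10, max_iter=10_000):
    """Iterate v <- T v, where (assumed form of the operator)
    (T v)(x) = max_a [ u(x, a) + sum_y q(y | x, a) * delta(v(y)) ].
    Returns the last iterate and a greedy stationary policy."""
    v = {x: 0.0 for x in states}
    policy = {}
    for _ in range(max_iter):
        new_v = {}
        for x in states:
            best_a, best_val = None, -math.inf
            for a in actions:
                # One-stage utility plus expected non-linearly discounted continuation.
                val = u[x, a] + sum(p * delta(v[y]) for y, p in q[x, a].items())
                if val > best_val:
                    best_a, best_val = a, val
            new_v[x] = best_val
            policy[x] = best_a
        gap = max(abs(new_v[x] - v[x]) for x in states)
        v = new_v
        if gap < tol:
            break
    return v, policy

# Hypothetical toy model: two states, two actions.
states = [0, 1]
actions = ["stay", "move"]
u = {(0, "stay"): 0.0, (0, "move"): -0.2,
     (1, "stay"): 1.0, (1, "move"): 0.0}
q = {(0, "stay"): {0: 1.0}, (0, "move"): {0: 0.2, 1: 0.8},
     (1, "stay"): {0: 0.1, 1: 0.9}, (1, "move"): {0: 1.0}}

def delta(z):
    # Assumed non-linear discount: increasing, delta(0) = 0, slope at most 0.9.
    return 0.9 * z / (1.0 + 0.05 * abs(z))

v, policy = value_iteration(states, actions, u, q, delta)
print(v, policy)
```

Because the slope of this `delta` never exceeds 0.9, the assumed operator is a contraction in the sup norm, so the iteration converges on the toy model. With a linear `delta(z) = beta * z` the scheme reduces to standard discounted value iteration.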
