Whittle index based Q-learning for restless bandits with average reward

Paper title

Whittle index based Q-learning for restless bandits with average reward

Authors

Avrachenkov, Konstantin E.; Borkar, Vivek S.

Abstract

A novel reinforcement learning algorithm is introduced for multiarmed restless bandits with average reward, using the paradigms of Q-learning and Whittle index. Specifically, we leverage the structure of the Whittle index policy to reduce the search space of Q-learning, resulting in major computational gains. Rigorous convergence analysis is provided, supported by numerical experiments. The numerical experiments show excellent empirical performance of the proposed scheme.
