Paper title
Whittle index based Q-learning for restless bandits with average reward
Authors
Abstract
A novel reinforcement learning algorithm is introduced for multi-armed restless bandits with average reward, using the paradigms of Q-learning and the Whittle index. Specifically, we leverage the structure of the Whittle index policy to reduce the search space of Q-learning, resulting in major computational gains. A rigorous convergence analysis is provided and supported by numerical experiments, which also show excellent empirical performance of the proposed scheme.
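For readers unfamiliar with the Whittle index policy that the abstract refers to, the following is a minimal sketch of its decision rule only: at each step, activate the arms whose current states carry the largest index values. It is not the paper's learning algorithm. The function name `whittle_policy`, the index table, and the budget `num_active` are hypothetical and chosen for illustration; in the paper's setting the index values would be learned rather than given.

```python
import numpy as np

def whittle_policy(indices, states, num_active):
    """Return a 0/1 activation vector under the Whittle index rule.

    indices[arm][state] is an assumed, precomputed index table (hypothetical
    values); states[arm] is the current state of each arm; num_active is the
    number of arms that may be activated per step.
    """
    # Look up the index of each arm at its current state.
    current = np.array([indices[arm][s] for arm, s in enumerate(states)])
    # Sort arms by descending index; a stable sort breaks ties by arm id.
    order = np.argsort(-current, kind="stable")
    active = np.zeros(len(states), dtype=int)
    active[order[:num_active]] = 1
    return active

# Three arms with two states each; the index values are made up.
example_indices = np.array([[0.1, 0.9],
                            [0.5, 0.2],
                            [0.3, 0.7]])
example_action = whittle_policy(example_indices, states=[1, 0, 1], num_active=2)
```

Because the rule only compares one scalar per arm, a learner that targets these indices works with per-arm quantities instead of the joint state-action space of all arms, which is the kind of search-space reduction the abstract describes.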