蒙特卡洛树搜索中的第二种不确定性

论文标题

蒙特卡洛树搜索中的第二种不确定性

The Second Type of Uncertainty in Monte Carlo Tree Search

论文作者

Moerland, Thomas M, Broekens, Joost, Plaat, Aske, Jonker, Catholijn M

论文摘要

蒙特卡洛树搜索（MCTS）有效地平衡了基于计数衍生的不确定性在树搜索中的探索和剥削。但是，这些本地访问数量忽略了第二种类型的不确定性，这是由子树的大小低于动作的。我们首先显示，由于缺乏第二种不确定性类型，MCT在众所周知的稀疏探索问题中可能完全失败，这是从强化学习社区中知道的。然后，我们引入了一种新的算法，该算法估计了一个动作下的子树的大小，并利用UCB公式中的这些信息来更好地直接探索。随后，我们通过表明循环（即相同痕迹中相同状态的重复发生）的循环来概括这些想法，实际上是子树深度变化的一种特殊情况。对各种任务进行测试表明，我们的算法提高了样本效率，尤其是当每个时间段的计划预算很小时。

Monte Carlo Tree Search (MCTS) efficiently balances exploration and exploitation in tree search based on count-derived uncertainty. However, these local visit counts ignore a second type of uncertainty induced by the size of the subtree below an action. We first show how, due to the lack of this second uncertainty type, MCTS may completely fail in well-known sparse exploration problems, known from the reinforcement learning community. We then introduce a new algorithm, which estimates the size of the subtree below an action, and leverages this information in the UCB formula to better direct exploration. Subsequently, we generalize these ideas by showing that loops, i.e., the repeated occurrence of (approximately) the same state in the same trace, are actually a special case of subtree depth variation. Testing on a variety of tasks shows that our algorithms increase sample efficiency, especially when the planning budget per timestep is small.

下载PDF全文

下载文献需遵守相关版权规定

论文标题