Paper Title

Learning to Play Two-Player Perfect-Information Games without Knowledge

Paper Author

Cohen-Solal, Quentin

Paper Abstract

In this paper, several techniques for learning game state evaluation functions by reinforcement are proposed. The first is a generalization of tree bootstrapping (tree learning): it is adapted to the context of reinforcement learning without knowledge, based on non-linear functions. With this technique, no information is lost during the reinforcement learning process. The second is a modification of minimax with unbounded depth that extends the best sequences of actions to the terminal states. This modified search is intended to be used during the learning process. The third is the replacement of the classic gain of a game (+1 / -1) with a reinforcement heuristic. We study particular reinforcement heuristics such as quick wins and slow defeats, scoring, mobility, and presence. The fourth is a new action selection distribution. The experiments conducted suggest that these techniques improve the level of play. Finally, we apply these different techniques to design program-players for the game of Hex (sizes 11 and 13) that surpass the level of Mohex 3HNN, using reinforcement learning from self-play without knowledge.
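To make the third technique concrete, below is a minimal Python sketch of one possible "quick wins and slow defeats" reinforcement heuristic: the terminal value is scaled by game length so that shorter wins are worth more and longer defeats cost less than the classic +1 / -1 gain. The exact formula, the function name `depth_heuristic`, and the `max_moves` parameter are illustrative assumptions, not the paper's definition.

```python
def depth_heuristic(first_player_won: bool, num_moves: int,
                    max_moves: int = 121) -> float:
    """Terminal value from the first player's viewpoint, in (-1, 1).

    A hypothetical "quick wins and slow defeats" heuristic: a win is
    worth more the fewer moves it took, and a defeat is penalized less
    the longer the game lasted. max_moves = 121 assumes an 11x11 Hex
    board, where a game lasts at most one move per cell.
    """
    # Fraction of the game still unplayed when the terminal state was reached.
    remaining = (max_moves - num_moves) / max_moves
    if first_player_won:
        return 0.5 + 0.5 * remaining   # quicker win  -> value closer to +1
    return -0.5 - 0.5 * remaining      # quicker loss -> value closer to -1


if __name__ == "__main__":
    print(depth_heuristic(True, 20))    # fast win: high value (~0.92)
    print(depth_heuristic(True, 110))   # slow win: lower value (~0.55)
    print(depth_heuristic(False, 110))  # slow defeat: milder penalty (~-0.55)
```

Used as the target of the evaluation function during self-play, such a heuristic gives the learner a gradient over terminal states that the flat +1 / -1 gain cannot: among winning lines it prefers the shortest, and among losing lines the most drawn-out.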
