Paper Title

Bayesian optimization for backpropagation in Monte-Carlo tree search

Paper Authors

Li, Yueqin; Lim, Nengli

Paper Abstract

In large domains, Monte-Carlo tree search (MCTS) is required to estimate the values of the states as efficiently and accurately as possible. However, the standard update rule in backpropagation assumes a stationary distribution for the returns, and particularly in min-max trees, convergence to the true value can be slow because of averaging. We present two methods, Softmax MCTS and Monotone MCTS, which generalize previous attempts to improve upon the backpropagation strategy. We demonstrate that both methods reduce to finding optimal monotone functions, which we do by performing Bayesian optimization with a Gaussian process (GP) prior. We conduct experiments on computer Go, where the returns are given by a deep value neural network, and show that our proposed framework outperforms previous methods.
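The contrast the abstract draws between the standard averaging backup and a softmax-style backup can be illustrated with a minimal toy sketch. This is not the paper's actual Softmax MCTS algorithm; the function names and the `temperature` parameter are our own illustrative assumptions. It only shows why averaging can be slow to reflect a single strong line of play, while a softmax-weighted backup interpolates between the mean (high temperature) and the max (low temperature).

```python
import math

def mean_backup(returns):
    """Standard MCTS backup: the plain average of simulated returns."""
    return sum(returns) / len(returns)

def softmax_backup(returns, temperature=1.0):
    """Softmax-weighted backup (illustrative, not the paper's algorithm).

    High temperature -> close to the mean; low temperature -> close to the max,
    which is the quantity a min-max tree ultimately cares about.
    """
    weights = [math.exp(r / temperature) for r in returns]
    z = sum(weights)
    return sum(r * w for r, w in zip(returns, weights)) / z

# Two weak lines (0.1, 0.2) and one strong line (0.9): the plain average
# undervalues the node, while a low-temperature softmax emphasizes the best return.
returns = [0.1, 0.2, 0.9]
print(mean_backup(returns))                       # 0.4
print(softmax_backup(returns, temperature=0.05))  # close to 0.9
```

The temperature knob is what makes such a backup a family of monotone transformations of the returns, which is the kind of function the paper reports optimizing with GP-based Bayesian optimization.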
