Effective Reinforcement Learning through Evolutionary Surrogate-Assisted Prescription

Paper Title

Effective Reinforcement Learning through Evolutionary Surrogate-Assisted Prescription

Authors

Olivier Francon, Santiago Gonzalez, Babak Hodjat, Elliot Meyerson, Risto Miikkulainen, Xin Qiu, Hormoz Shahrzad

Abstract

There is now significant historical data available on decision making in organizations, consisting of the decision problem, what decisions were made, and how desirable the outcomes were. Using this data, it is possible to learn a surrogate model and, with that model, evolve a decision strategy that optimizes the outcomes. This paper introduces a general approach of this kind, called Evolutionary Surrogate-Assisted Prescription, or ESP. The surrogate is, for example, a random forest or a neural network trained with gradient descent, and the strategy is a neural network that is evolved to maximize the predictions of the surrogate model. ESP is further extended in this paper to sequential decision-making tasks, which makes it possible to evaluate the framework on reinforcement learning (RL) benchmarks. Because the majority of evaluations are done on the surrogate, ESP is more sample efficient and has lower variance and lower regret than standard RL approaches. Surprisingly, its solutions are also better because both the surrogate and the strategy network regularize the decision-making behavior. ESP thus forms a promising foundation for decision optimization in real-world problems.
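
The loop described in the abstract (fit a surrogate on historical decisions, then evolve a strategy against the surrogate's predictions) can be illustrated with a small sketch. This is not the authors' implementation. To stay dependency-free, a k-nearest-neighbour regressor stands in for the random forest or neural-network surrogate, and a two-parameter linear rule stands in for the strategy network. The toy problem, `true_outcome`, and every other name below are hypothetical and chosen only for illustration.

```python
import heapq
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def true_outcome(context, action):
    """Hidden ground truth of the toy problem (assumption): best action is 0.5 * context."""
    return -(action - 0.5 * context) ** 2


# 1. Historical data: (context, action, outcome) triples from past, here random, decisions.
history = []
for _ in range(200):
    c, a = rng.uniform(-1, 1), rng.uniform(-1, 1)
    history.append((c, a, true_outcome(c, a) + rng.gauss(0, 0.05)))


def surrogate(context, action, k=5):
    """Stand-in surrogate: mean outcome of the k nearest historical (context, action) pairs."""
    nearest = heapq.nsmallest(
        k, history, key=lambda h: (h[0] - context) ** 2 + (h[1] - action) ** 2
    )
    return sum(h[2] for h in nearest) / k


def prescribe(policy, context):
    """Stand-in strategy: a linear rule, clamped to the action range covered by the data."""
    w, b = policy
    return max(-1.0, min(1.0, w * context + b))


eval_contexts = [rng.uniform(-1, 1) for _ in range(20)]


def fitness(policy):
    """Fitness is the surrogate's predicted outcome; the real problem is never queried."""
    return sum(surrogate(c, prescribe(policy, c)) for c in eval_contexts) / len(eval_contexts)


def evolve(pop_size=16, generations=25, sigma=0.2):
    """Elitist evolution: keep the top quarter, refill with Gaussian-mutated copies."""
    population = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        best_per_generation.append(fitness(ranked[0]))
        elites = ranked[: pop_size // 4]
        children = [
            (w + rng.gauss(0, sigma), b + rng.gauss(0, sigma))
            for w, b in (rng.choice(elites) for _ in range(pop_size - len(elites)))
        ]
        population = elites + children
    return max(population, key=fitness), best_per_generation


best_policy, curve = evolve()

# Only for checking the sketch: score the evolved strategy on the hidden ground truth
# and compare it with the average outcome of the historical decisions.
real_score = sum(
    true_outcome(c, prescribe(best_policy, c)) for c in eval_contexts
) / len(eval_contexts)
historical_mean = sum(h[2] for h in history) / len(history)
print("evolved (w, b):", best_policy)
print("surrogate fitness:", curve[-1], "real score:", real_score)
print("historical mean outcome:", historical_mean)
```

In this sketch the ground truth is queried only in the final check. All fitness evaluations during evolution go through the surrogate, which is the property the abstract credits for ESP's sample efficiency.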
