Paper Title
Adaptive Operator Selection Based on Dynamic Thompson Sampling for MOEA/D
Paper Authors
Paper Abstract
In evolutionary computation, different reproduction operators exhibit different search dynamics. To strike a good balance between exploration and exploitation, it is attractive to have an adaptive operator selection (AOS) mechanism that automatically chooses the most appropriate operator on the fly according to the current search status. This paper proposes a new AOS mechanism for the multi-objective evolutionary algorithm based on decomposition (MOEA/D). More specifically, AOS is formulated as a multi-armed bandit problem, where dynamic Thompson sampling (DYTS) is applied to adapt the bandit learning model, originally proposed under the assumption of a fixed reward distribution, to a non-stationary setup. In particular, each arm of our bandit learning model represents a reproduction operator and is assigned a prior reward distribution. The parameters of these reward distributions are progressively updated according to the performance of the corresponding operators collected from the evolutionary process. When generating an offspring, an operator is chosen by sampling from these reward distributions according to DYTS. Experimental results fully demonstrate the effectiveness and competitiveness of our proposed AOS mechanism compared with four other state-of-the-art MOEA/D variants.
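
The abstract describes the core mechanism: each reproduction operator is a bandit arm with its own reward distribution, and dynamic Thompson sampling keeps the model responsive under non-stationary rewards by capping the effective sample size of each posterior. Below is a minimal sketch, assuming Beta-Bernoulli reward models with rewards normalised to [0, 1] and a capping threshold C as in standard dynamic Thompson sampling; the class and parameter names (DYTSOperatorSelector, threshold_c) are illustrative, not taken from the paper.

```python
import random

class DYTSOperatorSelector:
    """Sketch of adaptive operator selection via dynamic Thompson sampling.

    Each arm (reproduction operator) keeps a Beta(alpha, beta) reward
    distribution. The threshold C caps alpha + beta, so older observations
    are exponentially discounted and the model can track a non-stationary
    reward distribution.
    """

    def __init__(self, n_operators, threshold_c=100.0):
        self.alpha = [1.0] * n_operators  # Beta prior parameters per arm
        self.beta = [1.0] * n_operators
        self.c = threshold_c

    def select(self):
        # Draw one sample from each arm's Beta posterior; pick the arm
        # with the largest sampled mean reward.
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, op, reward):
        # reward is assumed to be normalised to [0, 1].
        if self.alpha[op] + self.beta[op] < self.c:
            # Standard Thompson sampling update below the threshold.
            self.alpha[op] += reward
            self.beta[op] += 1.0 - reward
        else:
            # Dynamic update: rescale so alpha + beta stays at C,
            # which discounts older rewards.
            k = self.c / (self.c + 1.0)
            self.alpha[op] = (self.alpha[op] + reward) * k
            self.beta[op] = (self.beta[op] + 1.0 - reward) * k
```

In an MOEA/D loop, one would call select() before generating each offspring, apply the chosen reproduction operator, derive a reward from the offspring's contribution (e.g., a normalised improvement on the scalarised subproblem), and feed it back via update().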