Paper Title
Adaptive Procedural Task Generation for Hard-Exploration Problems
Paper Authors
Paper Abstract
We introduce Adaptive Procedural Task Generation (APT-Gen), an approach to progressively generate a sequence of tasks as curricula to facilitate reinforcement learning in hard-exploration problems. At the heart of our approach, a task generator learns to create tasks from a parameterized task space via a black-box procedural generation module. To enable curriculum learning in the absence of a direct indicator of learning progress, we propose to train the task generator by balancing the agent's performance in the generated tasks and the similarity to the target tasks. Through adversarial training, the task similarity is adaptively estimated by a task discriminator defined on the agent's experiences, allowing the generated tasks to approximate target tasks of unknown parameterization or outside of the predefined task space. Our experiments on the grid world and robotic manipulation task domains show that APT-Gen achieves substantially better performance than various existing baselines by generating suitable tasks of rich variations.
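The abstract says the task generator is trained by balancing the agent's performance on generated tasks against their similarity to the target tasks, and that a task discriminator estimates this similarity from the agent's experiences. It does not give the exact objective. The following is a minimal sketch under loudly stated assumptions. Each task's experience is assumed to be summarized as a fixed-length feature vector. The discriminator is assumed to be a plain logistic regression. The generator's training signal is assumed to take the additive form `return + beta * log D(features)`. All names (`LogisticDiscriminator`, `generator_reward`, `beta`) are hypothetical illustrations and are not APT-Gen's actual formulation or API.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticDiscriminator:
    """Hypothetical task discriminator: estimates P(experience came from a target task)."""

    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def prob_target(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return sigmoid(z)

    def update(self, target_batch, generated_batch):
        # One gradient-ascent step on the log-likelihood (label 1 = target, 0 = generated).
        data = [(x, 1.0) for x in target_batch] + [(x, 0.0) for x in generated_batch]
        grad_w = [0.0] * len(self.w)
        grad_b = 0.0
        for x, y in data:
            err = y - self.prob_target(x)
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        n = len(data)
        self.w = [wi + self.lr * gi / n for wi, gi in zip(self.w, grad_w)]
        self.b += self.lr * grad_b / n


def generator_reward(agent_return, features, disc, beta=0.5, eps=1e-6):
    # Assumed additive balance: agent performance on the generated task plus a
    # discriminator-based similarity bonus (higher when the task looks like a target task).
    similarity = math.log(disc.prob_target(features) + eps)
    return agent_return + beta * similarity


# Toy usage: target-task experience clusters near (1, 1), early generated tasks near (0, 0).
target_feats = [[1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
generated_feats = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1]]
disc = LogisticDiscriminator(dim=2, lr=0.5)
for _ in range(200):
    disc.update(target_feats, generated_feats)

near = generator_reward(agent_return=0.4, features=[1.0, 1.0], disc=disc)
far = generator_reward(agent_return=0.4, features=[0.0, 0.0], disc=disc)
print(f"target-like task: {near:.3f}, unlike task: {far:.3f}")
```

In this sketch, `beta` trades off the two terms. A small `beta` favors tasks the agent already solves. A large `beta` pushes the generator toward target-like tasks even when the agent still performs poorly on them. Retraining the discriminator as the generated tasks change is what makes the similarity estimate adaptive in this illustration.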