Paper Title

Generalizing Goal-Conditioned Reinforcement Learning with Variational Causal Reasoning

Authors

Wenhao Ding, Haohong Lin, Bo Li, Ding Zhao

Abstract

As a pivotal component of attaining generalizable solutions in human intelligence, reasoning offers great potential for reinforcement learning (RL) agents to generalize to varied goals by summarizing part-to-whole arguments and discovering cause-and-effect relations. However, how to discover and represent causalities remains a significant open problem that hinders the development of causal RL. In this paper, we augment Goal-Conditioned RL (GCRL) with a Causal Graph (CG), a structure built upon the relations between objects and events. We formulate the GCRL problem as variational likelihood maximization with the CG as a latent variable. To optimize the derived objective, we propose a framework with theoretical performance guarantees that alternates between two steps: using interventional data to estimate the posterior of the CG; and using the CG to learn generalizable models and interpretable policies. Due to the lack of public benchmarks that verify generalization capability under reasoning, we design nine tasks and empirically show the effectiveness of the proposed method against five baselines on these tasks. Further theoretical analysis shows that our performance improvement is attributed to the virtuous cycle of causal discovery, transition modeling, and policy training, which aligns with the experimental evidence in extensive ablation studies.
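To make the alternating two-step scheme in the abstract concrete, below is a minimal toy sketch, not the paper's actual algorithm: edge beliefs of a causal graph are updated from data dependence scores (step 1), and a transition model is then fit only through the inputs the graph keeps (step 2). All function names, the Bernoulli edge parameterization, and the correlation-based posterior update are illustrative assumptions chosen for brevity; the paper's variational treatment is far more involved.

```python
import numpy as np

def update_graph_posterior(edge_probs, states, next_states, lr=0.2):
    # Step 1 (sketch): use |corr(state_i, next_state_j)| from the batch as
    # evidence for a directed edge i -> j and move the per-edge Bernoulli
    # beliefs toward it.
    d = states.shape[1]
    full = np.corrcoef(np.hstack([states, next_states]), rowvar=False)
    scores = np.abs(full[:d, d:])  # rows: current vars, cols: next-step vars
    return (1 - lr) * edge_probs + lr * scores

def fit_masked_model(edge_probs, states, next_states, threshold=0.5):
    # Step 2 (sketch): keep only parents whose edge belief passes the
    # threshold, then fit a per-variable linear transition model, yielding
    # a sparse, more generalizable dynamics model.
    mask = edge_probs > threshold
    weights = np.zeros_like(edge_probs)
    for j in range(states.shape[1]):
        parents = mask[:, j]
        if parents.any():
            w, *_ = np.linalg.lstsq(states[:, parents],
                                    next_states[:, j], rcond=None)
            weights[parents, j] = w
    return weights

# Toy data: variable 0 causally drives the next value of variable 1.
rng = np.random.default_rng(0)
n, d = 200, 3
states = rng.normal(size=(n, d))
next_states = states.copy()
next_states[:, 1] = 0.9 * states[:, 0] + 0.1 * rng.normal(size=n)

edge_probs = np.full((d, d), 0.5)  # uninformative prior over edges
for _ in range(5):  # alternate the two steps
    edge_probs = update_graph_posterior(edge_probs, states, next_states)
    weights = fit_masked_model(edge_probs, states, next_states)
```

After a few alternations, the belief in the true edge 0 → 1 grows while spurious edges shrink, and the masked model recovers roughly the true coefficient 0.9, illustrating the "causal discovery improves modeling" direction of the virtuous cycle.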
