Paper Title

Multi-Environment Pretraining Enables Transfer to Action Limited Datasets

Authors

David Venuto, Sherry Yang, Pieter Abbeel, Doina Precup, Igor Mordatch, Ofir Nachum

Abstract

Using massive datasets to train large-scale models has emerged as a dominant approach for broad generalization in natural language and vision applications. In reinforcement learning, however, a key challenge is that available data of sequential decision making is often not annotated with actions - for example, videos of game-play are much more available than sequences of frames paired with their logged game controls. We propose to circumvent this challenge by combining large but sparsely-annotated datasets from a *target* environment of interest with fully-annotated datasets from various other *source* environments. Our method, Action Limited PreTraining (ALPT), leverages the generalization capabilities of inverse dynamics modelling (IDM) to label missing action data in the target environment. We show that utilizing even one additional environment dataset of labelled data during IDM pretraining gives rise to substantial improvements in generating action labels for unannotated sequences. We evaluate our method on benchmark game-playing environments and show that we can significantly improve game performance and generalization capability compared to other approaches, using annotated datasets equivalent to only 12 minutes of gameplay. Highlighting the power of IDM, we show that these benefits remain even when target and source environments share no common actions.
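The core mechanism described above - training an inverse dynamics model on action-labelled transitions, then using it to pseudo-label the unannotated target data - can be illustrated with a minimal sketch. This is not the paper's transformer-based implementation; it substitutes a toy nearest-neighbour predictor over state differences (all function names, the delta feature, and the example data are illustrative assumptions):

```python
import numpy as np

def fit_idm(states, next_states, actions):
    """Toy inverse dynamics model: memorize (s' - s) -> a pairs.
    A nearest-neighbour stand-in for the learned IDM used in ALPT."""
    deltas = np.asarray(next_states) - np.asarray(states)
    return deltas, np.asarray(actions)

def idm_label(idm, states, next_states):
    """Pseudo-label unannotated transitions with IDM-predicted actions."""
    train_deltas, train_actions = idm
    deltas = np.asarray(next_states) - np.asarray(states)
    # For each query transition, pick the action of the closest training delta.
    dists = ((deltas[:, None, :] - train_deltas[None, :, :]) ** 2).sum(-1)
    return train_actions[dists.argmin(axis=1)]

# Labelled source data: action 0 decreases the state, action 1 increases it.
s  = np.array([[0.0], [1.0], [2.0], [3.0]])
s2 = np.array([[-1.0], [2.0], [1.0], [4.0]])
a  = [0, 1, 0, 1]

idm = fit_idm(s, s2, a)
# Unlabelled target transitions: the IDM infers the missing action labels,
# which can then be used to train a downstream policy.
labels = idm_label(idm, np.array([[5.0], [6.0]]), np.array([[6.0], [5.0]]))
print(labels.tolist())  # [1, 0]
```

The point of the sketch is only the data flow: a small labelled dataset fits the IDM, and the IDM's predictions turn action-free sequences into usable training data.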
