Learning intuitive physics and one-shot imitation using state-action-prediction self-organizing maps

Paper title

Learning intuitive physics and one-shot imitation using state-action-prediction self-organizing maps

Authors

Stetter, Martin, Lang, Elmar W.

Abstract

Human learning and intelligence work differently from the supervised pattern-recognition approach adopted in most deep learning architectures. Humans seem to learn rich representations through exploration and imitation, build causal models of the world, and use both to flexibly solve new tasks. We propose a simple but effective unsupervised model that develops these characteristics. The agent learns to represent the dynamical physical properties of its environment through intrinsically motivated exploration, and performs inference on this representation to reach goals. To this end, a set of self-organizing maps representing state-action pairs is combined with a causal model for sequence prediction. The proposed system is evaluated in the cartpole environment. After an initial phase of playful exploration, the agent can run kinematic simulations of the environment's future and use them for action planning. We demonstrate its performance on a set of related but distinct one-shot imitation tasks, which the agent solves flexibly in an active-inference style.
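The abstract pairs self-organizing maps over state-action vectors with a sequence-prediction model. The sketch below illustrates that general idea only; it is not the authors' implementation. It assumes a single Kohonen SOM over concatenated (state, action) vectors, uses a first-order transition-count table between winning nodes as a simple stand-in for the paper's causal model, and chains the most likely next node as a crude analogue of "kinematic simulation". The function `toy_cartpole_step` (hypothetical linearized dynamics, not the real cartpole equations), the random-action exploration, the 6x6 grid, and the learning-rate and neighborhood schedules are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_cartpole_step(s, a, dt=0.02):
    """Hypothetical linearized cart-pole update (NOT the Gym/Gymnasium CartPole equations)."""
    x, x_dot, th, th_dot = s
    x_dot = x_dot + dt * a                 # action pushes the cart
    th_dot = th_dot + dt * (9.8 * th - a)  # pole falls under gravity, pushed back by the action
    return np.array([x + dt * x_dot, x_dot, th + dt * th_dot, th_dot])

def explore(n_episodes=30, max_len=50):
    """Exploration with uniformly random actions (an assumed stand-in for intrinsic motivation)."""
    episodes = []
    for _ in range(n_episodes):
        s = rng.uniform(-0.05, 0.05, size=4)
        ep = []
        for _ in range(max_len):
            a = rng.choice([-1.0, 1.0])
            ep.append(np.append(s, a))  # state-action vector z_t = (s_t, a_t)
            s = toy_cartpole_step(s, a)
            if abs(s[2]) > 0.2 or abs(s[0]) > 2.4:  # pole fell or cart left the track
                break
        episodes.append(np.array(ep))
    return episodes

class SOM:
    """Minimal Kohonen self-organizing map on a rows x cols grid."""
    def __init__(self, rows, cols, dim, seed=0):
        self.weights = np.random.default_rng(seed).normal(size=(rows * cols, dim))
        # Grid position of each node, used by the neighborhood function
        self.coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)

    def winner(self, z):
        # Best-matching unit: node whose weight vector is closest to z
        return int(np.argmin(np.linalg.norm(self.weights - z, axis=1)))

    def update(self, z, lr, sigma):
        bmu = self.winner(z)
        d2 = np.sum((self.coords - self.coords[bmu]) ** 2, axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighborhood around the BMU
        self.weights += lr * h[:, None] * (z - self.weights)

class TransitionModel:
    """First-order transition counts between SOM nodes (stand-in for the paper's causal model)."""
    def __init__(self, n_nodes):
        self.counts = np.zeros((n_nodes, n_nodes))

    def observe(self, i, j):
        self.counts[i, j] += 1

    def probs(self, i):
        row = self.counts[i]
        total = row.sum()
        return row / total if total > 0 else np.full(len(row), 1.0 / len(row))

    def predict(self, i):
        return int(np.argmax(self.probs(i)))

# 1) Explore, then standardize features so the +/-1 action column does not dominate distances
episodes = explore()
data = np.vstack(episodes)
mu, sd = data.mean(axis=0), data.std(axis=0) + 1e-8
episodes_n = [(ep - mu) / sd for ep in episodes]
data_n = np.vstack(episodes_n)

# 2) Train the SOM with linearly decaying learning rate and neighborhood width
som = SOM(6, 6, dim=data_n.shape[1])
n_epochs = 5
for epoch in range(n_epochs):
    frac = 1.0 - epoch / n_epochs
    for z in rng.permutation(data_n):
        som.update(z, lr=0.5 * frac + 0.01, sigma=2.0 * frac + 0.5)

# 3) Count transitions between successive winning nodes within each episode
tm = TransitionModel(len(som.weights))
for ep in episodes_n:
    nodes = [som.winner(z) for z in ep]
    for i, j in zip(nodes[:-1], nodes[1:]):
        tm.observe(i, j)

def rollout(start_z, steps):
    """Chain the most likely next node to 'imagine' a trajectory in map space."""
    node = som.winner(start_z)
    traj = [node]
    for _ in range(steps):
        node = tm.predict(node)
        traj.append(node)
    return traj

traj = rollout(episodes_n[0][0], steps=10)
predicted = som.weights[traj] * sd + mu  # decode nodes to (x, x_dot, theta, theta_dot, action)
```

Standardizing the features matters here because the SOM's best-matching unit is chosen by Euclidean distance; without it, the binary action column would outweigh the small-valued state components. Because each node encodes a state-action pair, a predicted node sequence implies both future states and the actions associated with them.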
