A Spiking Neural Network Learning Markov Chain

Paper Title

A Spiking Neural Network Learning Markov Chain

Author

Kiselev, Mikhail

Abstract

This paper explores how a spiking neural network (SNN) learns and fixes in its internal structures a model of external world dynamics. This question is important for implementing model-based reinforcement learning (RL), the realistic RL regime in which the decisions made by an SNN and their evaluation in terms of reward/punishment signals may be separated by a significant time interval and a sequence of intermediate, evaluation-neutral world states. In the present work, I formalize world dynamics as a Markov chain whose state transition probabilities are unknown a priori and must be learnt by the network. To make the problem formulation more realistic, I solve it in continuous time, so that the duration of each state in the Markov chain may differ and is unknown. I demonstrate how this task can be accomplished by an SNN with a specially designed structure and local synaptic plasticity rules. As an example, I show how this network motif works in a simple but non-trivial world in which a ball moves inside a square box and bounces off its walls with a random new direction and velocity.
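
To make the learning target concrete, the sketch below simulates a small Markov chain with state-dependent random dwell times and recovers its transition probabilities and mean state durations by simple counting. It is not the SNN or the plasticity rule from the paper; it only illustrates the quantities such a network would have to learn. The three-state matrix `P_TRUE`, the values in `MEAN_DWELL`, the exponential dwell-time distribution, and all function names are assumptions chosen for illustration.

```python
import random

# Hypothetical 3-state world (illustrative values, not from the paper).
# Diagonal entries are zero: in continuous time a state change is only
# observable when the state actually differs from the previous one.
P_TRUE = [
    [0.0, 0.7, 0.3],
    [0.2, 0.0, 0.8],
    [0.5, 0.5, 0.0],
]
MEAN_DWELL = [1.0, 0.5, 2.0]  # assumed mean duration of each state


def simulate_chain(P, mean_dwell, n_steps, rng):
    """Return a list of (state, dwell_time) pairs.

    Dwell times are drawn from an exponential distribution with a
    state-dependent mean (an assumption made for this sketch).
    """
    states = list(range(len(P)))
    s = 0
    trajectory = []
    for _ in range(n_steps):
        dwell = rng.expovariate(1.0 / mean_dwell[s])
        trajectory.append((s, dwell))
        s = rng.choices(states, weights=P[s])[0]
    return trajectory


def estimate_transitions(trajectory, n_states):
    """Estimate transition probabilities from consecutive state pairs."""
    counts = [[0] * n_states for _ in range(n_states)]
    for (a, _), (b, _) in zip(trajectory, trajectory[1:]):
        counts[a][b] += 1
    estimate = []
    for row in counts:
        total = sum(row)
        estimate.append([c / total if total else 0.0 for c in row])
    return estimate


def estimate_mean_dwell(trajectory, n_states):
    """Estimate the mean duration of each state from observed dwell times."""
    sums = [0.0] * n_states
    visits = [0] * n_states
    for state, dwell in trajectory:
        sums[state] += dwell
        visits[state] += 1
    return [s / v if v else 0.0 for s, v in zip(sums, visits)]


if __name__ == "__main__":
    rng = random.Random(0)
    traj = simulate_chain(P_TRUE, MEAN_DWELL, 20000, rng)
    for row in estimate_transitions(traj, len(P_TRUE)):
        print([round(p, 3) for p in row])
    print([round(d, 3) for d in estimate_mean_dwell(traj, len(P_TRUE))])
```

The counting estimator uses only the order of states, so it is unaffected by how long each state lasts; the durations are estimated separately. A network restricted to local plasticity rules would have to reach comparable estimates without storing the full trajectory.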
