Paper title
Quantum-accessible reinforcement learning beyond strictly epochal environments
Paper authors
Paper abstract
In recent years, quantum-enhanced machine learning has emerged as a particularly fruitful application of quantum algorithms, covering aspects of supervised, unsupervised and reinforcement learning. Reinforcement learning offers numerous options for how quantum theory can be applied and is arguably the least explored of the three from a quantum perspective. Here, an agent explores an environment and tries to find a behavior that optimizes some figure of merit. Some of the first approaches investigated settings where this exploration can be sped up by considering quantum analogs of classical environments, which can then be queried in superposition. If the environments have a strict periodic structure in time (i.e., are strictly episodic), they can be effectively converted to the conventional oracles encountered in quantum information. In general environments, however, we obtain scenarios that generalize standard oracle tasks. In this work we consider one such generalization, where the environment is not strictly episodic, which is mapped to an oracle identification setting with a changing oracle. We analyze this case and show that standard amplitude-amplification techniques can, with minor modifications, still be applied to achieve quadratic speed-ups, and that this approach is optimal for certain settings. This result constitutes one of the first generalizations of quantum-accessible reinforcement learning.
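To make the quadratic speed-up mentioned in the abstract concrete, the following is a minimal classical simulation of textbook amplitude amplification (Grover search) with a fixed oracle. It is only an illustrative sketch of the standard static-oracle case, not the modified procedure for changing oracles that the paper analyzes. The function names, the problem size of 1024 and the marked index are assumptions chosen for the example.

```python
import math
import numpy as np


def grover_success_probability(n_items: int, marked: int, iterations: int) -> float:
    """Simulate amplitude amplification on a uniform superposition.

    Assumption: a single marked item and a fixed (non-changing) oracle.
    """
    # Start in the uniform superposition over all basis states.
    state = np.full(n_items, 1.0 / math.sqrt(n_items))
    for _ in range(iterations):
        # Oracle step: flip the phase of the marked amplitude.
        state[marked] *= -1.0
        # Diffusion step: reflect every amplitude about the mean amplitude.
        state = 2.0 * state.mean() - state
    # Probability of measuring the marked item.
    return float(state[marked] ** 2)


def optimal_iterations(n_items: int) -> int:
    """Roughly (pi / 4) * sqrt(N) oracle queries for one marked item."""
    return math.floor(math.pi / 4.0 * math.sqrt(n_items))


n = 1024   # hypothetical search-space size
k = optimal_iterations(n)
p_quantum = grover_success_probability(n, marked=3, iterations=k)
p_classical = k / n   # chance of hitting the marked item with k distinct classical guesses
print(f"queries={k}, quantum success={p_quantum:.4f}, classical success={p_classical:.4f}")
```

With the same number of queries, the simulated quantum procedure finds the marked item with probability close to one, while classical guessing succeeds only with probability k/N. This gap is the quadratic separation that the abstract refers to in the strictly episodic setting.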