Paper Title
Stochastic Multi-armed Bandits with Non-stationary Rewards Generated by a Linear Dynamical System
Authors
Abstract
The stochastic multi-armed bandit provides a framework for studying decision-making in unknown environments. We propose a variant of the stochastic multi-armed bandit in which the rewards are sampled from a stochastic linear dynamical system. The proposed strategy for this variant is to learn a model of the dynamical system while choosing the optimal action based on the learned model. The work is motivated by models in mathematical finance, such as the Intertemporal Capital Asset Pricing Model proposed by Merton and Stochastic Portfolio Theory proposed by Fernholz, both of which model asset returns with stochastic differential equations. Accordingly, the strategy is applied to quantitative finance as a high-frequency trading strategy, where the goal is to maximize returns within a time period.
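To make the setting concrete, the following is a minimal sketch of a bandit whose rewards come from a linear dynamical system. It is not the paper's algorithm. Everything specific here is an assumption made for illustration: the state update x_{t+1} = A x_t + noise, the reward c_a^T x_t + noise for arm a, the matrices `A` and `C`, and the names `GreedyARPolicy` and `run_lds_bandit`. The learner is a deliberately crude stand-in for "learn a model, then act on it": an epsilon-greedy rule that fits one scalar autoregressive coefficient per arm by least squares and ignores the gaps between pulls of the same arm.

```python
import numpy as np


class GreedyARPolicy:
    """Hypothetical epsilon-greedy learner with a per-arm scalar AR predictor."""

    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.rng = np.random.default_rng(seed)
        self.last = np.zeros(n_arms)              # last observed reward per arm
        self.seen = np.zeros(n_arms, dtype=bool)  # has the arm been pulled yet?
        self.sxy = np.zeros(n_arms)               # running sum of prev * next
        self.sxx = np.full(n_arms, 1e-8)          # running sum of prev**2 (tiny ridge)

    def predict(self):
        # Least-squares AR coefficient per arm, applied to the last observation.
        phi = self.sxy / self.sxx
        return phi * self.last

    def select(self):
        unseen = np.flatnonzero(~self.seen)
        if unseen.size > 0:                       # pull every arm once first
            return int(unseen[0])
        if self.rng.random() < self.epsilon:      # occasional random exploration
            return int(self.rng.integers(self.n_arms))
        return int(np.argmax(self.predict()))     # otherwise act on the learned model

    def update(self, arm, reward):
        if self.seen[arm]:
            self.sxy[arm] += self.last[arm] * reward
            self.sxx[arm] += self.last[arm] ** 2
        self.last[arm] = reward
        self.seen[arm] = True


def run_lds_bandit(A, C, horizon, policy, noise_std=0.1, seed=0):
    """Simulate the assumed model: reward = C[arm] @ x + noise, x <- A @ x + noise."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=A.shape[0])               # hidden state, never shown to the policy
    actions = np.empty(horizon, dtype=int)
    rewards = np.empty(horizon)
    for t in range(horizon):
        arm = policy.select()
        reward = C[arm] @ x + noise_std * rng.normal()
        policy.update(arm, reward)                # bandit feedback: only the pulled arm
        actions[t], rewards[t] = arm, reward
        x = A @ x + noise_std * rng.normal(size=x.shape)
    return actions, rewards


# Illustrative parameters (assumed): a stable 2-dimensional state and 3 arms.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
C = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
actions, rewards = run_lds_bandit(A, C, horizon=200, policy=GreedyARPolicy(n_arms=3))
```

In the trading reading suggested by the abstract, each arm would correspond to an asset and `rewards.sum()` to the return accumulated over the period; a faithful implementation would replace the scalar AR fit with an identified model of the full dynamical system.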