论文标题
通过最佳子集选择,在小动作空间上几乎依赖尺寸无关的稀疏线性匪徒
Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection
论文作者
论文摘要
我们考虑在高维线性模型下的随机上下文匪徒问题。我们专注于动作空间是有限且随机的情况,每个动作都与随机生成的上下文协变量相关联。该设置找到了基本应用,例如个性化建议,在线广告和个性化医学。但是,由于我们需要平衡探索和剥削,这是非常具有挑战性的。我们建议使用最佳的子集选择方法进行双重增长时期,并估算参数,这在实践中易于实现。此方法实现了$ \ tilde {\ Mathcal {o}}(s \ sqrt {t})$ hearry的可能性很高,这几乎是独立的``obsient''回归模型dimension $ d $。通过使用\ textsc {suplinucb}框架,我们进一步达到了一个更清晰的$ \ tilde {\ mathcal {o}}(\ sqrt {st})$遗憾的是,匹配了低量度的线性线性随机bandit问题的微小较小框架。最后,我们进行了广泛的数值实验,以证明我们算法的适用性和鲁棒性。
We consider the stochastic contextual bandit problem under the high dimensional linear model. We focus on the case where the action space is finite and random, with each action associated with a randomly generated contextual covariate. This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine. However, it is very challenging as we need to balance exploration and exploitation. We propose doubly growing epochs and estimating the parameter using the best subset selection method, which is easy to implement in practice. This approach achieves $ \tilde{\mathcal{O}}(s\sqrt{T})$ regret with high probability, which is nearly independent in the ``ambient'' regression model dimension $d$. We further attain a sharper $\tilde{\mathcal{O}}(\sqrt{sT})$ regret by using the \textsc{SupLinUCB} framework and match the minimax lower bound of low-dimensional linear stochastic bandit problems. Finally, we conduct extensive numerical experiments to demonstrate the applicability and robustness of our algorithms empirically.