Paper Title

Thresholded Lasso Bandit

Authors

Kaito Ariu, Kenshi Abe, Alexandre Proutière

Abstract

In this paper, we revisit the regret minimization problem in sparse stochastic contextual linear bandits, where feature vectors may be of large dimension $d$, but where the reward function depends on a few, say $s_0\ll d$, of these features only. We present Thresholded Lasso bandit, an algorithm that (i) estimates the vector defining the reward function as well as its sparse support, i.e., significant feature elements, using the Lasso framework with thresholding, and (ii) selects an arm greedily according to this estimate projected on its support. The algorithm does not require prior knowledge of the sparsity index $s_0$ and can be parameter-free under some symmetric assumptions. For this simple algorithm, we establish non-asymptotic regret upper bounds scaling as $\mathcal{O}( \log d + \sqrt{T} )$ in general, and as $\mathcal{O}( \log d + \log T)$ under the so-called margin condition (a probabilistic condition on the separation of the arm rewards). The regret of previous algorithms scales as $\mathcal{O}( \log d + \sqrt{T \log (d T)})$ and $\mathcal{O}( \log T \log d)$ in the two settings, respectively. Through numerical experiments, we confirm that our algorithm outperforms existing methods.
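The abstract describes the algorithm's two steps: (i) a Lasso estimate of the reward parameter whose small coefficients are thresholded away to obtain an estimated support, and (ii) greedy arm selection using the estimate projected on that support. The following is a minimal, illustrative Python sketch of such a thresholded-Lasso bandit loop, not the authors' exact specification: the regularization schedule lam_t, the thresholding rule 4*lam_t, the Gaussian contexts, the noise level, and all problem sizes are placeholder assumptions.

# Illustrative sketch of a thresholded-Lasso bandit loop (placeholder choices,
# not the paper's exact algorithm or constants).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
d, s0, K, T = 100, 5, 10, 2000            # ambient dimension, sparsity, arms, horizon
theta = np.zeros(d)
theta[:s0] = rng.uniform(0.5, 1.0, s0)    # unknown sparse reward parameter

X_hist, y_hist = [], []                   # contexts of pulled arms and observed rewards
theta_hat, support = np.zeros(d), np.arange(d)

for t in range(1, T + 1):
    contexts = rng.normal(size=(K, d))    # K candidate arm feature vectors
    # (ii) greedy arm selection with the estimate projected on its estimated support
    arm = int(np.argmax(contexts[:, support] @ theta_hat[support]))
    reward = contexts[arm] @ theta + rng.normal(scale=0.1)
    X_hist.append(contexts[arm]); y_hist.append(reward)

    # (i) Lasso estimate with a decaying regularization, followed by thresholding
    lam_t = 0.1 * np.sqrt(np.log(d) / t)  # placeholder schedule
    lasso = Lasso(alpha=lam_t, fit_intercept=False, max_iter=5000)
    lasso.fit(np.asarray(X_hist), np.asarray(y_hist))
    theta_hat = lasso.coef_
    support = np.flatnonzero(np.abs(theta_hat) > 4 * lam_t)  # estimated support
    if support.size == 0:                 # fall back to the full support if empty
        support = np.arange(d)

print("estimated support:", support, "true support:", np.arange(s0))

In this sketch, thresholding prunes coefficients on the order of lam_t, so the greedy step acts on the (typically much smaller) estimated support rather than on all d coordinates; no prior knowledge of s0 is used, mirroring the parameter-free flavor described in the abstract.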
