Paper title
Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation
Paper authors
Paper abstract
We prove that the information-theoretic upper bound on the minimax regret for zeroth-order adversarial bandit convex optimisation is at most $O(d^{2.5} \sqrt{n} \log(n))$, where $d$ is the dimension and $n$ is the number of interactions. This improves on the $O(d^{9.5} \sqrt{n} \log(n)^{7.5})$ bound of Bubeck et al. (2017). The proof is based on identifying an improved exploratory distribution for convex functions.