Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation

Paper title

Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation

Author

Lattimore, Tor

Abstract

We prove that the information-theoretic upper bound on the minimax regret for zeroth-order adversarial bandit convex optimisation is at most $O(d^{2.5} \sqrt{n} \log(n))$, where $d$ is the dimension and $n$ is the number of interactions. This improves on the $O(d^{9.5} \sqrt{n} \log(n)^{7.5})$ bound of Bubeck et al. (2017). The proof is based on identifying an improved exploratory distribution for convex functions.
