带有侧观察的上下文匪徒

论文标题

带有侧观察的上下文匪徒

Contextual Bandits with Side-Observations

论文作者

Singh, Rahul, Liu, Fang, Liu, Xin, Shroff, Ness

论文摘要

我们在跨武器的副观察存在下研究上下文匪徒，以设计通过社交网络连接的用户设计推荐算法。社交网络中的用户会响应朋友的活动，因此提供了有关彼此偏好的信息。在我们的模型中，当学习算法向用户推荐一篇文章时，它不仅可以观察到他/她的响应（例如，单击广告），而且还可以观察到副观察，即，如果他们呈现了同一篇文章，则他们的邻居响应。我们通过图$ \ Mathcal {g} $对这些观察依赖关系进行建模，其中节点与用户相对应，而边缘对应于社交链接。我们在任何一致的算法的遗憾中得出了问题/实例依赖性下限。我们提出了一种基于数据驱动的学习算法的优化（线性编程），该学习算法利用$ \ Mathcal {g} $的结构，以向用户提出建议并表明它在某种意义上是渐近的最佳选择，因为它的遗憾是遗憾的是，其较低的人数匹配了较低的回合$ t \ t \ t \ foffty $。我们表明，这种渐近最佳的遗憾是$ o \ left（|χ（\ Mathcal {g}）| \ log t t \ right）$，其中$ |χ（\ Mathcal {g}）| $是$ \ MATHCAL {G} $的支配数。相比之下，现有学习算法的幼稚应用导致$ o \ left（n \ log t \ right）$遗憾，其中$ n $是用户的数量。

We investigate contextual bandits in the presence of side-observations across arms in order to design recommendation algorithms for users connected via social networks. Users in social networks respond to their friends' activity, and hence provide information about each other's preferences. In our model, when a learning algorithm recommends an article to a user, not only does it observe his/her response (e.g. an ad click), but also the side-observations, i.e., the response of his neighbors if they were presented with the same article. We model these observation dependencies by a graph $\mathcal{G}$ in which nodes correspond to users, and edges correspond to social links. We derive a problem/instance-dependent lower-bound on the regret of any consistent algorithm. We propose an optimization (linear programming) based data-driven learning algorithm that utilizes the structure of $\mathcal{G}$ in order to make recommendations to users and show that it is asymptotically optimal, in the sense that its regret matches the lower-bound as the number of rounds $T\to\infty$. We show that this asymptotically optimal regret is upper-bounded as $O\left(|χ(\mathcal{G})|\log T\right)$, where $|χ(\mathcal{G})|$ is the domination number of $\mathcal{G}$. In contrast, a naive application of the existing learning algorithms results in $O\left(N\log T\right)$ regret, where $N$ is the number of users.

下载PDF全文

下载文献需遵守相关版权规定

论文标题