Paper Title


Shuffle Private Linear Contextual Bandits

Paper Authors

Sayak Ray Chowdhury, Xingyu Zhou

Paper Abstract


Differential privacy (DP) has recently been introduced to linear contextual bandits to formally address the privacy concerns in the associated personalized services to participating users (e.g., recommendations). Prior work largely focuses on two trust models of DP: the central model, where a central server is responsible for protecting users' sensitive data, and the (stronger) local model, where information needs to be protected directly on the user's side. However, there remains a fundamental gap in the utility achieved by learning algorithms under these two privacy models, e.g., $\tilde{O}(\sqrt{T})$ regret in the central model as compared to $\tilde{O}(T^{3/4})$ regret in the local model, if all users are unique within a learning horizon $T$. In this work, we aim to achieve a stronger model of trust than the central model, while suffering a smaller regret than the local model, by considering the recently popular shuffle model of privacy. We propose a general algorithmic framework for linear contextual bandits under the shuffle trust model, where there exists a trusted shuffler between the users and the central server that randomly permutes a batch of users' data before sending it to the server. We then instantiate this framework with two specific shuffle protocols: one relying on privacy amplification of local mechanisms, and another incorporating a protocol for summing vectors and matrices of bounded norms. We prove that both instantiations lead to regret guarantees that significantly improve on those of the local model, and can potentially be of the order $\tilde{O}(T^{3/5})$ if all users are unique. We also verify this regret behavior with simulations on synthetic data. Finally, under the practical scenario of non-unique users, we show that the regret of our shuffle private algorithm scales as $\tilde{O}(T^{2/3})$, which matches what the central model could achieve in this case.
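To make the shuffle trust model concrete, below is a minimal sketch of the pipeline the abstract describes: each user privatizes their own statistics with a local mechanism before anything leaves their device, a trusted shuffler randomly permutes a batch of these privatized messages (severing the link between users and reports), and the central server only sees the shuffled batch, which it aggregates into a ridge-regression (LinUCB-style) parameter estimate. This is an illustrative assumption, not the paper's exact protocols: the names (`local_randomizer`, `shuffler`, `server_aggregate`) and the Gaussian noise scale `sigma` are hypothetical, and the paper's actual instantiations calibrate noise via privacy amplification of local mechanisms or a bounded-norm vector/matrix summation protocol.

```python
# Minimal sketch of the shuffle trust model for linear contextual bandits.
# Hypothetical names and noise calibration; for illustration only.
import numpy as np

rng = np.random.default_rng(0)

def local_randomizer(x, reward, sigma):
    """Each user privatizes their own statistics (x x^T, reward * x)
    with Gaussian noise before sending anything off-device."""
    outer = np.outer(x, x) + rng.normal(0.0, sigma, (x.size, x.size))
    cross = reward * x + rng.normal(0.0, sigma, x.size)
    return outer, cross

def shuffler(batch):
    """Trusted shuffler: randomly permutes the batch of privatized
    messages, breaking the link between users and their reports."""
    perm = rng.permutation(len(batch))
    return [batch[i] for i in perm]

def server_aggregate(shuffled, d, lam=1.0):
    """Central server sees only the shuffled batch; it aggregates the
    noisy statistics into a regularized Gram matrix and target vector,
    then solves for a ridge-regression estimate of the parameter."""
    V = lam * np.eye(d)
    u = np.zeros(d)
    for outer, cross in shuffled:
        V += outer
        u += cross
    return np.linalg.solve(V, u)  # estimated theta_hat

# Toy round: a batch of users with bounded-norm d-dimensional contexts.
d, batch_size, sigma = 5, 100, 0.1
theta_star = rng.normal(size=d)
batch = []
for _ in range(batch_size):
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)                    # enforce bounded norm
    r = x @ theta_star + rng.normal(0.0, 0.1)  # noisy linear reward
    batch.append(local_randomizer(x, r, sigma))

theta_hat = server_aggregate(shuffler(batch), d)
print(np.linalg.norm(theta_hat - theta_star))  # estimation error
```

In this sketch the shuffler adds no noise of its own; the privacy benefit of the shuffle model comes from anonymizing which user sent which privatized message, which is what allows weaker local noise than the pure local model while trusting less than the central model.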
