Paper Title

Fast Dimension Independent Private AdaGrad on Publicly Estimated Subspaces

Paper Authors

Peter Kairouz, Mónica Ribero, Keith Rush, Abhradeep Thakurta

Paper Abstract

We revisit the problem of empirical risk minimization (ERM) with differential privacy. We show that noisy AdaGrad, given appropriate knowledge of and conditions on the subspace from which gradients can be drawn, achieves a regret comparable to traditional AdaGrad plus a well-controlled term due to noise. We show a convergence rate of $O(\text{Tr}(G_T)/T)$, where $G_T$ captures the geometry of the gradient subspace. Since $\text{Tr}(G_T)=O(\sqrt{T})$, we can obtain faster rates for convex and Lipschitz functions, compared to the $O(1/\sqrt{T})$ rate achieved by known versions of noisy (stochastic) gradient descent with comparable noise variance. In particular, we show that if the gradients lie in a known constant-rank subspace, and assuming algorithmic access to an envelope which bounds decaying sensitivity, one can achieve faster convergence to an excess empirical risk of $\tilde O(1/εn)$, where $ε$ is the privacy budget and $n$ the number of samples. Letting $p$ be the problem dimension, this result implies that, by running noisy AdaGrad, we can bypass the DP-SGD bound of $\tilde O(\sqrt{p}/εn)$ in $T=(εn)^{2/(1+2α)}$ iterations, where $α\geq 0$ is a parameter controlling gradient-norm decay, instead of the $T=ε^2n^2$ iterations required by SGD. Our results hold for general convex functions in both constrained and unconstrained minimization. Along the way, we carry out a perturbation analysis of noisy AdaGrad that is of independent interest. Our utility guarantee for the private ERM problem follows as a corollary to the regret guarantee of noisy AdaGrad.
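To make the update rule concrete, below is a minimal sketch of the kind of noisy (diagonal) AdaGrad step the abstract refers to: Gaussian noise is added to each bounded-sensitivity gradient, squared noisy gradients are accumulated, and per-coordinate adaptive steps are taken. This is only an illustrative NumPy sketch: the paper's analysis concerns the full-matrix geometry captured by $G_T$ and gradients confined to a publicly estimated subspace, neither of which is implemented here, and the names `noisy_adagrad`, `grad_fn`, and `noise_std` are hypothetical.

```python
import numpy as np


def noisy_adagrad(grad_fn, x0, T, lr=1.0, noise_std=1.0, stability=1e-8, rng=None):
    """Minimal diagonal noisy-AdaGrad sketch (illustrative, not the paper's algorithm).

    grad_fn(x) is assumed to return a gradient with bounded sensitivity
    (e.g., via clipping); Gaussian noise is added to each gradient before
    the adaptive update. `noise_std` stands in for the Gaussian-mechanism
    scale that a full (ε, δ)-DP accounting would prescribe.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    G = np.zeros_like(x)                      # running sum of squared noisy gradients
    for _ in range(T):
        g = grad_fn(x) + noise_std * rng.standard_normal(x.shape)
        G += g * g                            # diagonal AdaGrad accumulator
        x -= lr * g / (np.sqrt(G) + stability)  # per-coordinate adaptive step
    return x


if __name__ == "__main__":
    # Toy usage: noisy AdaGrad on a synthetic least-squares problem.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 5))
    b = A @ np.ones(5) + 0.1 * rng.standard_normal(200)
    clip = 1.0

    def grad_fn(x):
        g = A.T @ (A @ x - b) / len(b)        # full-batch least-squares gradient
        # Crude norm clipping; true DP training clips per-example gradients.
        return g * min(1.0, clip / max(np.linalg.norm(g), 1e-12))

    x_hat = noisy_adagrad(grad_fn, np.zeros(5), T=500, lr=0.5,
                          noise_std=0.01, rng=rng)
    print(np.round(x_hat, 2))
```

In a real private run, `noise_std` would be fixed by a DP accountant from the clipping norm, the budget $(ε, δ)$, and the number of iterations $T$; here it is left as a free parameter of the sketch.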
