Paper Title
Secure Metric Learning via Differential Pairwise Privacy
Paper Authors
Paper Abstract
Distance Metric Learning (DML) has drawn much attention over the last two decades. A number of previous works have shown that it performs well in measuring the similarities of individuals, given a set of pairwise data correctly labeled by domain experts. These important and precisely labeled pairwise data are often highly sensitive in the real world (e.g., patient similarity). This paper studies, for the first time, how pairwise information can be leaked to attackers during distance metric learning, and develops differential pairwise privacy (DPP), which generalizes the definition of standard differential privacy, for secure metric learning. Unlike traditional differential privacy, which applies only to independent samples and thus cannot be used for pairwise data, DPP deals with this problem by reformulating the worst case. Specifically, given the pairwise data, we reveal all the correlations among pairs in a constructed undirected graph. DPP is then formalized to define what kind of DML algorithm is private enough to preserve pairwise data. After that, a case study employing the contrastive loss is presented to clarify the details of implementing a DPP-DML algorithm. In particular, a sensitivity-reduction technique is proposed to enhance the utility of the output distance metric. Experiments on both a toy dataset and benchmarks demonstrate that the proposed scheme achieves pairwise data privacy without compromising output performance much (accuracy declines by less than 0.01 across all benchmark datasets when the privacy budget is set to 4).
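To make the mechanism named in the abstract concrete, below is a minimal Python sketch (not the authors' code) of two of its ingredients: the undirected graph that records correlations among pairs (two pairs are correlated when they share an individual), and output perturbation calibrated to the resulting worst-case sensitivity. The Mahalanobis contrastive loss, the (1 + max degree) sensitivity scaling, and all function names here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def pair_correlation_graph(pairs):
    """Adjacency over pairs: pairs i and j are connected when they
    share at least one individual, since changing that individual
    perturbs both pairs at once (the correlations the abstract
    describes)."""
    n = len(pairs)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if set(pairs[i]) & set(pairs[j]):
                adj[i].add(j)
                adj[j].add(i)
    return adj

def worst_case_sensitivity(adj, per_pair_sensitivity):
    """Assumed worst case: one individual changes a pair plus every
    pair correlated with it, so the global sensitivity is scaled by
    (1 + maximum degree) of the pair graph."""
    max_degree = max((len(neighbors) for neighbors in adj), default=0)
    return (1 + max_degree) * per_pair_sensitivity

def contrastive_loss(M, xi, xj, similar, margin=1.0):
    """Contrastive loss under a Mahalanobis metric M (illustrative
    stand-in for the paper's case study)."""
    diff = xi - xj
    dist = np.sqrt(diff @ M @ diff)
    if similar:
        return dist ** 2
    return max(0.0, margin - dist) ** 2

def privatize_metric(M, sensitivity, epsilon, rng=None):
    """Output perturbation: Laplace noise with scale
    sensitivity / epsilon added to the learned metric."""
    rng = np.random.default_rng() if rng is None else rng
    return M + rng.laplace(scale=sensitivity / epsilon, size=M.shape)

# Toy usage: three pairs over five individuals with 3-d features;
# pairs 0 and 1 share individual 1 and are therefore correlated.
pairs = [(0, 1), (1, 2), (3, 4)]
adj = pair_correlation_graph(pairs)
sens = worst_case_sensitivity(adj, per_pair_sensitivity=0.5)
M_private = privatize_metric(np.eye(3), sens, epsilon=4.0)
```

Note the design point this sketch illustrates: because pairs sharing an individual are not independent, the noise scale must grow with the pair graph's maximum degree, which is exactly why the paper's sensitivity-reduction technique matters for utility.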