On Collective Robustness of Bagging Against Data Poisoning

Paper title

On Collective Robustness of Bagging Against Data Poisoning

Authors

Ruoxin Chen, Zenan Li, Jie Li, Chentao Wu, Junchi Yan

Abstract

Bootstrap aggregating (bagging) is an effective ensemble protocol that is believed to enhance robustness through its majority voting mechanism. Recent work further proves sample-wise robustness certificates for certain forms of bagging (e.g., partition aggregation). Beyond these particular forms, in this paper we propose the first collective certification for general bagging, which computes the tight robustness against the global poisoning attack. Specifically, we compute the maximum number of simultaneously changed predictions by solving a binary integer linear programming (BILP) problem. We then analyze the robustness of vanilla bagging and give an upper bound on the tolerable poison budget. Based on this analysis, we propose hash bagging to improve the robustness of vanilla bagging almost for free. This is achieved by replacing the random subsampling in vanilla bagging with a hash-based deterministic subsampling, as a way of universally controlling the influence scope of each poisoning sample. Our extensive experiments show a notable advantage in terms of applicability and robustness.
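The idea of hash-based deterministic subsampling can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the SHA-256 hash, the `copies` parameter, and the consecutive-bag assignment are all assumptions made for illustration, and unlike the paper's setting the resulting bags do not have a fixed size. The poison budget here counts insertions and deletions only, so modifying a sample counts as one deletion plus one insertion. The final check is a conservative sample-wise margin test, not the collective BILP certificate described in the abstract.

```python
import hashlib
from collections import Counter


def assign_bags(sample: bytes, num_bags: int, copies: int) -> list[int]:
    """Deterministically map one training sample to `copies` distinct bags.

    Hypothetical scheme: hash the sample content, then take `copies`
    consecutive bag indices starting from the hashed position.
    """
    if not 1 <= copies <= num_bags:
        raise ValueError("copies must be between 1 and num_bags")
    digest = hashlib.sha256(sample).digest()
    base = int.from_bytes(digest[:8], "big") % num_bags
    return [(base + j) % num_bags for j in range(copies)]


def build_bags(samples: list[bytes], num_bags: int, copies: int) -> list[list[int]]:
    """Return, for each bag, the indices of the samples assigned to it."""
    bags: list[list[int]] = [[] for _ in range(num_bags)]
    for idx, sample in enumerate(samples):
        for bag in assign_bags(sample, num_bags, copies):
            bags[bag].append(idx)
    return bags


def max_affected_bags(poison_budget: int, num_bags: int, copies: int) -> int:
    """Each inserted or deleted sample touches at most `copies` bags."""
    return min(num_bags, poison_budget * copies)


def is_certified(votes: list[int], poison_budget: int, num_bags: int, copies: int) -> bool:
    """Conservative sample-wise check on one test point's majority vote.

    If at most m sub-classifiers can change, the top class loses at most m
    votes and any other class gains at most m, so a strict margin above
    2 * m keeps the prediction unchanged.
    """
    counts = Counter(votes).most_common(2)
    top = counts[0][1]
    runner_up = counts[1][1] if len(counts) > 1 else 0
    return top - runner_up > 2 * max_affected_bags(poison_budget, num_bags, copies)
```

Because the assignment depends only on the sample content, an attacker who inserts or deletes one sample can influence at most `copies` sub-classifiers, which is the kind of bounded influence scope that random subsampling does not guarantee.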
