Paper Title

FedGen: Generalizable Federated Learning for Sequential Data

Paper Authors

Praveen Venkateswaran, Vatche Isahagian, Vinod Muthusamy, Nalini Venkatasubramanian

Paper Abstract

Existing federated learning models that follow the standard risk minimization paradigm of machine learning often fail to generalize in the presence of spurious correlations in the training data. In many real-world distributed settings, spurious correlations exist due to biases and data sampling issues on distributed devices or clients that can erroneously influence models. Current generalization approaches are designed for centralized training and attempt to identify features that have an invariant causal relationship with the target, thereby reducing the effect of spurious features. However, such invariant risk minimization approaches rely on a priori knowledge of training data distributions, which is hard to obtain in many applications. In this work, we present a generalizable federated learning framework called FedGen, which allows clients to identify and distinguish between spurious and invariant features in a collaborative manner without prior knowledge of training distributions. We evaluate our approach on real-world datasets from different domains and show that FedGen results in models that achieve significantly better generalization and can outperform the accuracy of current federated learning approaches by over 24%.
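For context on the "standard risk minimization paradigm" that the abstract contrasts FedGen against, the following is a minimal sketch of a plain FedAvg-style training loop, in which each client runs local empirical risk minimization on its own (possibly biased) data and the server averages the resulting parameters. This is not FedGen's algorithm; the model, client data, and hyperparameters are illustrative placeholders.

```python
# Minimal FedAvg-style sketch of ERM-based federated training (illustrative only).
import copy
import torch
import torch.nn as nn

def client_update(global_model, data, targets, lr=0.01, epochs=1):
    """One client's local ERM update on its own (possibly biased) data."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(data), targets)
        loss.backward()
        opt.step()
    return model.state_dict()

def fed_avg(client_states):
    """Server-side averaging of client parameters."""
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        for state in client_states[1:]:
            avg[key] += state[key]
        avg[key] = avg[key] / len(client_states)
    return avg

# Toy setup: four clients, each holding its own random data sample.
torch.manual_seed(0)
global_model = nn.Linear(10, 2)
clients = [(torch.randn(32, 10), torch.randint(0, 2, (32,))) for _ in range(4)]

for _round in range(5):
    states = [client_update(global_model, x, y) for x, y in clients]
    global_model.load_state_dict(fed_avg(states))
```

Because each client simply minimizes its local empirical risk, any spurious correlation present in a client's data is fit and then averaged into the global model. FedGen, as described in the abstract, instead has clients collaboratively identify and separate spurious from invariant features, without requiring prior knowledge of the training distributions.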
