Paper Title
Transfer Learning via Test-Time Neural Networks Aggregation
Paper Authors
Paper Abstract
It has been demonstrated that deep neural networks outperform traditional machine learning approaches. However, deep networks lack generalisability, that is, they do not perform as well on a new (test) set drawn from a different distribution, due to domain shift. To tackle this known issue, several transfer learning approaches have been proposed, where the knowledge of a trained model is transferred to another model to improve performance on different data. However, most of these approaches either require additional training steps or suffer from catastrophic forgetting, which occurs when a trained model overwrites previously learnt knowledge. We address both problems with a novel transfer learning approach that uses network aggregation. We train dataset-specific networks together with an aggregation network in a unified framework. The loss function includes two main components: a task-specific loss (such as cross-entropy) and an aggregation loss. The proposed aggregation loss allows our model to learn how trained deep network parameters can be combined with an aggregation operator. We demonstrate that the proposed approach learns model aggregation at test time without any further training step, reducing the burden of transfer learning to a simple arithmetic operation. The proposed approach achieves performance comparable to the baseline. Moreover, if the aggregation operator has an inverse, we show that our model also inherently allows for selective forgetting, i.e., the aggregated model can forget one of the datasets it was trained on while retaining information on the others.
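To make the test-time aggregation and selective-forgetting ideas concrete, the sketch below illustrates them with element-wise summation of parameters as the aggregation operator. This particular operator, the helper names (`aggregate`, `forget`), and the toy `nn.Linear` networks are assumptions made here for illustration; the abstract does not specify the operator, only that it admits an inverse for forgetting.

```python
# Minimal sketch (not the authors' exact method): test-time aggregation of two
# dataset-specific networks by element-wise summation of their parameters.
# Since summation has an inverse (subtraction), the aggregated model can later
# "forget" one constituent network, as described in the abstract.
import copy
import torch
import torch.nn as nn

def aggregate(model_a: nn.Module, model_b: nn.Module) -> nn.Module:
    """Return a new model whose parameters are the element-wise sum of the inputs'."""
    aggregated = copy.deepcopy(model_a)
    with torch.no_grad():
        for p_agg, p_b in zip(aggregated.parameters(), model_b.parameters()):
            p_agg.add_(p_b)          # aggregation operator: element-wise sum
    return aggregated

def forget(aggregated: nn.Module, model_b: nn.Module) -> nn.Module:
    """Selectively forget model_b by applying the inverse operator (subtraction)."""
    remaining = copy.deepcopy(aggregated)
    with torch.no_grad():
        for p_rem, p_b in zip(remaining.parameters(), model_b.parameters()):
            p_rem.sub_(p_b)          # inverse operator removes model_b's contribution
    return remaining

# Usage: two hypothetical dataset-specific networks with identical architecture.
net_a = nn.Linear(16, 4)
net_b = nn.Linear(16, 4)
net_ab = aggregate(net_a, net_b)     # test-time aggregation, no further training
net_a_only = forget(net_ab, net_b)   # recovers net_a's parameters exactly
```

In the proposed framework the dataset-specific networks are trained with the combined task-specific and aggregation losses so that such an arithmetic combination of their parameters yields a useful aggregated model; the sketch only shows the test-time arithmetic itself.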