Paper Title
Continual learning: a feature extraction formalization, an efficient algorithm, and fundamental obstructions
Paper Authors
Paper Abstract
Continual learning is an emerging paradigm in machine learning, wherein a model is exposed in an online fashion to data from multiple different distributions (i.e., environments) and is expected to adapt to the distribution change. Precisely, the goal is to perform well in the new environment while simultaneously retaining performance on the previous environments (i.e., avoiding "catastrophic forgetting"), without increasing the size of the model. While this setup has enjoyed a lot of attention in the applied community, there has not been theoretical work that even formalizes the desired guarantees. In this paper, we propose a framework for continual learning through the lens of feature extraction, namely one in which features, as well as a classifier, are trained with each environment. When the features are linear, we design an efficient gradient-based algorithm, $\mathsf{DPGD}$, that is guaranteed to perform well on the current environment and to avoid catastrophic forgetting. In the general case, when the features are non-linear, we show that such an algorithm cannot exist, whether efficient or not.
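
To make the setup concrete, below is a minimal, hedged sketch (Python/NumPy) of the kind of feature-extraction formalization the abstract describes: a shared linear feature map trained over a sequence of environments, one frozen linear head per environment, and gradient updates to the features projected so that earlier heads' predictions are left unchanged. The synthetic data, learning rate, and the particular projection rule are illustrative assumptions; this is not a reproduction of the paper's $\mathsf{DPGD}$ algorithm.

```python
# Illustrative toy: continual learning with a shared linear feature map W and
# one frozen linear head per environment. Updates to W are projected so that
# W^T h stays fixed for every previously frozen head h, which preserves old
# environments' predictions without growing the model.
# (Assumed setup for illustration only, not the paper's exact DPGD.)
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 20, 5, 200                  # input dim, feature dim, samples per environment
W = 0.1 * rng.normal(size=(k, d))     # shared linear feature extractor x -> Wx
old_heads = np.zeros((k, 0))          # orthonormal basis spanning frozen heads

def project_update(G, basis):
    """Remove the components of G (shape k x d) visible to the frozen heads."""
    if basis.shape[1] == 0:
        return G
    return G - basis @ (basis.T @ G)

for env in range(3):
    # Synthetic regression environment: y = <w*, x> + noise.
    X = rng.normal(size=(n, d))
    w_star = rng.normal(size=d)
    y = X @ w_star + 0.01 * rng.normal(size=n)

    head, lr = np.zeros(k), 1e-2
    for _ in range(500):
        feats = X @ W.T                       # (n, k) linear features
        err = feats @ head - y                # residuals on the current environment
        g_head = feats.T @ err / n            # gradient w.r.t. the current head
        g_W = np.outer(head, err @ X) / n     # gradient w.r.t. the shared features
        g_W = project_update(g_W, old_heads)  # keep W^T h fixed for old heads h
        head -= lr * g_head
        W -= lr * g_W

    # Freeze this environment's head and add it to the protected subspace.
    basis, _ = np.linalg.qr(np.hstack([old_heads, head.reshape(-1, 1)]))
    old_heads = basis
```

By construction, the projection keeps $W^\top h$ fixed for every frozen head $h$, so predictions on earlier environments are preserved exactly while the model size stays constant, mirroring the two requirements stated in the abstract for the linear case.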