Paper Title

Analysis of Knowledge Transfer in Kernel Regime

Authors

Arman Rahbar, Ashkan Panahi, Chiranjib Bhattacharyya, Devdatt Dubhashi, Morteza Haghir Chehreghani

Abstract

Knowledge transfer has been shown to be a very successful technique for training neural classifiers: together with the ground truth data, it uses the "privileged information" (PI) obtained by a "teacher" network to train a "student" network. It has been observed that classifiers learn much faster and more reliably via knowledge transfer. However, there has been little or no theoretical analysis of this phenomenon. To bridge this gap, we propose to approach the problem of knowledge transfer by regularizing the fit between the teacher and the student with PI provided by the teacher. Using tools from dynamical systems theory, we show that when the student is an extremely wide two-layer network, we can analyze it in the kernel regime and show that it is able to interpolate between PI and the given data. This characterization sheds new light on the relation between the training error and capacity of the student relative to the teacher. Another contribution of the paper is a quantitative statement on the convergence of the student network. We prove that the teacher reduces the number of iterations required for the student to learn, and consequently improves the generalization power of the student. We give a corresponding experimental analysis that validates the theoretical results and yields additional insights.
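To make the setup concrete, below is a minimal numpy sketch of the kind of PI-regularized training the abstract describes: a wide two-layer ReLU network (outer weights fixed, as in the kernel regime) trained by gradient descent on a loss that blends the fit to ground-truth labels `y` with the fit to teacher outputs `t`. The blending weight `lam`, the specific loss form `(1-lam)*||f-y||^2 + lam*||f-t||^2`, and the synthetic teacher are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def two_layer_net(X, W, a):
    """Wide two-layer ReLU net with NTK scaling: f(x) = a . relu(W x) / sqrt(m)."""
    return np.maximum(X @ W.T, 0) @ a / np.sqrt(W.shape[0])

def train_with_pi(X, y, t, lam=0.5, width=2000, lr=0.1, steps=200, seed=0):
    """Gradient descent on the blended objective
    (1-lam)/2 * ||f - y||^2 + lam/2 * ||f - t||^2
    (an assumed, simplified form of PI regularization)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(size=(width, d))           # trained inner weights
    a = rng.choice([-1.0, 1.0], size=width)   # fixed outer weights (kernel regime)
    for _ in range(steps):
        f = two_layer_net(X, W, a)
        # blended residual: equals f minus the lam-interpolation of y and t
        resid = (1 - lam) * (f - y) + lam * (f - t)
        mask = (X @ W.T > 0).astype(float)    # n x width ReLU activation pattern
        # d/dW_r of the loss: a_r * sum_i resid_i * mask_ir * x_i / sqrt(width)
        grad = (a[:, None] * (mask * resid[:, None]).T) @ X / np.sqrt(width)
        W -= (lr / n) * grad
    return W, a

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
y = np.sign(X[:, 0])                  # ground-truth labels
t = y + 0.1 * rng.normal(size=50)     # stand-in "teacher" outputs (the PI)

W, a = train_with_pi(X, y, t, lam=0.5)
f = two_layer_net(X, W, a)
target = 0.5 * y + 0.5 * t            # the interpolation point between data and PI
```

In this simplified objective the blended residual is exactly `f - ((1-lam)*y + lam*t)`, so gradient descent drives the student toward an interpolation between the ground-truth labels and the teacher's PI, matching the interpolation behavior the abstract attributes to the kernel-regime analysis.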
