Paper Title
Cooperative System Identification via Correctional Learning
Paper Authors
Paper Abstract
We consider a cooperative system identification scenario in which an expert agent (teacher) knows a correct, or at least a good, model of the system and aims to assist a learner agent (student), but cannot directly transfer its knowledge to the student. For example, the teacher's knowledge of the system might be abstract, or the teacher and student might be employing different model classes, which renders the teacher's parameters uninformative to the student. In this paper, we propose correctional learning as an approach to the above problem: suppose that, in order to assist the student, the teacher can intercept the observations collected from the system and modify them to maximize the amount of information the student receives about the system. We formulate a general solution as an optimization problem, which for a multinomial system takes the form of an integer program. Furthermore, for a binomial system, we obtain finite-sample results on the improvement that the teacher's assistance yields, as measured by the reduction in the variance of the estimator.
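To make the idea of intercepting and modifying observations concrete, the following is a minimal sketch for a binary (Bernoulli) system. It is an illustrative assumption, not the formulation from the paper: the abstract does not state the objective or the constraints, so the sketch assumes the teacher may flip at most `budget` observations and tries to bring the student's empirical frequency as close as possible to the known parameter `true_p`. The function name `correct_observations` and the budget constraint are hypothetical.

```python
def correct_observations(observations, true_p, budget):
    """Flip at most `budget` binary observations so that the empirical
    frequency of ones moves as close as possible to `true_p`.

    Assumed toy objective: minimize |(number of ones) / n - true_p|
    subject to changing at most `budget` entries. This is a stand-in
    for the optimization problem mentioned in the abstract.
    """
    obs = list(observations)
    n = len(obs)
    ones = sum(obs)

    # Integer count of ones closest to n * true_p.
    # (Python's round() resolves exact .5 ties to the even integer;
    # either neighbor is equally good for this objective.)
    target = round(n * true_p)

    # Signed number of extra ones needed; capped by the budget.
    delta = target - ones
    flips_left = min(abs(delta), budget)

    # If we need more ones, flip zeros; otherwise flip ones.
    value_to_flip = 0 if delta > 0 else 1

    for i, y in enumerate(obs):
        if flips_left == 0:
            break
        if y == value_to_flip:
            obs[i] = 1 - y
            flips_left -= 1
    return obs


def empirical_mean(observations):
    """Student's estimate of the Bernoulli parameter: the sample mean."""
    return sum(observations) / len(observations)
```

Because the distance to `true_p` is convex in the number of ones, moving as far toward the target count as the budget allows is optimal for this toy objective. With several categories instead of two, the same idea would require choosing integer counts per category, which is consistent with the abstract's remark that the multinomial case becomes an integer program.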