Paper Title

The Sample Complexity of Teaching-by-Reinforcement on Q-Learning

Paper Authors

Xuezhou Zhang, Shubham Kumar Bharti, Yuzhe Ma, Adish Singla, Xiaojin Zhu

Paper Abstract

We study the sample complexity of teaching, termed as "teaching dimension" (TDim) in the literature, for the teaching-by-reinforcement paradigm, where the teacher guides the student through rewards. This is distinct from the teaching-by-demonstration paradigm motivated by robotics applications, where the teacher teaches by providing demonstrations of state/action trajectories. The teaching-by-reinforcement paradigm applies to a wider range of real-world settings where a demonstration is inconvenient, but has not been studied systematically. In this paper, we focus on a specific family of reinforcement learning algorithms, Q-learning, and characterize the TDim under different teachers with varying control power over the environment, and present matching optimal teaching algorithms. Our TDim results provide the minimum number of samples needed for reinforcement learning, and we discuss their connections to standard PAC-style RL sample complexity and teaching-by-demonstration sample complexity results. Our teaching algorithms have the potential to speed up RL agent learning in applications where a helpful teacher is available.
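To make the teaching-by-reinforcement setting concrete, here is a minimal illustrative sketch (not the paper's algorithm): a teacher steers a tabular Q-learner toward a target policy by choosing the reward signal at each step, rather than by demonstrating trajectories. The chain environment, target policy, and constants are assumptions for illustration only.

```python
import numpy as np

N_STATES, N_ACTIONS = 5, 2               # chain MDP: action 1 moves right, action 0 stays
TARGET = np.ones(N_STATES, dtype=int)    # target policy the teacher wants: always move right
ALPHA, GAMMA = 0.5, 0.9                  # student's learning rate and discount

def step(s, a):
    """Deterministic chain dynamics; the environment's own reward is ignored,
    because in this setting the teacher controls the reward channel."""
    return min(s + 1, N_STATES - 1) if a == 1 else s

def teach(n_episodes=30, horizon=10):
    Q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(n_episodes):
        s = 0
        for _ in range(horizon):
            a = int(np.argmax(Q[s]))     # greedy student
            s_next = step(s, a)
            # Teacher's control: reward +1 iff the student took the target action.
            r = 1.0 if a == TARGET[s] else -1.0
            # Standard Q-learning update on the teacher-supplied reward.
            Q[s, a] += ALPHA * (r + GAMMA * Q[s_next].max() - Q[s, a])
            s = s_next
    return Q

Q = teach()
learned = np.argmax(Q, axis=1)           # greedy policy after teaching
```

In this toy run the greedy policy matches the target at every state; the teaching dimension studied in the paper asks how few such teacher-supplied samples suffice in the worst case, under teachers with varying control power.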
