Paper Title
A Sample Complexity Separation between Non-Convex and Convex Meta-Learning
Paper Authors
Paper Abstract
One popular trend in meta-learning is to learn from many training tasks a common initialization for a gradient-based method that can be used to solve a new task with few samples. The theory of meta-learning is still in its early stages, with several recent learning-theoretic analyses of methods such as Reptile [Nichol et al., 2018] being for convex models. This work shows that convex-case analysis might be insufficient to understand the success of meta-learning, and that even for non-convex models it is important to look inside the optimization black-box, specifically at properties of the optimization trajectory. We construct a simple meta-learning instance that captures the problem of one-dimensional subspace learning. For the convex formulation of linear regression on this instance, we show that the new task sample complexity of any initialization-based meta-learning algorithm is $\Omega(d)$, where $d$ is the input dimension. In contrast, for the non-convex formulation of a two-layer linear network on the same instance, we show that both Reptile and multi-task representation learning can have new task sample complexity of $\mathcal{O}(1)$, demonstrating a separation from convex meta-learning. Crucially, analyses of the training dynamics of these methods reveal that they can meta-learn the correct subspace onto which the data should be projected.
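As a rough illustration of the setting described in the abstract, the sketch below runs Reptile on a two-layer linear network for synthetic regression tasks whose true weights lie in a shared one-dimensional subspace. This is a minimal, assumed reconstruction for intuition only: the task distribution, network sizes, and hyperparameters are illustrative choices, not the paper's exact construction.

```python
import numpy as np

# Hypothetical sketch (not the paper's exact construction): each task is
# linear regression y = <w_t, x> where w_t lies in a fixed one-dimensional
# subspace span{u} of R^d. Reptile is run on a two-layer linear network
# f(x) = b^T (A x). All hyperparameters below are illustrative assumptions.

rng = np.random.default_rng(0)
d, k = 20, 5                          # input dimension, hidden width
u = rng.standard_normal(d)
u /= np.linalg.norm(u)                # shared 1-D subspace direction

def sample_task(n):
    """Linear regression task whose true weight vector lies along u."""
    w = rng.standard_normal() * u
    X = rng.standard_normal((n, d))
    return X, X @ w

def inner_sgd(A, b, X, y, lr=0.01, steps=50):
    """Gradient descent on the squared loss of the two-layer linear net."""
    n = len(y)
    for _ in range(steps):
        r = X @ A.T @ b - y                # residuals
        gA = np.outer(b, r @ X) / n        # d loss / d A
        gb = A @ (X.T @ r) / n             # d loss / d b
        A, b = A - lr * gA, b - lr * gb
    return A, b

# Reptile outer loop: move the initialization toward task-adapted weights.
A0 = 0.1 * rng.standard_normal((k, d))
b0 = 0.1 * rng.standard_normal(k)
meta_lr = 0.1
for _ in range(2000):
    X, y = sample_task(n=50)               # many samples per training task
    A1, b1 = inner_sgd(A0.copy(), b0.copy(), X, y)
    A0 += meta_lr * (A1 - A0)
    b0 += meta_lr * (b1 - b0)

# If meta-learning succeeds in the sense the abstract describes, the learned
# representation A0 should align with u, so a new task only needs to fit a
# single coefficient along that direction (O(1) samples).
rows = A0 / np.linalg.norm(A0, axis=1, keepdims=True)
print("alignment of A0 rows with u:", np.abs(rows @ u))
```

The printed alignments being close to 1 would indicate that the initialization has captured the correct one-dimensional subspace, which is the mechanism the abstract attributes to the training dynamics of Reptile and multi-task representation learning in the non-convex formulation.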